From cbullard at hiwaay.net Sun Feb 1 00:34:02 1998
From: cbullard at hiwaay.net (len bullard)
Date: Mon Jun 7 17:00:03 2004
Subject: XSL/XML/XLL and VRML (was: Re: Conditional actions in XSL?)
References: <4955E202FE46D11195C500609712EB6B05C193@FLPS-NTSERVER1>
Message-ID: <34D3C277.5002@hiwaay.net>

Tony Stewart wrote:
>
> Len Bullard wrote:
>
> >>It can do what DTDs do well: provide a precise description of the
> presentation style of the interface as a set of routed behaviors.
>
> I would have thought that a good DTD doesn't do this at all. The DTD
> should define the information content, leaving both style and (IMO)
> behavior to be specified in a stylesheet that is tailored to this
> specific usage of the information.
> Thus, it is the style sheet that describes
> the presentation style, not the DTD. Otherwise, how are you going to
> reuse the information in other formats? You're not going to want to
> change the DTD. And you may not have permission to do so in any case.
>
> Since this is all pretty basic religious thinking, perhaps I
> misunderstood you.

One could say that it is a religious conviction in some cases and be quite right, and that in others it is an engineering constraint and also be right. It is the *SGML Way*. In that sense, yes, it is a religion, and for some years, I practiced it.

"But what is the good, Phaedrus?"

Look at what you are saying:

1. Stylesheet properties are not "information"
2. Stylesheets express behaviors. So in fact, a stylesheet language is a programming language, Turing complete if you will.
3. For some kinds and instances of information, there are lifecycle requirements for reuse.
4. For some kinds and instances of information (DTDs in your example), there are policies for the behaviors that can be applied to the kinds and instances of information.

1. I don't think you intend point one. But it is often a hidden premise in the debates about separating style from content (which is how you are using "information").
That distinction proves to be thin. Perhaps by stylesheet information you mean typographic properties.

2. Stylesheets that express behaviors are simply programming languages with structures (data types) for typographic properties. In this view, Java/AWT et al. is a stylesheet language. After that, choosing one comes down to practical engineering requirements of platforms, libraries, interoperation with other engines, etc. Anyway, in this view, VRML is a stylesheet language. Perhaps the best way for it to include text support is to include it natively. This idea has come up, and there is a text node in VRML which browsers like WorldView can display very well. (NOTE: The issue of reformulating VRML as XML is one of framework efficiency, not descriptive power or lifecycle.)

3. This is true of course. But unless requirements are very carefully examined, no size fits all.

4. True, and it varies widely. One of the features of DTDs that makes them very attractive for policy is the ease with which they can be adjusted liberally at the site of use. This one slips by most of the SGML theorists who do not work in production sites where multiple versions of DTDs are used at different points of a process or procedure. In other words, they are an instrument of policy, not a policy. Information is not static where a high rate of change prevails. A DTD is more like a control knot in a NURB than a point in a B-spline.

My point is that for many information engineering problems, the approach Pierre took with Prototype has been taken by others, and successfully. The arbiter of success is not the religion of the SGML Way, but the ability to meet the requirements of the task. Bytes aren't holy.

As XSL/XML/XLL reach ever greater levels of design complexity in the base standards, a question emerging in other design groups (one heard before during the HyTime/DSSSL era) is: Are these really complicated solutions looking for problems, not new and vital technologies?
Is there a sudden rush of popularity based on the soundness of applicability, or is it the product of software company juggling of public perceptions? If simpler, more readily available, and more easily understood technologies exist to solve a problem within an acceptable timeframe, the experienced engineer and the practical customer adopt them. If not, they try the next best thing. Is XML a *religion* or just the next best thing?

Len Bullard

xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev@ic.ac.uk
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/
To (un)subscribe, mailto:majordomo@ic.ac.uk the following message;
(un)subscribe xml-dev
To subscribe to the digests, mailto:majordomo@ic.ac.uk the following message;
subscribe xml-dev-digest
List coordinator, Henry Rzepa (mailto:rzepa@ic.ac.uk)

From dima at paragraph.com Sun Feb 1 00:47:54 1998
From: dima at paragraph.com (Dmitri Kondratiev)
Date: Mon Jun 7 17:00:03 2004
Subject: SGML Architecture questions
Message-ID: <2.2.32.19980201004632.00719004@dream.paragraph.com>

I may be wrong, but from my understanding of SGML architecture, only bridging mechanism provides for type extension. Everything else in architecture seems to be element and attribute names remapping. Bridging element serves as a target for mapping substructure to it. Still bridging element is not defined in DTD and as a result its content/attributes can't be validated by parser. Is that correct ?

Taking bridging example from "A Tutorial Introduction to SGML Architectures" by W. Eliot Kimber, with architectural DTD :

And mapping from elements in the document to elements in the architecture :

]> KimberWilliam 1234 Maple St. AustinTX78757

There is no DTD for element content so: KimberWilliam could be : KimberEliotWilliam

So my question is : how validity constraints can be enforced for bridging element substructure ?
Thanks,
Dima
-----------------
Dmitri Kondratiev
dima@paragraph.com
102401.2457@compuserve.com
http://www.geocities.com/SiliconValley/Lakes/3767/
tel: 07-095-464-9241

From cbullard at hiwaay.net Sun Feb 1 01:01:24 1998
From: cbullard at hiwaay.net (len bullard)
Date: Mon Jun 7 17:00:03 2004
Subject: First experiences with XSL
References: <2.2.32.19980130155416.0085e27c@pop>
Message-ID: <34D3C820.2671@hiwaay.net>

Sharon Adler wrote:
>
> Michael,
>
> As I write this, the XSL WG is 2/3 through its first official meeting. The
> Microsoft code does not represent the "Final" XSL but the strawman of some of
> the facilities of XSL. The lack of diagnostics/limited functionality of a
> partial prototype implementation is not any indication of the functionality
> or capability of a style language, nor any final implementation. Of course
> you can accomplish what you wanted in Java. Any hacker can do anything they
> want in code, but what about the rest of the world's humans.

Can anyone show that XSL (if indeed, a Turing complete language) is any easier than Java? XSL is a programming language and there are far more mortals (programmers in some cases) who understand and can easily use Java than XSL/DSSSL. Why? Object-oriented programming is the rule not the exception in programming communities. JavaScript has a tremendous advantage in that stepping up to Java from JavaScript incurs no shocks of syntax. It is an easy transition.
Since at least C forward, it has been the support libraries that made the difference in ease or utility because, syntax and side-effect issues aside, the same features are found in most programming languages.

So, one might retreat to the defense of "But it is a standard" and there one would have a point. Unless and until Sun releases Java as a true standard (a PAS won't cut it), implementors of systems based on it create systems based on proprietary technology.

> Please don't use the XSL prototype if it is not suitable for you to play
> around with, but give us a chance to create a workable standard.

But of course.

len bullard

From jamsden at us.ibm.com Sun Feb 1 01:07:30 1998
From: jamsden at us.ibm.com (Jim Amsden)
Date: Mon Jun 7 17:00:03 2004
Subject: XSL/XML/XLL and VRML (was: Re: Conditional actions in XS
Message-ID: <5040100014394115000002L052*@MHS>

Tony Stewart wrote:
>>I would have thought that a good DTD doesn't do this at all. The DTD
>>should define the information content, leaving both style and (IMO)
>>behavior to be specified in a stylesheet that is tailored to this
>>specific usage of the information.

More religion: Information content should be subordinate to behavior, not the other way around. The DTD defines the information structure required to support (unfortunately) implied behavior which establishes the meaning of that data in the context in which it was defined. Attributes establish characteristics which maintain state supporting variant behavior.
Contents and links represent associations supporting additional state, and enabling collaborations with other elements required to support behavior, including behavior of the document as a whole. Of course, none of this has anything to do with rendering unless that's the subject of the DTD.

Note that if a language is rich enough, it doesn't have to change just because the subject area changes. This might be the basis of the appeal of XSL and XML-Data which both use XML (more or less) to describe their subject areas.

From dcarlson at ontogenics.com Sun Feb 1 02:16:05 1998
From: dcarlson at ontogenics.com (Dave Carlson)
Date: Mon Jun 7 17:00:03 2004
Subject: problems with emacs xml-mode
Message-ID: <2.2.32.19980201021007.00e40c30@pop.dimensional.com>

At 05:57 PM 1/31/98 -0500, David Megginson wrote:
> > 2. The DTD is parsed, but all element names are folded into all lower case.
> > Does the current version of xml-mode support mixed-case element names? If
> > so, what am I doing wrong?
>
>Are you certain that you're using the latest version of the patches
>(from Fall 1997) and that you're actually in XML rather than SGML
>mode? Does it read 'XML' or 'SGML' in the mode bar at the bottom?

I'm using the xml-mode that I downloaded from your site in December 1997. And, yes, it does read 'XML' in the mode bar. I'll try some additional testing to see if I can narrow down the problem. Is there some other test I can run to be sure I've got the entire xml-mode installed properly? I had to do some manual hacking to install on WinNT, maybe I messed up somewhere.
I've never gotten it to work correctly, but sometimes I get the top-level element names in mixed case, and the content model all folded to lower case. So, I can add mixed case elements at the top level, but there are no "valid" sub-elements because the content model has all tags in lower case. In another test, everything was lower case.

> > 4. Font highlighting has some problems. I've configured my _emacs file
> > according to earlier posts in this list, but the text highlighting only
> > appears after I've used the context menu to insert a new tag. Then, the
> > text is only highlighted from that point *backward* in the document. When I
> > first load a document, no text is highlighted.
>
>Again, this is not directly related to the XML patches. PSGML will
>highlight only the parts of the document that it has already parsed.
>In Unix, at least, it will eventually parse ahead and highlight the
>whole thing.
>

Yes, it will eventually highlight the entire document, once I've made an addition to the end of the document.

Thanks for your help, and your contribution!

Dave

From eliot at isogen.com Sun Feb 1 13:57:12 1998
From: eliot at isogen.com (W. Eliot Kimber)
Date: Mon Jun 7 17:00:03 2004
Subject: SGML Architecture questions
Message-ID: <3.0.32.19980201074848.00c84c30@swbell.net>

At 03:46 AM 2/1/98 +0300, Dmitri Kondratiev wrote:
>I may be wrong, but from my understanding of SGML architecture, only
>bridging mechanism provides for type extension. Everything else in
>architecture seems to be element and attribute names remapping. Bridging
>element serves as a target for mapping substructure to it.
>Still bridging element is not defined in DTD and as a result its
>content/attributes can't be validated by parser. Is that correct ?

The bridging element *is* defined in the DTD, so its use can be validated by the parser, but your real question is:

>how validity constraints can be enforced for bridging element substructure ?

You do it locally in the document's own DTD, or you do it by deriving the bridging element from another architecture.

>There is no DTD for element content so:

Yes there is:

(#PCDATA | archbridge)*

However, your point is that you might want to impose constraints on the local (to this document) content of elements that map to archbridge. You could define, locally, the content for the name element to match your constraints:

]> KimberWilliam 1234 Maple St. AustinTX78757

You can also do it by deriving the bridging element from another architecture:

(This modifies the above declarations:)

This says that the cust.name element plays the role "name" within the personarch architecture and the role "person-name" within the namearch architecture. I can validate that cust.name satisfies the rules for "name" as defined by the personarch and that its content satisfies the rules for "person-name" in the namearch. Notice how the cust.name element "bridges" from the personarch architecture to the namearch architecture or from the architecture to the local (document-specific) rules.

Cheers,

Eliot
--
W. Eliot Kimber, Senior Consulting SGML Engineer Highland Consulting, a division of ISOGEN International Corp. 2200 N. Lamar St., Suite 230, Dallas, TX 95202. 214.953.0004 www.isogen.com
From cbullard at hiwaay.net Sun Feb 1 17:16:07 1998
From: cbullard at hiwaay.net (len bullard)
Date: Mon Jun 7 17:00:03 2004
Subject: First experiences with XSL
References:
Message-ID: <34D4AD72.49CE@hiwaay.net>

Betty Harvey wrote:
>
> On Sat, 31 Jan 1998, len bullard wrote:
>
> >
> > Can anyone show that XSL (if indeed, a Turing complete language) is any
> > easier
> > than Java? XSL is a programming language and there are far more mortals
> > (programmers in some cases) who understand and can easily use Java than
> > XSL/DSSSL. Why? Object-oriented programming is the rule
> > not the exception in programming communities. JavaScript has a
> > tremendous
> > advantage in that stepping up to Java from JavaScript incurs no
> > shocks of syntax. It is an easy transition.
> >
>
> Len:
>
> My experience is that XSL is easier. I was able to
> take the XSL tutorial and create a simple example of an
> XSL stylesheet.
>
> If you have Microsoft Explorer 4.0 or higher you can test my first
> example at: http://www.eccnet.com/xmledi.
>
> My initial thoughts are that it doesn't do everything I
> want it to do - but I am going to hold judgement until the XSL
> standard becomes more stable. Initially - I am impressed and
> looking forward to what XSL will offer us - thank goodness
> someone is not only thinking about style and behavior but
> moving towards a standard implementation effort - what
> FOSI tried to do 8 years ago.
>
> Betty

That is good to hear.
Yet, the XSL/XLL discussion to me has the feel of attending a summer stock presentation of Hamlet: famous lines all carefully memorized, spoken thousands of times before, and Hamlet still dies in the last scene. Don't take it as an "I don't like XSL" but as a cautionary "we know our parts so well we can sleepwalk through them."

So yes, compelling examples are needed. The FOSI perished in complexity, HyTime has almost met the same fate, and DSSSL never got out of the gate before events and technology overtook it. We have to meet the criticism that XML technology is a solution looking for a problem. We need something better than the same defenses we presented for SGML/HyTime/DSSSL to the same criticism.

I sense a deflation in the enamouring of the Web. Joe Q Public has discovered the anemia of the infrastructure. Still, experimental team efforts such as VRMLDream, which will demonstrate a puppeteering technology for virtual theatre, have promise. For these applications, it is 1945 and each TV network is a world unto itself. These groups see the Internet as a broadcasting medium. Maybe Clinton will survive his current problems and deliver on that "1000X the bandwidth" promise. There is little doubt that replacing the Internet infrastructure is needed ASAP.

Business interest is stable, yet the groups who control the corporate standards are from printing backgrounds and marketing. They see the Internet as a publishing medium. They tend to be underwhelmingly technically talented, averse to technology whose practitioners they do not control, and able to restrict the application at the heart of the matter: funding. While the true practitioner seeks to expand capability, the purse stringers seek to restrict it, and successfully.

It is necessary to look at the whole of the framework and how that can best meet business needs, in content development, maintenance, production, and distribution. The architectures must be sold accordingly. (one rung up the CALS spiral).
Beware jargon; beware complex examples; beware precise description that fails to engender imaginative application. The hook is the imagination. Sink the hook to reel in the fish.

Overall efficiency is becoming the primary issue given the size and bugginess of the framework. Building ever more compelling and sustainable content is still the goal. Just remember that many, many groups do not believe that putting long lifecycle information assets on the WWW is a good thing to do. Find out why.

best,

len

From peter at ursus.demon.co.uk Sun Feb 1 17:18:12 1998
From: peter at ursus.demon.co.uk (Peter Murray-Rust)
Date: Mon Jun 7 17:00:03 2004
Subject: JUMBO9801a1 release
Message-ID: <3.0.1.16.19980201170326.1a4f992a@pop3.demon.co.uk>

An updated version of the alpha JUMBO distribution (hopefully with the earlier bugs removed) is available at:

http://www.vsms.nottingham.ac.uk/vsms/java/jumbo/jan9801/jumbo9801a1.zip

This should supersede the earlier version.

The JUMBO in this distribution now runs as an APPLET as well as the application described previously and you are welcome to experiment. Since applets require classes to be 'under' the codebase, I have not tested the SAX-compliant parsers; experiments and feedback are welcome. Note that some of the text fields are no longer included in the distribution and should be downloaded from the appropriate sites.

As before I welcome reports of gross errors (e.g. it doesn't run).

P.
Peter Murray-Rust, Director Virtual School of Molecular Sciences, domestic net connection
VSMS http://www.nottingham.ac.uk/vsms, Virtual Hyperglossary http://www.venus.co.uk/vhg

From ak117 at freenet.carleton.ca Sun Feb 1 20:43:19 1998
From: ak117 at freenet.carleton.ca (David Megginson)
Date: Mon Jun 7 17:00:03 2004
Subject: SAX: Parser Interface -- Summary of Change Requests
Message-ID: <199802012028.PAA00747@unready.microstar.com>

As promised, I will now begin to summarise the requested changes to SAX before we put out a stable 1.0 version: over the next few days, I will send out one message summarising the requested changes to each interface or class. For more information on SAX, see

http://www.microstar.com/XML/SAX/

There have been only two changes proposed to the Parser interface, both of which would be backwards-compatible with existing implementations:

1) Allow SAX to work with an input stream as well as a URI.

2) Simplify handler chaining by adding get* methods for existing handlers.

Here are the change requests in detail, with my initial response at the end of each one:

1) Allow SAX to work with an input stream as well as a URI.

- Paul Pazandak
- Peter Murray-Rust
- Don Park

Currently, the Parser interface provides only the following method to initiate a parse:

    void parse (String publicId, String systemId)
      throws java.lang.Exception;

Following this suggestion, there would be a new method

    void parse (String publicId, String systemId, InputStream input)
      throws java.lang.Exception;

(It is still necessary to provide a system identifier for resolving relative URIs within the stream).
Note that the stream would be a byte stream, not a character stream -- characters might require more than one octet, depending on the encoding in use.

I can see the convenience of this method, and I plan to add something like this to AElfred when I have a chance. For SAX, however -- which is meant to end up as a language- and system-independent API -- I am reluctant to hardcode assumptions about storage (and I don't know enough about IDL to know if there is a general representation for streams). Paul Pazandak has also suggested allowing strings and buffers -- in this case, they would already be decoded into characters.

Personally, I'm undecided, and would be interested in hearing the theoretical arguments for and against this suggestion.

2) Simplify handler chaining by adding get* methods for existing handlers.

- Don Park

Currently the Parser interface provides only setters for the various handlers:

    public void setEntityHandler (EntityHandler handler);
    public void setDocumentHandler (DocumentHandler handler);
    public void setErrorHandler (ErrorHandler handler);

Following this suggestion, there would also be accessors:

    public EntityHandler getEntityHandler ();
    public DocumentHandler getDocumentHandler ();
    public ErrorHandler getErrorHandler ();

An application could then retrieve the existing handler and implement a new one which invokes the old one under certain circumstances.

This seems like a generally good idea (as well as a simple and backwards-compatible change), and I am willing to implement it. The only complication is that we'll have to define the default state -- is the parser always required to return a default handler if the user has not explicitly set one, or should it return null?

I look forward to your comments and suggestions.

All the best,

David

--
David Megginson ak117@freenet.carleton.ca
Microstar Software Ltd. dmeggins@microstar.com
http://home.sprynet.com/sprynet/dmeggins/

From tyler at infinet.com Sun Feb 1 21:42:32 1998
From: tyler at infinet.com (Tyler Baker)
Date: Mon Jun 7 17:00:03 2004
Subject: SAX: Parser Interface -- Summary of Change Requests
References: <199802012028.PAA00747@unready.microstar.com>
Message-ID: <34D4EE00.A1FCECF5@infinet.com>

> Here are the change requests in detail, with my initial response at
> the end of each one:
>
> 1) Allow SAX to work with an input stream as well as a URI.
>
> - Paul Pazandak
> - Peter Murray-Rust
> - Don Park
>
> Currently, the Parser interface provides only the following method
> to initiate a parse:
>
>     void parse (String publicId, String systemId)
>       throws java.lang.Exception;
>
> Following this suggestion, there would be a new method
>
>     void parse (String publicId, String systemId, InputStream input)
>       throws java.lang.Exception;
>
> (It is still necessary to provide a system identifier for resolving
> relative URIs within the stream). Note that the stream would be a
> byte stream, not a character stream -- characters might require
> more than one octet, depending on the encoding in use.

Well, what if the XML data is streamed from a database where a URL does not matter so much? If you look at what Oracle, Sybase, and Microsoft among others are planning on doing with XML, then supporting this with SAX in the most ubiquitous way will be very much necessary. I think that if you want to make SAX have any CORBA support or other language support down the line, it would be best to avoid any polymorphism (overloading) in the API, because in CORBA, for example, you cannot redefine operations in IDL (methods in Java).
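To make the IDL concern concrete, here is a minimal Java sketch, assuming the two entry points are given distinct names rather than both being called parse. The names NamedEntryParser, RecordingParser, parseUri and parseStream are hypothetical and are not part of SAX. The toy implementation parses nothing: it only records which entry point was called and counts the bytes it was handed.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

/** Hypothetical interface: one distinct operation name per entry point, no overloads. */
interface NamedEntryParser {
    void parseUri(String publicId, String systemId) throws IOException;

    void parseStream(String publicId, String systemId, InputStream input) throws IOException;
}

/** Toy implementation (an assumption for illustration) that records calls instead of parsing. */
class RecordingParser implements NamedEntryParser {
    String lastCall = "";
    int bytesSeen = 0;

    public void parseUri(String publicId, String systemId) {
        lastCall = "parseUri:" + systemId;
    }

    public void parseStream(String publicId, String systemId, InputStream input)
            throws IOException {
        lastCall = "parseStream:" + systemId;
        bytesSeen = 0;
        // Read raw bytes; decoding them into characters would be the parser's job.
        while (input.read() != -1) {
            bytesSeen++;
        }
    }
}

final class NamedEntryDemo {
    public static void main(String[] args) throws IOException {
        RecordingParser parser = new RecordingParser();
        byte[] doc = "<doc/>".getBytes(StandardCharsets.UTF_8);
        // The system identifier is still passed so relative URIs could be resolved.
        parser.parseStream(null, "http://example.org/doc.xml", new ByteArrayInputStream(doc));
        System.out.println(parser.lastCall + " (" + parser.bytesSeen + " bytes)");
    }
}
```

Because each operation has its own name, an interface of this shape could be written down in a language that does not allow two operations to share a name; whether that is worth the less familiar Java API is a separate design question.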
> I can see the convenience of this method, and I plan to add
> something like this to AElfred when I have a chance. For SAX,
> however -- which is meant to end up as a language- and
> system-independent API -- I am reluctant to hardcode assumptions
> about storage (and I don't know enough about IDL to know if there
> is a general representation for streams). Paul Pazandak has also
> suggested allowing strings and buffers -- in this case, they would
> already be decoded into characters.

Another idea (as far as implementation goes) is to have the parser simply be an extension of java.io.FilterInputStream which takes one or more Handler interfaces as arguments (to delegate to), so that you can handle very large streams of data. In addition to overriding the necessary java.io.FilterInputStream methods, you can also have methods like readDocument(), readElement(), etc. This would give people a lot more control over reading in XML. This approach of course is similar to how URL Content in the java.net package handles content. But where I see this approach being most useful is in transactions where you might only want to read in a limited amount of data anyways and process only that, or else in the case where XML content is always at a fixed length (like in databases where you get null padding for string fields which do not take up the assigned length). With the current SAX implementation, you have no real control at the IO level where it would help to skip content if the application feels it is necessary.

> Personally, I'm undecided, and would be interested in hearing the
> theoretical arguments for and against this suggestion.
>
> 2) Simplify handler chaining by adding get* methods for existing
> handlers.
>
> - Don Park
>
> Currently the Parser interface provides only setters for the
> various handlers:
>
>     public void setEntityHandler (EntityHandler handler);
>     public void setDocumentHandler (DocumentHandler handler);
>     public void setErrorHandler (ErrorHandler handler);
>
> Following this suggestion, there would also be accessors:
>
>     public EntityHandler getEntityHandler ();
>     public DocumentHandler getDocumentHandler ();
>     public ErrorHandler getErrorHandler ();
>
> An application could then retrieve the existing handler and
> implement a new one which invokes the old one under certain
> circumstances.

I'm not sure exactly what these get methods would be used for, because all the handlers are useful for is delegation anyways. The only reason the get methods would be useful is for casting the returned object to some other form. Why anyone would need to do this is beyond me, as recasting this object back to something would be sloppy implementation in the first place.

> This seems like a generally good idea (as well as a simple and
> backwards-compatible change), and I am willing to implement it.
> The only complication is that we'll have to define the default
> state -- is the parser always required to return a default handler
> if the user has not explicitly set one, or should it return null?

The default handler could just be something which spits stuff out to stdout or some other OutputStream in a manner similar to how Aelfred's EventDemo does.

Tyler

From tyler at infinet.com Sun Feb 1 22:36:19 1998
From: tyler at infinet.com (Tyler Baker)
Date: Mon Jun 7 17:00:03 2004
Subject: SAX: Parser Interface -- Summary of Change Requests
References: <199802012028.PAA00747@unready.microstar.com> <34D4EE00.A1FCECF5@infinet.com>
Message-ID: <34D4FA9E.DFB80BAA@infinet.com>

Tyler Baker wrote:

> > I can see the convenience of this method, and I plan to add
> > something like this to AElfred when I have a chance. For SAX,
> > however -- which is meant to end up as a language- and
> > system-independent API -- I am reluctant to hardcode assumptions
> > about storage (and I don't know enough about IDL to know if there
> > is a general representation for streams). Paul Pazandak has also
> > suggested allowing strings and buffers -- in this case, they would
> > already be decoded into characters.
>
> Another idea (as far as implementation goes) is to have the parser simply be an
> extension of java.io.FilterInputStream which takes one or more Handler
> interfaces as arguments (to delegate to), so that you can handle very large
> streams of data. In addition to overriding the necessary
> java.io.FilterInputStream methods, you can also have methods like readDocument(),
> readElement(), etc. This would give people a lot more control over reading in
> XML. This approach of course is similar to how URL Content in the java.net
> package handles content.
> But where I see this approach being most useful is in
> transactions where you might only want to read in a limited amount of data
> anyways and process only that or else in the case where XML content is always at
> a fixed length (like in databases where you get null padding for string fields
> which do not take up the assigned length). With the current SAX implementation,
> you have no real control at the IO level where it would help to skip content if
> the application feels it is necessary.

One last thing I wanted to add: if the Parser were an extension of java.io.FilterInputStream or java.io.InputStream, it would be nice to be able to simply take a compressed XML file and unpack it all in one line of code. For example, you could create it all like this:

    XMLInputStream xis = new XMLInputStream(new CompressedInputStream(in), handler);

where in is any input stream (like file, URL, etc.) and handler is one or more handlers. This I feel is much more flexible, since currently SAX will only accept content which comes from a resolved URL; besides, if you are going to have an InputStream argument, you will need control over how it is handled.

In addition, you might want to be able to register the handler right before actually handling the content. For example, if you get a systemID or publicID of some type (this would currently occur with a doctype event in SAX), you would then want to register a particular document handler with that type (which could be done nicely with a dynamic class loading mechanism). In this case, you might have a static method in the XMLInputStream class which acts as a registry for handlers of various document types; it could be nothing more complex than a hashtable of class names indexed by systemID or publicID. You could have this registry just be for documents, or else it could even be more complex with a federated namespace of handlers for elements.
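The registry idea above can be sketched roughly as follows. This is only an illustration under stated assumptions: DocHandler, BarDocHandler, DefaultDocHandler and HandlerRegistry are hypothetical names, not part of SAX or of any proposed XMLInputStream class, and the registry is an ordinary object rather than a static method so that it is easy to exercise on its own. It maps a public identifier to a handler class name and instantiates the class reflectively, falling back to a default handler when nothing usable is registered.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for a document handler; not the SAX DocumentHandler interface. */
interface DocHandler {
    String describe();
}

/** Hypothetical handler for one particular document type. */
class BarDocHandler implements DocHandler {
    public String describe() {
        return "bar handler";
    }
}

/** Fallback used when no handler is registered for a document type. */
class DefaultDocHandler implements DocHandler {
    public String describe() {
        return "default handler";
    }
}

/** A table of handler class names indexed by public identifier. */
final class HandlerRegistry {
    private final Map<String, String> classNamesByPublicId = new HashMap<String, String>();

    void register(String publicId, String className) {
        classNamesByPublicId.put(publicId, className);
    }

    /** Loads and instantiates the registered class, or returns the default handler. */
    DocHandler lookup(String publicId) {
        String className = classNamesByPublicId.get(publicId);
        if (className == null) {
            return new DefaultDocHandler();
        }
        try {
            return (DocHandler) Class.forName(className).getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException | ClassCastException e) {
            // Missing class, missing no-argument constructor, or wrong type.
            return new DefaultDocHandler();
        }
    }
}

final class HandlerRegistryDemo {
    public static void main(String[] args) {
        HandlerRegistry registry = new HandlerRegistry();
        // The public identifier below is made up for the example.
        registry.register("-//EXAMPLE//DTD Bar//EN", BarDocHandler.class.getName());
        System.out.println(registry.lookup("-//EXAMPLE//DTD Bar//EN").describe());
        System.out.println(registry.lookup("-//EXAMPLE//DTD Unknown//EN").describe());
    }
}
```

Storing class names rather than handler instances keeps the table loadable from a properties file, at the cost of deferring "class not found" errors until the first lookup.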
Personally I would much rather write code that looks like this:

    // Done when I initialize the program
    java.util.Properties handlers = new java.util.Properties();
    try {
        handlers.load(new FileInputStream("foo.txt"));
    }
    catch (IOException e) {
        e.printStackTrace();
    }
    XMLInputStream.registerHandlers(handlers);

    // Then later do this
    URL fooURL = new URL("http://www.foo.com/bar.xml");
    XMLInputStream xis = new XMLInputStream(fooURL.openStream());

Or, if you don't use any registry for document handlers, you could simply do something like this:

    DocumentHandler bdh = new BarDocumentHandler(); // Assumes bar.xml is a document type "bdh" can handle
    URL fooURL = new URL("http://www.foo.com/bar.xml");
    XMLInputStream xis = new XMLInputStream(fooURL.openStream(), bdh);

Once you have the "xis" reference, just call methods like "readDocument(Document document)", which would read the document data into a Document object (Document would be an interface).

    Document document = new MSWord90Document();
    try {
        xis.readDocument(document);
    }
    catch (IOException e) {
        e.printStackTrace();
    }

Personally I prefer the registry idea, so the application would know ahead of time what to do for any XML file (handle it or else do some default handling).

Just some ideas before v1.0 of SAX is ground in stone...

Tyler
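[Editorial sketch] The registry idea in the message above (a hashtable of handler class names indexed by systemID or publicID, with dynamic class loading) could look roughly like the following. This is only a minimal illustration under stated assumptions: DocHandler, BarHandler, FallbackHandler and HandlerRegistry are hypothetical names invented here, not part of SAX or of any XMLInputStream class, and the fallback to a default handler is one possible policy among several.

```java
import java.util.Hashtable;

// Hypothetical stand-in for a SAX document handler; not the real SAX interface.
interface DocHandler {
    String describe();
}

// Hypothetical handler for an imaginary "bar" document type.
class BarHandler implements DocHandler {
    public String describe() { return "bar handler"; }
}

// Hypothetical fallback used when nothing is registered for a document type.
class FallbackHandler implements DocHandler {
    public String describe() { return "fallback handler"; }
}

// Registry of handler class names indexed by publicID or systemID.
class HandlerRegistry {
    private static final Hashtable<String, String> classNames = new Hashtable<>();

    // Associates a publicID or systemID with the name of a handler class.
    static void registerHandler(String id, String className) {
        classNames.put(id, className);
    }

    // Loads the registered class dynamically; any failure yields the fallback.
    static DocHandler lookup(String id) {
        String className = (id == null) ? null : classNames.get(id);
        if (className == null) {
            return new FallbackHandler();
        }
        try {
            return (DocHandler) Class.forName(className)
                    .getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException | ClassCastException e) {
            return new FallbackHandler();
        }
    }

    public static void main(String[] args) {
        registerHandler("-//Example//DTD Bar//EN", BarHandler.class.getName());
        System.out.println(lookup("-//Example//DTD Bar//EN").describe());
        System.out.println(lookup("-//Example//DTD Unknown//EN").describe());
    }
}
```

Storing class names rather than instances keeps the registry loadable from a properties file, which is what the registerHandlers(handlers) call in the message appears to have in mind.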
From mrc at allette.com.au Sun Feb 1 22:43:42 1998
From: mrc at allette.com.au (Marcus Carr)
Date: Mon Jun 7 17:00:03 2004
Subject: First experiences with XSL
References: <2.2.32.19980130155416.0085e27c@pop> <34D3C820.2671@hiwaay.net>
Message-ID: <34D4FA79.4BF7FA1F@allette.com.au>

len bullard wrote:

> Can anyone show that XSL (if indeed, a Turing complete language) is any easier
> than Java? XSL is a programming language and there are far more mortals
> (programmers in some cases) who understand and can easily use Java than
> XSL/DSSSL.

I live in hope of the day when I finally see a file come out of a word processor as XML, preceded by a DTD and an XSL style sheet. Rather than just regard XSL as a programming language, I would like to see it used as a common application formatting syntax, as was tried with RTF.

Assuming the users are going to do pretty much whatever they want to as far as tagging is concerned (either for legacy data or ongoing), conversion from one DTD to another will always be far easier than conversion from an unstructured document to a structured one. This is particularly true when you consider how much structure is inferred from formatting characteristics in current conversions (although this would presumably be substantially diminished with more structured documents). From the perspective of conversion of data (perhaps from a somewhat sloppy creation model to a more concise storage model), a parseable, reasonably regular stylesheet would seem to have advantages over Java.
Also, it may ultimately be desirable to produce an XSL document from some source, interface or language that suits your individual needs better; thus XSL again behaves as an interchange format. I think this fits well with the spirit of XML/SGML.

> So, one might retreat to the defense of "But it is a standard" and there one
> would have a point.

There are other reasons, but the one you give above is also difficult to go past :-)

--
Regards

Marcus Carr                      email: mrc@allette.com.au
_______________________________________________________________
Allette Systems (Australia)      email: info@allette.com.au
Level 10, 91 York Street         www:   http://www.allette.com.au
Sydney 2000 NSW Australia        phone: +61 2 9262 4777
                                 fax:   +61 2 9262 4774
_______________________________________________________________

From donpark at quake.net Sun Feb 1 22:55:04 1998
From: donpark at quake.net (Don Park)
Date: Mon Jun 7 17:00:03 2004
Subject: Parser Interface -- Summary of Change Requests
Message-ID: <003b01bd2f63$c5b6c800$2ee044c6@donpark>

David,

>1) Allow SAX to work with an input stream as well as a URI.
...
>   void parse (String publicId, String systemId, InputStream input)
>     throws java.lang.Exception;
...

My suggestion would be to add the following two methods to the EntityHandler interface:

    public InputStream getEntityByteStream (String systemID) throws Exception;

    public InputStream getEntityCharStream (String systemID) throws Exception;

The parser implementation should invoke getEntityCharStream first to see if there is decoded data available. If not, it should invoke getEntityByteStream to get the raw data.
If both methods return null, then the default URL-based code is used.

>2) Simplify handler chaining by adding get* methods for existing
>   handlers.
...
> This seems like a generally good idea (as well as a simple and
> backwards-compatible change), and I am willing to implement it.
> The only complication is that we'll have to define the default
> state -- is the parser always required to return a default handler
> if the user has not explicitly set one, or should it return null?

It would be up to the SAX implementation. It might provide a default implementation depending on configuration. For example, FooSaxDriver might have a setInputType() method which would install a default EntityHandler for fetching an XML document from a database.

BTW, you left out my other suggestion, which was:

>>>>>>>>>>>>>>>>>>>>>>>>
In addition, I would like to have the following two methods added to the Parser API for driver-specific operations:

    public Object getDriverProperty(String name);
    public Object setDriverProperty(String name, Object value);

Property names should be prefixed with some unique value to avoid confusing other drivers. Note that the above methods can be invoked without knowing which driver is actually being used. For example:

    parser.setDriverProperty("SuperDriver.lowercaseElements", Boolean.TRUE);
    parser.setDriverProperty("HungryDriver.cacheSize", new Integer(100000));
<<<<<<<<<<<<<<<<<<<<<<<<

The above two methods allow driver-specific code without actually having to import anything.

Regards,

Don Park
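[Editorial sketch] One way a driver might honour the prefixed property names proposed above is shown below. The two method signatures are taken from the message; everything else is an assumption made for illustration. In particular, the message does not say what setDriverProperty returns or what happens to names aimed at another driver, so this sketch returns the previous value and silently ignores foreign prefixes.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical driver. Only the two method signatures come from the proposal.
class SuperDriver {
    // Assumed convention: this driver owns every name starting with its own prefix.
    private static final String PREFIX = "SuperDriver.";
    private final Map<String, Object> properties = new HashMap<>();

    // Returns the stored value, or null if the name is unset or belongs to another driver.
    public Object getDriverProperty(String name) {
        return name.startsWith(PREFIX) ? properties.get(name) : null;
    }

    // Stores the value and returns the previous one (an assumption; the proposal
    // leaves the return value unspecified). Names for other drivers are ignored.
    public Object setDriverProperty(String name, Object value) {
        if (!name.startsWith(PREFIX)) {
            return null;
        }
        return properties.put(name, value);
    }

    public static void main(String[] args) {
        SuperDriver parser = new SuperDriver();
        parser.setDriverProperty("SuperDriver.lowercaseElements", Boolean.TRUE);
        parser.setDriverProperty("HungryDriver.cacheSize", Integer.valueOf(100000));
        System.out.println(parser.getDriverProperty("SuperDriver.lowercaseElements"));
        System.out.println(parser.getDriverProperty("HungryDriver.cacheSize"));
    }
}
```

Ignoring unknown prefixes is what lets application code set properties for several drivers without knowing which one is installed, which is the point the message makes about not having to import anything.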
From donpark at quake.net Sun Feb 1 23:11:06 1998
From: donpark at quake.net (Don Park)
Date: Mon Jun 7 17:00:03 2004
Subject: SAX: Parser Interface -- Summary of Change Requests
Message-ID: <006601bd2f66$08ce9d50$2ee044c6@donpark>

>Not sure exactly what the use of these get methods is for, because all the handlers
>are useful for is delegation anyway. The only reason the get methods would be
>useful is for casting the returned object to some other form. Why anyone would
>need to do this is beyond me as recasting this object back to something would be
>sloppy implementation in the first place.

The get methods make chaining delegations possible, and they allow the drivers to provide more functional default handlers without worrying about having them blasted out of the water just because the application wants to override the handler.

It is beyond me as to why anyone would cast the returned object to some other form, whether such practice is sloppy or not. Please enlighten me.

>The default handler could just be something which spits stuff out to stdout or
>some other OutputStream in a manner similar to how Aelfred's EventDemo does.

I don't think customers will appreciate having stdout or whatever filling the screen or disk with SAX event messages. Internet Explorer with Java logging enabled would cause a hiccup.

Don Park
http://www.quake.net/~donpark/index.html
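[Editorial sketch] The chaining argument above can be illustrated with a small example: an application fetches the handler currently installed, wraps it, and installs the wrapper, so the driver's default keeps receiving events. ElementHandler, SketchParser and ChainingDemo are hypothetical names, the interface is reduced to a single callback, and the silent default handler is an assumption; none of this is the actual SAX API.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, much-reduced handler interface; real handlers have more callbacks.
interface ElementHandler {
    void startElement(String name);
}

// Hypothetical parser that exposes both a set and a get method for its handler.
class SketchParser {
    private ElementHandler handler = name -> { };  // assumed default: does nothing

    ElementHandler getElementHandler() { return handler; }
    void setElementHandler(ElementHandler h) { handler = h; }

    // Stands in for the parser reporting an element during a parse.
    void fire(String name) { handler.startElement(name); }
}

class ChainingDemo {
    // Installs a handler that records element names, then forwards each event
    // to whatever handler was installed before it.
    static List<String> installRecorder(SketchParser parser) {
        List<String> seen = new ArrayList<>();
        ElementHandler previous = parser.getElementHandler();
        parser.setElementHandler(name -> {
            seen.add(name);
            previous.startElement(name);
        });
        return seen;
    }

    public static void main(String[] args) {
        SketchParser parser = new SketchParser();
        List<String> first = installRecorder(parser);
        List<String> second = installRecorder(parser);
        parser.fire("doc");
        System.out.println(first + " " + second);
    }
}
```

Without a get method, the second call to installRecorder would have no way to reach the first recorder, and setting a new handler would simply replace it.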
To post, mailto:xml-dev@ic.ac.uk Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To (un)subscribe, mailto:majordomo@ic.ac.uk the following message; (un)subscribe xml-dev To subscribe to the digests, mailto:majordomo@ic.ac.uk the following message; subscribe xml-dev-digest List coordinator, Henry Rzepa (mailto:rzepa@ic.ac.uk) From ak117 at freenet.carleton.ca Mon Feb 2 20:55:07 1998 From: ak117 at freenet.carleton.ca (David Megginson) Date: Mon Jun 7 17:00:03 2004 Subject: SAX: Parser Interface -- Summary of Change Requests In-Reply-To: <34D4EE00.A1FCECF5@infinet.com> References: <199802012028.PAA00747@unready.microstar.com> <34D4EE00.A1FCECF5@infinet.com> Message-ID: <199802022050.PAA01517@unready.microstar.com> Tyler Baker writes: [on reading XML from a stream rather than a URI] > Well, what if the XML data is streamed from a database where a URL > does not matter so much. If you look at what Oracle, Sybase, and > Microsoft among others are planning on doing with XML, then > supporting this with SAX in the most ubiquitous way will be very > much necessary. I think that if you want to make SAX have any > CORBA support or other language support down the line, it would be > best to negate any polymorphism in the API cause in CORBA for > example, you cannot redefine operations in IDL (methods in Java). This is a good point, but there are complications. Do these vendors plan to use character streams or byte streams? > Another idea (as far as implementation goes) is to have the parser > simply be an extension of java.io.FilterInputStream which takes an > one or more Handler interfaces as arguments (to delegate to), so > that you can handle very large streams of data. This sounds like an interesting idea for a parser implementation, but since SAX is meant to work with many parsers in many languages, it is probably too constraining as a general common interface. 
[on get* methods for handlers] > Not sure exactly what the use of these get methods is for cause all > the handlers are useful is delegation anyways. The only reason the > get methods would be useful is for casting the returned object to > some other form. Why anyone would need to do this is beyond me as > recasting this object back to something would be sloppy > implementation in the first place. Delegation itself might be enough justification, though -- we'll have to wait and see what others suggest. > The default handler could just be something which spits stuff out > to stdout or some other OutputStream in a manner similiar to how > Aelfred's EventDemo does. It would probably be best for the default handler to produce no output at all, so that other handlers delegating to it would not end up creating bloated log files. All the best, and thanks for the feedback, David -- David Megginson ak117@freenet.carleton.ca Microstar Software Ltd. dmeggins@microstar.com http://home.sprynet.com/sprynet/dmeggins/ xml-dev: A list for W3C XML Developers. 
To post, mailto:xml-dev@ic.ac.uk Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To (un)subscribe, mailto:majordomo@ic.ac.uk the following message; (un)subscribe xml-dev To subscribe to the digests, mailto:majordomo@ic.ac.uk the following message; subscribe xml-dev-digest List coordinator, Henry Rzepa (mailto:rzepa@ic.ac.uk) From ak117 at freenet.carleton.ca Mon Feb 2 21:04:04 1998 From: ak117 at freenet.carleton.ca (David Megginson) Date: Mon Jun 7 17:00:03 2004 Subject: Parser Interface -- Summary of Change Requests In-Reply-To: <003b01bd2f63$c5b6c800$2ee044c6@donpark> References: <003b01bd2f63$c5b6c800$2ee044c6@donpark> Message-ID: <199802022059.PAA01592@unready.microstar.com> Don Park writes: > public InputStream > getEntityByteStream (String systemID) > throws Exception; > > public InputStream > getEntityCharStream (String systemID) > throws Exception; > > The parser implementation should invoke getEntityCharStream first to see if > the there is decoded data available. If not, it should invoke > getEntityByteStream to get the raw data. > > If both methods return null, then default URL based code is used. I like the general idea, though there are implementation problems. Many languages (including Java 1.0.2) have no concept of a character stream at all, and in Java 1.1, you would have to use public Reader getEntityCharStream (String systemID) throws Exception; > > This seems like a generally good idea (as will as a simple and > > backwards-compatible change), and I am willing to implement it. > > The only complication is that we'll have to define the default > > state -- is the parser always required to return a default handler > > if the user has not explicitly set one, or should it return null? > > It would be up to the SAX implementation. It might provide default > implementation depending on configuration. 
For example, FooSaxDriver might > have setInputType() method which would install a default EntityHandler for > fetching XML document from a database. This might make life a little trickier for programmers using SAX -- what do others think? > BTW, You left out my other suggestion which was > > >>>>>>>>>>>>>>>>>>>>>>>> > In addition, I would like to have following two methods added to the Parser > API for driver-specific operations: > > public Object getDriverProperty(String name); > public Object setDriverProperty(String name, Object value); > > Property names should be prefixed with some unique values to avoid confusing > other drivers. Note that above methods can be invoked without knowing which > driver is actually being used. For example: > > parser.setDriverProperty("SuperDriver.lowercaseElements", Boolean.TRUE); > parser.setDriverProperty("HungryDriver.cacheSize", new Integer(100000)); > <<<<<<<<<<<<<<<<<<<<<<<< > > Above two methods allow driver-specific code without actually having to > import anything. Sorry about the omission. I'd be interested in hearing other reactions to this suggestion -- I'm worried that it would result in SAX implementations that are non-conformant XML processors (as in your first example), or that are incompatible with each other. Remember that SAX defines only a minimum level of compatibility among XML processors. All the best, David -- David Megginson ak117@freenet.carleton.ca Microstar Software Ltd. dmeggins@microstar.com http://home.sprynet.com/sprynet/dmeggins/ xml-dev: A list for W3C XML Developers. 
To post, mailto:xml-dev@ic.ac.uk Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To (un)subscribe, mailto:majordomo@ic.ac.uk the following message; (un)subscribe xml-dev To subscribe to the digests, mailto:majordomo@ic.ac.uk the following message; subscribe xml-dev-digest List coordinator, Henry Rzepa (mailto:rzepa@ic.ac.uk) From tyler at infinet.com Mon Feb 2 21:52:23 1998 From: tyler at infinet.com (Tyler Baker) Date: Mon Jun 7 17:00:03 2004 Subject: SAX: Parser Interface -- Summary of Change Requests References: <199802012028.PAA00747@unready.microstar.com> <34D4EE00.A1FCECF5@infinet.com> <199802022050.PAA01517@unready.microstar.com> Message-ID: <34D63FDE.2D234CFC@infinet.com> David Megginson wrote: > Tyler Baker writes: > > [on reading XML from a stream rather than a URI] > > > Well, what if the XML data is streamed from a database where a URL > > does not matter so much. If you look at what Oracle, Sybase, and > > Microsoft among others are planning on doing with XML, then > > supporting this with SAX in the most ubiquitous way will be very > > much necessary. I think that if you want to make SAX have any > > CORBA support or other language support down the line, it would be > > best to negate any polymorphism in the API cause in CORBA for > > example, you cannot redefine operations in IDL (methods in Java). > > This is a good point, but there are complications. Do these vendors > plan to use character streams or byte streams? In CORBA IDL there is a string and a wstring type. The wstring type maps to Unicode in the IDL -> Java mapping. You could define everything as wstring if you wish as far as IDL is concerned. > > Another idea (as far as implementation goes) is to have the parser > > simply be an extension of java.io.FilterInputStream which takes an > > one or more Handler interfaces as arguments (to delegate to), so > > that you can handle very large streams of data. 
> > This sounds like an interesting idea for a parser implementation, but > since SAX is meant to work with many parsers in many languages, it is > probably too constraining as a general common interface. Yah I only meant as for the implementation, but on another note, I think that the Handler interfaces are by far and away the most important ones. Really, if Aelfred had an XMLInputStream which could be derived out of Parser either by having the parser be an implementation of XMLInputStream itself, or else assigning a parser stub to XMLInputStream which could be retrieved by calling, Parser.getXMLInputStream(). Parser.parse() would just parse everything with no control over IO, but with XMLInputStream you could have control at the IO level Furthermore, having a handler registry of SAX Handler interfaces (or just pointers to where the class implementations live) would be invaluable to the particular application I am working on now. I suggested having a static registerHandler method in XMLInputStream, but you could add this to Parser instead. This way you could simply pass in XML data and the parser would look up the appropriate handler implementation for that doctype and load it dynamically. Otherwise, this needs to be done manually and can really bloat your code at the application level since you will have to essentially have a large number of if/else statements and register the appropriate handlers manually. If this was implemented in Aelfred or any other parser, you would already remove a huge burden off of the application developers utilizing XML IMHO. > [on get* methods for handlers] > > > Not sure exactly what the use of these get methods is for cause all > > the handlers are useful is delegation anyways. The only reason the > > get methods would be useful is for casting the returned object to > > some other form. Why anyone would need to do this is beyond me as > > recasting this object back to something would be sloppy > > implementation in the first place. 
> > Delegation itself might be enough justification, though -- we'll have > to wait and see what others suggest. I think it would be better to have an addDocumentHandler() instead of setDocumentHandler() if you wish to do delegation. This is an Observer/Observable pattern that would work quite nicely. You could have multiple objects register interest in the parsing of the XML data and have the events delivered to them appropriately. You might even make all of this beans compliant if you really want to. > > The default handler could just be something which spits stuff out > > to stdout or some other OutputStream in a manner similiar to how > > Aelfred's EventDemo does. > > It would probably be best for the default handler to produce no output > at all, so that other handlers delegating to it would not end up > creating bloated log files. Yah, I kinda overlooked this. I just thought it would be nice for debugging. My stupid (-: Tyler xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev@ic.ac.uk Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To (un)subscribe, mailto:majordomo@ic.ac.uk the following message; (un)subscribe xml-dev To subscribe to the digests, mailto:majordomo@ic.ac.uk the following message; subscribe xml-dev-digest List coordinator, Henry Rzepa (mailto:rzepa@ic.ac.uk) From papresco at technologist.com Mon Feb 2 22:23:17 1998 From: papresco at technologist.com (Paul Prescod) Date: Mon Jun 7 17:00:03 2004 Subject: First experiences with XSL In-Reply-To: <01bd2d90$7dc5d6a0$1e09e391@mhklaptop.bra01.icl.co.uk> Message-ID: On Fri, 30 Jan 1998, Michael Kay wrote: > I've downloaded MSXSL and used it to generate HTML for a couple of document > types, successfully but with a certain amount of frustration caused by (a) > lack of diagnostics when I got things wrong, and (b) limited functionality. 
> > I've now implemented the same thing without XSL: I wrote an MSXML > application in Java that does a recursive walk down the document tree and > calls a registered "handler" class to process each element type. Yes, you can implement something XSLish without XSL. The point of XSL is that it is to be a standard: there will be multiple, interoperable browser and word processor implementations as well as dedicated XSL development tools and so forth. Paul Prescod xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev@ic.ac.uk Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To (un)subscribe, mailto:majordomo@ic.ac.uk the following message; (un)subscribe xml-dev To subscribe to the digests, mailto:majordomo@ic.ac.uk the following message; subscribe xml-dev-digest List coordinator, Henry Rzepa (mailto:rzepa@ic.ac.uk) From oshima at osa.sci.jri.co.jp Tue Feb 3 05:30:09 1998 From: oshima at osa.sci.jri.co.jp (Tetsuya OSHIMA) Date: Mon Jun 7 17:00:03 2004 Subject: No subject Message-ID: <9802030238.AA13691@t111ws06> # bye xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev@ic.ac.uk Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To (un)subscribe, mailto:majordomo@ic.ac.uk the following message; (un)subscribe xml-dev To subscribe to the digests, mailto:majordomo@ic.ac.uk the following message; subscribe xml-dev-digest List coordinator, Henry Rzepa (mailto:rzepa@ic.ac.uk) From M.H.Kay at eng.icl.co.uk Tue Feb 3 10:58:02 1998 From: M.H.Kay at eng.icl.co.uk (Michael Kay) Date: Mon Jun 7 17:00:03 2004 Subject: SAX: Parser Interface -- Summary of Change Requests Message-ID: <01bd3092$b63188e0$1e09e391@mhklaptop.bra01.icl.co.uk> >Tyler Baker writes: > > [on reading XML from a stream rather than a URI] > > > Well, what if the XML data is streamed from a database where a URL > > does not matter so much... This suggests an analogy with CGI. 
A URL is not the name of a document, it is a request for a stream of data, and what we need is a style of URL (or extended URL) that allows the application to say "please send your requests for data to me and I will supply a stream in response". >This is a good point, but there are complications. Do these vendors >plan to use character streams or byte streams? > I don't know the Java technicalities, but surely what we mean by a stream here is something that supplies a sequence of Unicode characters. (Surely it's not the parser's job to turn bytes into characters?) We should also ensure that the design makes certain special cases easy for the application writer, e.g.: a) the primary input source is a file in filestore. (Translating the filename to a URL is error-prone and it would be better for the parser to do it) b) there is only one input source (e.g. a record containing XML read from a database, with no DTD or other external entities), probably available already in the application as the contents of a String. regards, Mike Kay xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev@ic.ac.uk Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To (un)subscribe, mailto:majordomo@ic.ac.uk the following message; (un)subscribe xml-dev To subscribe to the digests, mailto:majordomo@ic.ac.uk the following message; subscribe xml-dev-digest List coordinator, Henry Rzepa (mailto:rzepa@ic.ac.uk) From ak117 at freenet.carleton.ca Tue Feb 3 12:00:17 1998 From: ak117 at freenet.carleton.ca (David Megginson) Date: Mon Jun 7 17:00:03 2004 Subject: SAX: Parser Interface -- Summary of Change Requests In-Reply-To: <01bd3092$b63188e0$1e09e391@mhklaptop.bra01.icl.co.uk> References: <01bd3092$b63188e0$1e09e391@mhklaptop.bra01.icl.co.uk> Message-ID: <199802031155.GAA00333@unready.microstar.com> Michael Kay writes: > I don't know the Java technicalities, but surely what we mean by a stream > here is something that supplies a sequence of Unicode characters. 
(Surely > it's > not the parser's job to turn bytes into characters?) That depends on the type of stream. I would not want to force the client to do encoding conversion for a stream that happened to be open to a local file or an HTTP connection. > We should also ensure that the design makes certain special cases easy for > the application writer, e.g.: > > a) the primary input source is a file in filestore. (Translating the > filename to a URL is error-prone and it would be better for the parser to do > it) > > b) there is only one input source (e.g. a record containing XML read from a > database, with no DTD or other external entities), probably available > already in the application as the contents of a String. It should be possible to read from a string, but it would not be safe to assume that the string contains no DTD or external entities -- it would always be necessary to supply a base URI as well. All the best, David -- David Megginson ak117@freenet.carleton.ca Microstar Software Ltd. dmeggins@microstar.com http://home.sprynet.com/sprynet/dmeggins/ xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev@ic.ac.uk Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To (un)subscribe, mailto:majordomo@ic.ac.uk the following message; (un)subscribe xml-dev To subscribe to the digests, mailto:majordomo@ic.ac.uk the following message; subscribe xml-dev-digest List coordinator, Henry Rzepa (mailto:rzepa@ic.ac.uk) From thyde-smith at derwent.co.uk Tue Feb 3 12:19:04 1998 From: thyde-smith at derwent.co.uk (thyde-smith@derwent.co.uk) Date: Mon Jun 7 17:00:03 2004 Subject: UNSUBSCRIBE Message-ID: <00010D05.1271@derwent.co.uk> unsubscribe xml-dev: A list for W3C XML Developers. 
To post, mailto:xml-dev@ic.ac.uk Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To (un)subscribe, mailto:majordomo@ic.ac.uk the following message; (un)subscribe xml-dev To subscribe to the digests, mailto:majordomo@ic.ac.uk the following message; subscribe xml-dev-digest List coordinator, Henry Rzepa (mailto:rzepa@ic.ac.uk) From nav at metratech.com Tue Feb 3 16:02:32 1998 From: nav at metratech.com (Navdip Bhachech) Date: Mon Jun 7 17:00:03 2004 Subject: recommendations on currently available streaming XML toolkits? Message-ID: <01BD3093.3C881940.nav@metratech.com> there have been a few discussions on streaming issues in this list lately, so I thought I'd ask: What are the recommended toolkits (currently available) that allow streaming XML, instead of a file based approach? Nav ______________________________________________________________ Navdip Bhachech MetraTech Corp www.MetraTech.com nav@metratech.com xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev@ic.ac.uk Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To (un)subscribe, mailto:majordomo@ic.ac.uk the following message; (un)subscribe xml-dev To subscribe to the digests, mailto:majordomo@ic.ac.uk the following message; subscribe xml-dev-digest List coordinator, Henry Rzepa (mailto:rzepa@ic.ac.uk) From ht at cogsci.ed.ac.uk Tue Feb 3 16:51:48 1998 From: ht at cogsci.ed.ac.uk (Henry S. Thompson) Date: Mon Jun 7 17:00:04 2004 Subject: recommendations on currently available streaming XML toolkits? In-Reply-To: Navdip Bhachech's message of Tue, 3 Feb 1998 11:02:35 -0500 References: <01BD3093.3C881940.nav@metratech.com> Message-ID: Our XML tools are designed for streaming, and are happy with multi-10M documents: http://www.ltg.ed.ac.uk/software/xml/ ht -- Henry S. 
Thompson, Human Communication Research Centre, University of Edinburgh 2 Buccleuch Place, Edinburgh EH8 9LW, SCOTLAND -- (44) 131 650-4440 Fax: (44) 131 650-4587, e-mail: ht@cogsci.ed.ac.uk URL: http://www.cogsci.ed.ac.uk/~ht/ xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev@ic.ac.uk Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To (un)subscribe, mailto:majordomo@ic.ac.uk the following message; (un)subscribe xml-dev To subscribe to the digests, mailto:majordomo@ic.ac.uk the following message; subscribe xml-dev-digest List coordinator, Henry Rzepa (mailto:rzepa@ic.ac.uk) From ricko at allette.com.au Tue Feb 3 22:29:04 1998 From: ricko at allette.com.au (Rick Jelliffe) Date: Mon Jun 7 17:00:04 2004 Subject: Ideas about Cutting and Pasting in XML Message-ID: <199802032238.JAA18643@jawa.chilli.net.au> Developers with an idle moment may be interested in a paper I've just put up "A Cut and Paste Infrastructure for XML" http://www.chilli.net.au/~ricko/XML-cut-n-paste.htm It gives a direction I suggest XML needs to be developed towards, in order to support arbitrary cutting and pasting between XML documents. This now has some comments about RDF (and XML-data) which may be of interest too. Comments welcome. Rick Jelliffe xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev@ic.ac.uk Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To (un)subscribe, mailto:majordomo@ic.ac.uk the following message; (un)subscribe xml-dev To subscribe to the digests, mailto:majordomo@ic.ac.uk the following message; subscribe xml-dev-digest List coordinator, Henry Rzepa (mailto:rzepa@ic.ac.uk) From bsteele at tdiinc.com Tue Feb 3 22:36:23 1998 From: bsteele at tdiinc.com (Bob Steele) Date: Mon Jun 7 17:00:04 2004 Subject: XML-Data: A naive question Message-ID: <34D79CFD.4F0469F@tdiinc.com> RDF documentation (Resource Description Framework (RDF) Model and Syntax) states: "RDF uses the Extensible Markup Language (XML) encoding as its syntax. 
However, RDF will not require (and conforming implementations must not require) an XML Document Type Declaration for the contents of assertions. In this respect RDF requires at most the XML well-formedness constraints. RDF schemas may -- but are not required to -- be XML DTDs."

Isn't this true of XML-Data? I can't seem to find it expressly stated.

Thanks,

bob

From donpark at quake.net Tue Feb 3 23:07:45 1998
From: donpark at quake.net (Don Park)
Date: Mon Jun 7 17:00:04 2004
Subject: XML Conformance and DTD support in SAX
Message-ID: <003e01bd30f7$e0e6a830$2ee044c6@donpark>

1. XML Conformance

I am not sure if I am going off on a tangent, but I think some form of markup to indicate XML conformance would be really nice so that XML clients and servers can decide whether to validate or not.

2. It would be nice to have SAX provide more DTD information.

We could either have a separate DocumentTypeHandler or fire XML parsing events for the DTD as if it were an XML document being parsed. Anyway, without better support for DTDs, DOM cannot be supported fully by SAX. Perhaps we need a SAXDTD API to augment SAX?

No lines drawn, just digging some sand with my toes,

Don Park
http://www.quake.net/~donpark/index.html
To post, mailto:xml-dev@ic.ac.uk Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To (un)subscribe, mailto:majordomo@ic.ac.uk the following message; (un)subscribe xml-dev To subscribe to the digests, mailto:majordomo@ic.ac.uk the following message; subscribe xml-dev-digest List coordinator, Henry Rzepa (mailto:rzepa@ic.ac.uk) From ak117 at freenet.carleton.ca Wed Feb 4 00:35:18 1998 From: ak117 at freenet.carleton.ca (David Megginson) Date: Mon Jun 7 17:00:04 2004 Subject: XML Conformance and DTD support in SAX In-Reply-To: <003e01bd30f7$e0e6a830$2ee044c6@donpark> References: <003e01bd30f7$e0e6a830$2ee044c6@donpark> Message-ID: <199802040029.TAA00528@unready.microstar.com> Don Park writes: > 2. It would be nice to have SAX provide more DTD information. > > We could either have a separate DocumentTypeHandler or fire XML parsing > events for DTD as if it was an XML document being parsed. Anyway, without > better support for DTD, DOM can be supported fully by SAX. Perhaps we need > SAXDTD API to augment SAX? I think that it is very likely that we will make a SAX level two some other day, which might include a DocumentHandler and/or a DTDHandler interface. For now, however, we should probably try to stabilise what we have -- the current SAX falls mostly within the range of features already offered by existing parsers. All the best, David -- David Megginson ak117@freenet.carleton.ca Microstar Software Ltd. dmeggins@microstar.com http://home.sprynet.com/sprynet/dmeggins/ xml-dev: A list for W3C XML Developers. 
To post, mailto:xml-dev@ic.ac.uk Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To (un)subscribe, mailto:majordomo@ic.ac.uk the following message; (un)subscribe xml-dev To subscribe to the digests, mailto:majordomo@ic.ac.uk the following message; subscribe xml-dev-digest List coordinator, Henry Rzepa (mailto:rzepa@ic.ac.uk) From papresco at technologist.com Wed Feb 4 08:50:48 1998 From: papresco at technologist.com (Paul Prescod) Date: Mon Jun 7 17:00:04 2004 Subject: Namespaces, modules and architectures paper available Message-ID: <34D82C2E.6C6B3AE7@technologist.com> http://itrc.uwaterloo.ca/~papresco/sgml/namespaces.html Why We Need Namespaces (Modules) An SGML/XML Feature Proposal Abstract The World Wide Web Consortium has recently published a note called Namespaces in XML. Not everyone has access to it yet, but they will soon. It proposes a simple convention for allowing instances to have elements whose type names come from many different schemas. According to that note: "We envision applications of XML in which a document instance may contain markup defined in multiple schemas. These schemas may have been authored independently. One motivation for this is that writing good schemas is hard, so it is beneficial to reuse parts from existing, well-designed schemas. Another is the advantage of allowing search engines or other tools to operate over a range of documents that vary in many respects but use common names for common element types." Advocates of ISO architectural forms ("archforms") have noticed that these requirements are very similar to those for archforms and have proposed archforms as a solution. They are correct that the basic underlying problems are related, but the problems are not identical. We need both archforms and namespaces. The two ideas are actually very complementary. This note demonstrates why neither architectural forms nor the current namespace proposal really solve the "namespace problem" satisfactorily. 
Background I will use the document [1]'A Proposal to Introduce "Module" Structures Into SGML' as an example of a modules proposal which includes not just a convention for namespace combination, but a syntax for actually combining SGML DTD fragments. These fragments are the only standardized schema for either SGML or XML. Architectural forms allow a "client document" to declare that certain elements conform to an element type in a DTD other than the document's DTD. For instance you could say that a particular element is both a LINK element in the document's DTD and a HyTime CLINK element in the HyTime architecture. It is essentially both things at once. You can either declare a particular element as having an architectural element type (in addition to its ordinary element type) or you can declare that all of the elements of a particular type adhere to a particular architectural element type. For instance you could say that a particular "human" element conforms to the "animal" architectural element type (if the human was, for example, a "party animal") or you could say that all "dog" elements conform to the "animal" architectural element type. The Rub A particular element can also conform to multiple architectural element types. For instance the aforementioned human could conform to both the "programmer" and the "party animal" architectural element types (no, those are not logically exclusive). My claim is that this increased generality is a powerful feature in many contexts, but makes things way too complex in the simple case for architectural forms to be the most basic namespace management facility in XML. SGML and SGML tools are organized around the idea that each element conforms to one and only one element type. We have not yet re-thought the SGML processing idea in terms of the concept of multiple element types. For instance, the most common form of SGML processing is validation. SGML uses DTDs to define constraints on SGML documents.
According to the Japanese proposal, validation could be accomplished more or less like this: ]> Imagine that math.module.dtd and hyperlinks.module.dtd are hundreds of lines long. Imagine also that they both had an element called "SET" (for "mathematical set" and "link set"). As far as I know, there is no way to accomplish this namespace merging operation with anything close to the same ease with architectural forms. Yes, I can do it, by copying math.module.dtd and hyperlinks.module.dtd into my document type. I can then manually fix up the namespace clashes like my "SET" element. But it is this sort of duplication of code that the modules proposal was explicitly designed to avoid. In fact, that is its reason for existing. We can see, then, that architectural forms do not solve the problem that the modules proposal was meant to solve. They do not automatically merge namespaces. Let me define some terms to clarify. A namespace is a mapping from names to objects, such as element type names to element types (explicitly or implicitly declared). A namespace merge is the construction of a namespace from two others that preserves all of the elements from the originals. Architectural forms provide access to multiple namespaces, but they do not merge namespaces. I suspect that some with a long background in SGML will be a little baffled trying to understand why someone would want to do this. After all, combining document types is typically difficult work performed by experts, tested on teams of users, tweaked to perfection with element names remapped to fit the terminology of the user community. Mixing and matching DTD fragments in an ad hoc manner might not seem like a good idea. But the fact is that we live in a brave new world. End users want to take control of their own document types in many cases. They want to mix and match DTD fragments and they are not willing to spend the amount of effort that we professionals are. Good for them! They will make all of our lives easier.
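The two definitions given above (a namespace as a mapping from names to objects, and a merge that preserves every element of the originals) can be sketched in a few lines of Python. This is only an illustration, not the syntax of the Japanese proposal: the toy content models, the dictionary representation and the MODULE::NAME spelling of qualified names are all assumptions made for the example.

```python
from typing import Dict

# Hypothetical stand-ins for two DTD fragments: each maps an element type
# name to a toy content model string. Both fragments define "SET".
MATH = {"FORMULA": "(SET | NUMBER)*", "SET": "(NUMBER)*", "NUMBER": "#PCDATA"}
HYPERLINKS = {"LINK": "#PCDATA", "SET": "(LINK)+"}


def merge_namespaces(modules: Dict[str, Dict[str, str]]) -> Dict[str, str]:
    """Merge several namespaces, keeping every element type from each one.

    Clashing local names (such as SET) both survive because every name is
    qualified with its module name, written here as MODULE::NAME.
    """
    merged: Dict[str, str] = {}
    for module_name, namespace in modules.items():
        for local_name, content_model in namespace.items():
            merged[f"{module_name}::{local_name}"] = content_model
    return merged


merged = merge_namespaces({"MATH": MATH, "HYPERLINKS": HYPERLINKS})
for qualified_name in sorted(merged):
    print(qualified_name, "->", merged[qualified_name])
```

Neither fragment is copied or edited by hand here; the clash on "SET" is resolved mechanically, which is the property the modules proposal is being praised for.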
In fact, when authors say that they want to "get rid of" DTDs, what they typically mean is that they don't want to be constrained by someone else's DTD and making their own is too difficult! If we can make DTD maintenance easier, more people will use them. Perhaps it would be possible to update SGML so that validation does not depend so deeply on each element having a single element type, and so that content models could be expressed that combine elements from different architectures. If we did that, my complaint might go away. Architectures might regain some of the validatory simplicity of the modules proposal. But this would require a much more fundamental change to SGML than the modules proposal would. Stylesheets I will use stylesheets as another example of processing. The three most interesting stylesheet languages right now are DSSSL, XSL and CSS. Each of those has as its central organizing construct a rule triggered on an element type name in a context. DSSSL has a feature that would allow querying on architecture, but the feature is optional and is not supported, for instance, by James Clark's Jade. Even where the feature is available, the architectural form-based version of a stylesheet is much more complicated than the equivalent based on a "flat" namespace (such as a stylesheet for traditional SGML or SGML augmented with the modules proposal). I invite architectural forms advocates to prove me wrong by providing their stylesheets. Here is what a module-enhanced DSSSL might look like: (element MATH.AND.HYPERLINKS (process-children)) As you can see, this has just enough lines to include the relevant stylesheet modules and provide rules for the new elements. What would the equivalent archform code look like? With DSSSL as it exists, it would look quite ugly and convoluted.
With some enhanced DSSSL it might look reasonable (just as some enhanced SGML might be able to have content models that span architectures), but nobody has yet proposed what such a DSSSL would look like (just as nobody has proposed the enhanced SGML). I am open to suggestions... I do not believe that either the current XSL proposal or CSS would allow architecture-based processing at all. Once again, the idea that every element has a single element type is a fundamental organizing principle of these stylesheet languages. It is also an organizing principle of most SGML editors, DTD editors and formatting and conversion tools I have used. In fact, almost every SGML tool in the world operates under that principle. The best tools will give you access to architectural forms (through their architectural attributes), but they will typically use the element type name as the major organizing feature of the stylesheets. Archform-centric processing is typically awkward if it is possible at all. The one element, one element type principle is also central to every course in SGML I have ever taken and any book on it I have ever read. Even the SGML Handbook says that every element has a particular element type (a single, particular element type). The Argument From Usability Imagine that you are a typical end user and have used archforms instead of a namespace merging mechanism to combine DTD fragments. Now imagine that you know that a particular element type name appears in both DTD fragments. I think that most people would be very surprised to learn that the way to associate this element with one or the other DTD is to add an attribute. Because the generic identifier (the name in the start-tag) usually establishes the element type, you would probably expect to change the generic identifier to change the association.
But using architectural forms, you would instead have to add an attribute that would essentially disassociate the element from one of the element types: "I may have the same name as that element type, but it isn't actually one of my element types." I think that this is a nasty case of making the common, simple case of merging DTD fragments more complicated in order to make life easier for those of us who have to solve problems that may actually require the full generality of architectural forms. Once again, I invite advocates to send me code samples that demonstrate that this is simpler than I think. Who was it that said: "Make the easy things easy and the hard things possible." Architectural forms make hard things possible, but when misapplied to the namespace problem, they make easy things unnecessarily hard. Let me be clear: architectural forms (or something like them) have an important role to play in SGML systems. We absolutely need some form of semantic inheritance mechanism. But they work best when they work in the environment they were designed for: they are typically used as an underlying basis of a DTD designed by a professional. The professional DTD designer renames elements to avoid clashes. That individual is the real solution to the "namespace problem" in most environments. In environments where such a person exists, archforms are really, really useful. They are not useful because they allow you to merge namespaces (they don't). They are useful because they allow you to combine semantics from different DTD fragments in powerful ways (but more or less manually). I think that a modules/namespaces proposal would actually be very useful for building architectures from DTD fragments. I also think that architectural forms would be very useful on the Web. Not every use of XML on the web will be ad hoc. Some XML applications will need the robust multi-level validation that architectural forms allow. Think about e-commerce for example.
But many users will not need or want architectural forms. Most people just need a simple way to combine fixed DTD fragments so that there are no name clashes. The Japanese module proposal provides such a mechanism. Presumably Web-centric DTD-replacement schema languages will provide mechanisms like this also. If these sorts of things are made much easier in these schema languages than they are in SGML DTD syntax, people will just avoid SGML DTD syntax. This would be a big mistake for all concerned. Let's please just fix SGML through a proposal like the one submitted by the Japanese in 1996. Some modules proposal should be part of the SGML revision. This would in no way preclude the wide deployment of architectural forms as a solution to a different problem. Paul Prescod -- http://itrc.uwaterloo.ca/~papresco xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev@ic.ac.uk Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To (un)subscribe, mailto:majordomo@ic.ac.uk the following message; (un)subscribe xml-dev To subscribe to the digests, mailto:majordomo@ic.ac.uk the following message; subscribe xml-dev-digest List coordinator, Henry Rzepa (mailto:rzepa@ic.ac.uk) From papresco at technologist.com Wed Feb 4 14:36:48 1998 From: papresco at technologist.com (Paul Prescod) Date: Mon Jun 7 17:00:04 2004 Subject: Namespaces, modules and architectures paper available References: <34D82C2E.6C6B3AE7@technologist.com> <34ee5fa9.103270755@mail.alink.net> Message-ID: <34D87D02.14BA4B9C@technologist.com> I appreciate the simplicity of this [1]proposal, but want to check that it is not too simple to get the job done. How would you pass information into a module with this proposal? For instance, I might want to include a table model, but might need to specify the contents of the table's cell elements from the containing DTD. 
Also, it feels "nicer" to me to have the instance structure control namespace lookup so that when I am in a MATH::FORMULA element, I can use elements from the MATH module without qualification. This convention could remove most or all qualification from a document instance and thus make things simpler for authors. For instance: %math; ]> ... I would like it if the containing element would control namescope choice. Paul Prescod -- http://itrc.uwaterloo.ca/~papresco [1] It should appear here soon: http://www.lists.ic.ac.uk/archives/xml-dev/9802/index.html xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev@ic.ac.uk Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To (un)subscribe, mailto:majordomo@ic.ac.uk the following message; (un)subscribe xml-dev To subscribe to the digests, mailto:majordomo@ic.ac.uk the following message; subscribe xml-dev-digest List coordinator, Henry Rzepa (mailto:rzepa@ic.ac.uk) From ak117 at freenet.carleton.ca Wed Feb 4 15:07:03 1998 From: ak117 at freenet.carleton.ca (David Megginson) Date: Mon Jun 7 17:00:04 2004 Subject: Namespaces, Architectural Forms, and Sub-Documents In-Reply-To: <34D87D02.14BA4B9C@technologist.com> References: <34D82C2E.6C6B3AE7@technologist.com> <34ee5fa9.103270755@mail.alink.net> <34D87D02.14BA4B9C@technologist.com> Message-ID: <199802041506.KAA00956@unready.microstar.com> It seems to me that when you want to embed large contiguous structures from different document types in an XML document, each different namespace should be its own sub-document, referenced as a binary entity (or using whatever other mechanisms are available in XML-Link). Good tools and protocols should make it possible to create, transmit, and process compound documents as if they were single files. This will be necessary anyway for supporting multimedia. 
Here are some general guidelines: * Architectural forms are most suitable for applications where multiple inheritance is required, or where elements belonging to a different document type are scattered throughout a document. * Sub-documents are most suitable for applications where all of the elements belonging to a different document type are rooted in a single subtree. "namespace:gi" element type names are unsuitable for several reasons: 1) The complexity of namespaces is exposed to the author rather than hidden in the DTD (as it is, optionally, with architectural forms). 2) Multiple inheritance is not possible (X can be a kind of Y or a kind of Z, but not both). 3) Standard DTD-based validation is not possible, and it is more difficult to create DTD-driven authoring tools. 4) Both architectural forms and sub-documents can be fully supported under the existing spec by _both_ validating and non-validating XML parsers: no changes necessary. Furthermore, they will also remain compatible with SGML tools. Why are people worried about writing specs to solve a problem that already has good, working, available solutions? All the best, David -- David Megginson ak117@freenet.carleton.ca Microstar Software Ltd. dmeggins@microstar.com http://home.sprynet.com/sprynet/dmeggins/ xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev@ic.ac.uk Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To (un)subscribe, mailto:majordomo@ic.ac.uk the following message; (un)subscribe xml-dev To subscribe to the digests, mailto:majordomo@ic.ac.uk the following message; subscribe xml-dev-digest List coordinator, Henry Rzepa (mailto:rzepa@ic.ac.uk) From grk at arlut.utexas.edu Wed Feb 4 15:59:35 1998 From: grk at arlut.utexas.edu (Glenn R. Kronschnabl) Date: Mon Jun 7 17:00:04 2004 Subject: FORTRAN namelist input - remember? Replace with XML! Message-ID: <199802041559.JAA06936@mail-firewall.arlut.utexas.edu> I want to use XML as a general input mechanism for scientific programs.
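As a rough illustration of XML as a namelist-style input mechanism, the sketch below parses a small parameter file into a tree and looks values up by path. It uses Python's standard library rather than SP, and the element names, attribute names and the `get_param` helper are all invented for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical parameter file; the element and attribute names are invented.
PARAMS_XML = """
<input>
  <grid nx="128" ny="64"/>
  <solver>
    <tolerance>1.0e-6</tolerance>
    <max-iterations>500</max-iterations>
  </solver>
</input>
"""


def get_param(root, path, convert=str, default=None):
    """Look up a parameter by a slash-separated path below the root element.

    A trailing '@name' step selects an attribute; otherwise the element
    text is used. Returns `default` when nothing matches the path.
    """
    element_path, _, attribute = path.partition("@")
    element_path = element_path.rstrip("/")
    node = root.find(element_path) if element_path else root
    if node is None:
        return default
    raw = node.get(attribute) if attribute else node.text
    if raw is None:
        return default
    return convert(raw.strip())


root = ET.fromstring(PARAMS_XML)
nx = get_param(root, "grid/@nx", int)
tolerance = get_param(root, "solver/tolerance", float)
restart = get_param(root, "solver/restart", default="no")
print(nx, tolerance, restart)
```

The default argument plays the role that compiled-in defaults play for namelist variables: a parameter missing from the file falls back to a value chosen by the program.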
In the old days, say in FORTRAN, one used to use namelist input. In C/C++, one usually wrote a custom driver. I want to use XML because it appears to make sense. I have started using SP - and want to build a tree that I can query (kind of like an xrdb interface) for my input parameters. But, before I embark on this, I was wondering if 1) this makes sense, 2) someone surely has a simple tree builder/query interface to SP already that I can use so I don't have to write my own (none jumped out at me when I looked around). Thanks. Cheers, Glenn -------------------- Glenn R. Kronschnabl Applied Research Laboratories | grk@arlut.utexas.edu (PGP/MIME ok) The University of Texas at Austin | http://www.arlut.utexas.edu/~grk PO Box 8029, Austin, TX 78713-8029 | (Ph) 512.835.3642 (FAX) 512.835.3808 10,000 Burnet Road, Austin, TX 78758 | ... but an Aggie at heart! xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev@ic.ac.uk Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To (un)subscribe, mailto:majordomo@ic.ac.uk the following message; (un)subscribe xml-dev To subscribe to the digests, mailto:majordomo@ic.ac.uk the following message; subscribe xml-dev-digest List coordinator, Henry Rzepa (mailto:rzepa@ic.ac.uk) From crism at ora.com Wed Feb 4 16:29:11 1998 From: crism at ora.com (Chris Maden) Date: Mon Jun 7 17:00:04 2004 Subject: Namespaces, Architectural Forms, and Sub-Documents In-Reply-To: <199802041506.KAA00956@unready.microstar.com> (message from David Megginson on Wed, 4 Feb 1998 10:06:58 -0500) Message-ID: <199802041632.LAA14809@geode.ora.com> [David Megginson] > "namespace:gi" element type names are unsuitable for several reasons: [...] > Why are people worried about writing specs to solve a problem that > already has good, working, available solutions? The problem (as I see it) is not one of including pieces of existing documents, nor of structural validation. The main reason for namespaces is semantic inheritance. 
I want to write a scientific research paper quickly. HTML has the overall document structure and components that I need; MathML has equations; CML has chemical formulæ. I should be able to say that I'm using those things, associate stylesheets, and have my browser know that should be styled with the "a" rule from the HTML stylesheet. It should be *possible* to create a DTD to which such a document complies, but I am not as interested in automatic validation of a namespace document. The interrelational issues are, I think, too complex to solve; in the example above, I would need to change the text-containing HTML elements' content models to include chemical and mathematical markup, and maybe allow HTML markup in MathML theorems. Pushing selected information into the content models is too ugly. -Chris -- http://www.oreilly.com/people/staff/crism/ +1.617.499.7487 90 Sherman Street, Cambridge, MA 02140 USA" NDATA SGML.Geek> xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev@ic.ac.uk Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To (un)subscribe, mailto:majordomo@ic.ac.uk the following message; (un)subscribe xml-dev To subscribe to the digests, mailto:majordomo@ic.ac.uk the following message; subscribe xml-dev-digest List coordinator, Henry Rzepa (mailto:rzepa@ic.ac.uk) From ak117 at freenet.carleton.ca Wed Feb 4 17:34:27 1998 From: ak117 at freenet.carleton.ca (David Megginson) Date: Mon Jun 7 17:00:04 2004 Subject: Namespaces, Architectural Forms, and Sub-Documents In-Reply-To: <199802041632.LAA14809@geode.ora.com> References: <199802041506.KAA00956@unready.microstar.com> <199802041632.LAA14809@geode.ora.com> Message-ID: <199802041733.MAA02120@unready.microstar.com> Chris Maden writes: > The problem (as I see it) is not one of including pieces of existing > documents, nor of structural validation. The main reason for > namespaces is semantic inheritance. I want to write a scientific > research paper quickly. 
HTML has the overall document structure and > components that I need; MathML has equations; CML has chemical > formulæ. I should be able to say that I'm using those things, > associate stylesheets, and have my browser know that should > be styled with the "a" rule from the HTML stylesheet. It seems to me simpler to create a compound document rather than to try to force everything into a single XML document -- you can reference another XML document the same way that you can include a graphic or audio sequence. Managing a lot of small objects directly on the file system can be tricky, but it's trivial with proper tool support (think of OLE under Windows, despite its warts). > It should be *possible* to create a DTD to which such a document > complies, but I am not as interested in automatic validation of a > namespace document. The interrelational issues are, I think, too > complex to solve; in the example above, I would need to change the > text-containing HTML elements' content models to include chemical and > mathematical markup, and maybe allow HTML markup in MathML theorems. > Pushing selected information into the content models is too ugly. Not at all -- you just need a single element type to hold references to other XML documents. You could even (though this is disgusting) use All the best, David -- David Megginson ak117@freenet.carleton.ca Microstar Software Ltd. dmeggins@microstar.com http://home.sprynet.com/sprynet/dmeggins/ xml-dev: A list for W3C XML Developers.
To post, mailto:xml-dev@ic.ac.uk Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To (un)subscribe, mailto:majordomo@ic.ac.uk the following message; (un)subscribe xml-dev To subscribe to the digests, mailto:majordomo@ic.ac.uk the following message; subscribe xml-dev-digest List coordinator, Henry Rzepa (mailto:rzepa@ic.ac.uk) From dima at paragraph.com Wed Feb 4 18:01:44 1998 From: dima at paragraph.com (Dmitri Kondratiev) Date: Mon Jun 7 17:00:04 2004 Subject: [AElfred] Problem: '"' in CDATA attribute Message-ID: <2.2.32.19980203180236.0095ec44@dream.paragraph.com> AElfred distribution from 19980112. Problem: com.microstar.xml.XmlProcessor.error() reports an error when parsing an attribute declared in the DTD as CDATA and containing '"' in its value, such as "#text". On the other hand, com.microstar.sax.AElfredDriver from the same 19980112 distribution handles the attribute definition correctly and doesn't report such an error. Dima --------------------------- dima@paragraph.com 102401.2457@compuserve.com http://www.geocities.com/SiliconValley/Lakes/3767/ tel: 07-095-464-9241 xml-dev: A list for W3C XML Developers.
To post, mailto:xml-dev@ic.ac.uk Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To (un)subscribe, mailto:majordomo@ic.ac.uk the following message; (un)subscribe xml-dev To subscribe to the digests, mailto:majordomo@ic.ac.uk the following message; subscribe xml-dev-digest List coordinator, Henry Rzepa (mailto:rzepa@ic.ac.uk) From papresco at technologist.com Wed Feb 4 18:05:50 1998 From: papresco at technologist.com (Paul Prescod) Date: Mon Jun 7 17:00:05 2004 Subject: Namespaces, Architectural Forms, and Sub-Documents References: <34D82C2E.6C6B3AE7@technologist.com> <34ee5fa9.103270755@mail.alink.net> <34D87D02.14BA4B9C@technologist.com> <199802041506.KAA00956@unready.microstar.com> Message-ID: <34D8AE13.610ABC07@technologist.com> David Megginson wrote: > > It seems to me that when you want to embed large contiguous structures > from different document types in an XML document, each different > namespace should be its own sub-document, referenced as a binary > entity (or using whatever other mechanisms are available in XML-Link). > > Good tools and protocols should make it possible to create, transmit, > and process compound documents as if they were single files. This > will be necessary anyway for supporting multimedia. *MAKE EASY THINGS EASY* Making my five-line formula into a different document with a different document type is *not easy*. It is a royal pain in the butt, which is why almost nobody does it. I have seen the CALS table model merged with dozens of DTDs and have never once seen someone take the opposite approach of making CALS tables "subdocuments." We can imagine a theoretical universe in which the tools are so good that this is easy, but if we are imaginative in this way, we can paper over any design flaw in SGML or XML with the claim that "the tools can handle it." If XML or SGML were designed to be manipulated only through tools, that would be acceptable. 
But they were not...they were designed to be written in text editors and surprisingly enough, a huge number of people do that. > Here are some general guidelines: > > * Architectural forms are most suitable for applications where > multiple inheritance is required, or where elements belonging to a > different document type are scattered throughout a document. I agree with the former. I don't agree with the latter. A simple modules proposal handles the latter nicely. > * Sub-documents are most suitable for applications where all of the > elements belonging to a different document type are rooted in a > single subtree. Subdocuments have many problems including * typing convenience (separate files...yuck) * element type constrainability (how do I specify a SUBDOC root element type in a content model?) * "content model communication" (how do I pass a %cell; content model into my table subdoc) * modularity (subdocs must be declared at the top of the document, an annoying non-local maintenance issue) * ID linkage (even for simple links I must use some more advanced linking strategy) * semantics (i.e. SUBDOC has none...you need VALUEREF or something else on top of subdoc) That does not mean that they are never useful. There are some hard problems where they are very useful. But for the *simple problem* of embedding MATH in HTML (for example) they are overkill, as are architectural forms. *KEEP SIMPLE THINGS SIMPLE* > "namespace:gi" element type names are unsuitable for several reasons: > > 1) The complexity of namespaces is exposed to the author rather than > hidden in the DTD (as it is, optionally, with architectural forms). As my paper pointed out, we now live in a universe where the person creating the DTD is often the author. You live in a world where people pay you to hide things in DTDs. Most of the people on the Web don't have a David Megginson or a Paul Prescod to do that for them. Their problems are still real.
> 2) Multiple inheritance is not possible (X can be a kind of Y or a > kind of Z, but not both). Many people do not want multiple inheritance and as my paper pointed out, it makes some problems much more difficult to understand and solve. > 3) Standard DTD-based validation is not possible, and it is more > difficult to create DTD-driven authoring tools. I think you are totally wrong here. As a programmer, I could implement modules in an SGML editor in MUCH less time than it would take me to implement architectural forms. > 4) Both architectural forms and sub-documents can be fully supported > under the existing spec by _both_ validating and non-validating XML > parsers: no changes necessary. Furthermore, they will also remain > compatible with SGML tools. That's great for today. But for tomorrow, ISO has already undertaken to change SGML. Do you propose that they should not add anything to SGML that is not compatible with existing tools? My position is that the very point of a revision is to make things easier and more powerful and that this is thus the perfect opportunity to make this common problem easier to solve, even if it breaks some old tools. > Why are people worried about writing specs to solve a problem that > already has good, working, available solutions? Because the good, working solutions are solutions to much harder problems and make simple jobs needlessly difficult. Paul "SIMPLE THINGS SIMPLE" Prescod -- http://itrc.uwaterloo.ca/~papresco xml-dev: A list for W3C XML Developers. 
To post, mailto:xml-dev@ic.ac.uk Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To (un)subscribe, mailto:majordomo@ic.ac.uk the following message; (un)subscribe xml-dev To subscribe to the digests, mailto:majordomo@ic.ac.uk the following message; subscribe xml-dev-digest List coordinator, Henry Rzepa (mailto:rzepa@ic.ac.uk) From papresco at technologist.com Wed Feb 4 18:17:33 1998 From: papresco at technologist.com (Paul Prescod) Date: Mon Jun 7 17:00:05 2004 Subject: Namespaces, Architectural Forms, and Sub-Documents References: <199802041632.LAA14809@geode.ora.com> Message-ID: <34D8B09A.BBE21DA9@technologist.com> Chris Maden wrote: > > [David Megginson] > > "namespace:gi" element type names are unsuitable for several reasons: > > [...] > > > Why are people worried about writing specs to solve a problem that > > already has good, working, available solutions? > > The problem (as I see it) is not one of including pieces of existing > documents, nor of structural validation. The main reason for > namespaces is semantic inheritance. Architectural forms give you that. > I want to write a scientific research paper quickly. The key word here is *quickly*. Architectural forms don't give you that. > It should be *possible* to create a DTD to which such a document > complies, but I am not as interested in automatic validation of a > namespace document. The interrelational issues are, I think, too > complex to solve; in the example above, I would need to change the > text-containing HTML elements' content models to include chemical and > mathematical markup, and maybe allow HTML markup in MathML theorems. > Pushing selected information into the content models is too ugly. These issues are not complex at all. They are all handled nicely by the Japanese proposal. In a "modular world", HTML would become a module that takes parameters such as "object-types", "character span types", "block types" and so forth. 
You pass in "MathML::Formula" as an "object-type" and the HTML %figure-type; entity gets updated to reflect it. The issue is only complex in the example you cite because HTML was not designed to be modular because SGML does not have a concept of DTD modules. Even so, this is already dirt-common in SGML applications that don't even *have* modules. You define a parameter entity and include the entity. " > > > ]> > > As with parameter passing, scoping declarations, if desirable, will be desirable > with or without modules. After thinking this through, I am a little disturbed by the proposal above. To me, it implies a deep-ish change to the SGML processing model that a module/namespace proposal does not. Consider that in a module/namespace proposal, every element type has a single, fully qualified name. Unqualified references are merely "short form references" (not to be confused with "short references") -- they are a short form for the full thing. Going from an unqualified instance to a fully-qualified one is a purely syntactic operation. But I'm not sure how I would refer to elements in the scheme above. Let's say I am writing a stylesheet. How do I differentiate between [1]"FOO"s with "BAR" parentage and [2]elements conforming to the element type "FOO" that can only exist in "BAR". [1] ... Here all FOOs refer to the same element type. [2] ]> ... Here all FOOs refer to different element types. To me, there is a subtle but important difference. A scoped namespaces proposal makes SGML (more) context dependent at the *syntactic* level, but a scoped declarations proposal makes it context dependent at the *semantic* level. There exists no "context free" expansion. I don't yet know if this will cause Bad Side Effects. But right now I can't yet imagine many uses for this feature *other than* the kind of element type namespace scoping that could be accomplished completely in a modules proposal.
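The claim above, that going from an unqualified instance to a fully qualified one is a purely syntactic operation, can be illustrated with a short Python sketch. Everything in it is an assumption made for illustration: the toy `Element` class, the MODULE::NAME spelling, and the scoping rule (a qualified element sets the module in scope for its descendants), which is one reading of the "containing element controls namescope" idea, not the text of any proposal.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Element:
    """A toy element: a type name plus child elements (no attributes or text)."""
    name: str
    children: List["Element"] = field(default_factory=list)


def qualify(element: Element, default_module: str,
            current: Optional[str] = None) -> Element:
    """Return a copy of the tree in which every name is fully qualified.

    A name already written as MODULE::NAME is kept and puts MODULE in scope
    for its descendants; an unqualified name is treated as a short form for
    the same name in the module currently in scope.
    """
    scope = current or default_module
    if "::" in element.name:
        scope = element.name.split("::", 1)[0]
        full_name = element.name
    else:
        full_name = f"{scope}::{element.name}"
    return Element(full_name,
                   [qualify(child, default_module, scope)
                    for child in element.children])


def names(element: Element) -> List[str]:
    """List the element type names of a tree in document order."""
    return [element.name] + [n for child in element.children for n in names(child)]


doc = Element("DOC", [Element("SET"),
                      Element("MATH::FORMULA", [Element("SET")])])
qualified = qualify(doc, "HYPERLINKS")
print(names(qualified))
```

The expansion needs only the names and their nesting, never the declarations themselves, so a stylesheet can always be written against the fully qualified names.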
If there are no other important uses for this feature then I would rather stick with the more strictly syntactic module structure and leave this contextual declaration stuff out. But maybe there are important uses for this that I have not considered. Note that I can *totally* imagine why you would want to scope an entity declaration or notation declaration to an element, but not to an element type. I think that the former should be a high priority, but don't really understand the need for the latter. Paul Prescod -- http://itrc.uwaterloo.ca/~papresco From ak117 at freenet.carleton.ca Wed Feb 4 22:52:40 1998 From: ak117 at freenet.carleton.ca (David Megginson) Date: Mon Jun 7 17:00:05 2004 Subject: Namespaces, Architectural Forms, and Sub-Documents In-Reply-To: <34D8AE13.610ABC07@technologist.com> References: <34D82C2E.6C6B3AE7@technologist.com> <34ee5fa9.103270755@mail.alink.net> <34D87D02.14BA4B9C@technologist.com> <199802041506.KAA00956@unready.microstar.com> <34D8AE13.610ABC07@technologist.com> Message-ID: <199802042253.RAA00485@unready.microstar.com> Paul Prescod writes: > *MAKE EASY THINGS EASY* > > Making my five-line formula into a different document with a different > document type is *not easy*. It is a royal pain in the butt, which is > why almost nobody does it. I have seen the CALS table model merged with > dozens of DTDs and have never once seen someone take the opposite > approach of making CALS tables "subdocuments."
You have stated a good, general rule of thumb; in this case, however, it is important to remember that a central component of simplicity is consistency (by the way, I _have_ seen CALS tables as SGML subdocuments, but one of my dreams in XML is never to hear the words "CALS table model" again). XML documents may (and perhaps, usually will) contain non-XML objects such as wordprocessor documents, spreadsheets, MPEG clips, Java applets, audio sequences, and many others -- to date, thankfully, no one has proposed uuencoding any of these and dumping them inline between a start and end tag. Why should we treat an equation marked up in XML differently than an equation marked up in Microsoft Word? It seems easier (from a user's perspective) to treat everything as objects, rather than defining one special case. Object-oriented programming has proven the value of encapsulation, and the compound-document idiom is standard on millions of desktops already, so we can hardly argue that subdocuments are an unfamiliar approach. I am a big fan of pragmatism on the implementation side, as people might have noticed from my postings on the design of AElfred; on the standards side, though, I wouldn't want to cripple a spec just to work around a temporary problem that will have to be solved anyway for non-XML objects. SGML people will remember unfortunate features like SHORTREF, DATATAG, and OMITTAG -- included a little over a decade ago, likewise, for the sake of making things easy and working around temporary deficiencies in the available tools. XML is popular mainly because it has finally banned all of these. > Subdocuments have many problems including > * typing convenience (separate files...yuck) (See comments above). > * element type constrainability (how do I specify a SUBDOC root element > type in a content model?) Use HyTime (just joking). Seriously, I cannot see that this is a worse case than not being able to use a DTD at all.
The general idea of compound documents (Netscape with plug-ins, OLE documents, Andrew documents, or otherwise) is that you can plug in any object -- I had imagined that this was the goal of namespaces as well. In XML you can constrain the placement of pointers to external objects, at least. > * "content model communication" (how do I pass a %cell; content model > into my table subdoc) You're thinking of CALS here. I'd suggest that we move away from the older SGML model of heavily parameterised DTDs (as from heavily #IFDEF'ed C header files): remember that one of the arguments for the namespace model is to reuse stylesheets and other processing specifications -- if a table model can vary its content unpredictably, then you will not be able to reuse stylesheets anyway. Again, encapsulation is a big win, and it keeps things easy. That said, if you _really_ need to pass a %cell; content model to a subdocument, you can always include the same file of entity declarations in both the parent and the child. I'd recommend against it, but it's possible if you want to do it. > * modularity (subdocs must be declared at the top of the document, an > annoying non-local maintenance issue) Only if you use an entity/notation mechanism. You could just as easily use a URL/MIME approach: The question of how to include external objects is a separate debate, and subdocuments can swing easily from either vine. > * ID linkage (even for simple links I must use some more advanced > linking strategy) HREFs would work fine -- HTML people are already used to so we should have no confusion here. Furthermore, you have the advantage that your document's validity does not depend on its child objects (this is very important for document management in large, multi-author systems -- if subdocuments are atomic, then a change by one author to a table, for example, will not make the containing chapter invalid). Again, as in programming, encapsulation will be a big win in the medium term. 
> * semantics (i.e. SUBDOC has none...you need VALUEREF or something else > on top of subdoc) I expect that XLL will provide mechanisms for expressing the 'embed' semantic. All the best, David -- David Megginson ak117@freenet.carleton.ca Microstar Software Ltd. dmeggins@microstar.com http://home.sprynet.com/sprynet/dmeggins/ From ak117 at freenet.carleton.ca Wed Feb 4 22:57:22 1998 From: ak117 at freenet.carleton.ca (David Megginson) Date: Mon Jun 7 17:00:05 2004 Subject: Namespaces, Architectural Forms, and Sub-Documents In-Reply-To: <01bd318f$59cf8ae0$LocalHost@sgml> References: <01bd318f$59cf8ae0$LocalHost@sgml> Message-ID: <199802042257.RAA00504@unready.microstar.com> Martin Bryan writes: > Unfortunately subdocs are not supported in XML, or in many SGML > tools. Sorry for any confusion here -- I'm talking about subdocuments in general, not about the SGML SUBDOC feature. You can include a subdocument using an NDATA entity, or simply by providing a URI in an attribute value. I'm certain that XLL will have something useful to say here. All the best, David -- David Megginson ak117@freenet.carleton.ca Microstar Software Ltd. dmeggins@microstar.com http://home.sprynet.com/sprynet/dmeggins/
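A minimal sketch, in Python, of the "URI in an attribute value" style of subdocument described above, and of the claim that the parent stays usable when a child object is damaged. The element name `object`, the attribute name `href`, and the in-memory DOCUMENTS table (standing in for files or URLs) are assumptions made for illustration; nothing in the thread fixes them, and only well-formedness is checked here, not DTD validity.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for a file system or web server: URI -> document text.
DOCUMENTS = {
    "chapter.xml": '<chapter><para>See the table.</para>'
                   '<object href="table1.xml"/></chapter>',
    "table1.xml": "<table><row><cell>1</cell><cell>2</cell></row></table>",
}


def load(uri):
    """Parse the document stored under `uri` and return its root element."""
    return ET.fromstring(DOCUMENTS[uri])


def resolve_subdocuments(parent):
    """Map each href found on an <object> pointer to its parsed root.

    A missing or malformed child is recorded as None instead of raising,
    so a problem in one subdocument never invalidates the parent.
    """
    resolved = {}
    for pointer in parent.iter("object"):
        href = pointer.get("href")
        try:
            resolved[href] = load(href)
        except (KeyError, ET.ParseError):
            resolved[href] = None
    return resolved


chapter = load("chapter.xml")
subdocs = resolve_subdocuments(chapter)
```

If another author later saves a broken `table1.xml`, `load("chapter.xml")` still succeeds and only the entry for that one child becomes None, which is the encapsulation argument in miniature.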
From elm at arbortext.com Thu Feb 5 00:16:44 1998 From: elm at arbortext.com (Eve L. Maler) Date: Mon Jun 7 17:00:05 2004 Subject: Namespaces, Architectural Forms, and Sub-Documents In-Reply-To: <98Feb4.175315est.18819@thicket.arbortext.com> References: <34D8AE13.610ABC07@technologist.com> <34D82C2E.6C6B3AE7@technologist.com> <34ee5fa9.103270755@mail.alink.net> <34D87D02.14BA4B9C@technologist.com> <199802041506.KAA00956@unready.microstar.com> <34D8AE13.610ABC07@technologist.com> Message-ID: <3.0.5.32.19980204191534.009a4bc0@village.doctools.com> This exchange is fascinating. One comment: At 05:53 PM 2/4/98 -0500, David Megginson wrote: >Paul Prescod writes: > > * "content model communication" (how do I pass a %cell; content model > > into my table subdoc) > >You're thinking of CALS here. I'd suggest that we move away from the >older SGML model of heavily parameterised DTDs (as from heavily >#IFDEF'ed C header files): remember that one of the arguments for the >namespace model is to reuse stylesheets and other processing >specifications -- if a table model can vary its content unpredictably, >then you will not be able to reuse stylesheets anyway. Again, >encapsulation is a big win, and it keeps things easy. I don't think the problem has anything to do with CALS. In fact, until SGML Open came along, it was pretty hard to use the CALS table model as a module -- it was not designed with this use in mind, and its inflexibility resulted in dozens or hundreds of DTDs recoding the whole thing just to change a few features.
Table models, even if they're not CALS, are going to vary their content unpredictably, because cells typically need to contain markup *inside* them that is specific to the information domain *outside* the table structure; they're surrounded coming and going. (As an aside, I don't think this means you can't reuse stylesheets; you just sequester the table geometry stuff from the cell formatting and recode just a little bit of element-in-context stylesheet code.) Table cells are a common boundary case of namespace mixing from the text world, and perhaps there are similar situations in the data world. I think that a black-box approach (subdocuments) would require way more overhead than a unified-model approach in doing "content model communication." Eve From papresco at technologist.com Thu Feb 5 00:18:04 1998 From: papresco at technologist.com (Paul Prescod) Date: Mon Jun 7 17:00:05 2004 Subject: Namespaces, Architectural Forms, and Sub-Documents References: <34D82C2E.6C6B3AE7@technologist.com> <34ee5fa9.103270755@mail.alink.net> <34D87D02.14BA4B9C@technologist.com> <199802041506.KAA00956@unready.microstar.com> <34D8AE13.610ABC07@technologist.com> <199802042253.RAA00485@unready.microstar.com> Message-ID: <34D8FD5C.7445D1AE@technologist.com> David Megginson wrote: > > XML documents may (and perhaps, usually will) contain non-XML objects > such as wordprocessor documents, spreadsheets, MPEG clips, Java > applets, audio sequences, and many others -- to date, thankfully, no > one has proposed uuencoding any of these and dumping them inline between > a start and end tag.
Maybe not on this mailing list, but come on over to "SGML-TOOLS" (formerly LinuxDoc). :) :) > Why should we treat an equation marked up in XML differently than an > equation marked up in Microsoft Word? It seems easier (from a user's > perspective) to treat everything as objects, rather than defining one > special case. We should treat them differently for three reasons: #1. XML data is text, and thus makes a certain amount of "sense" inline. If I embedded LaTeX in an XML document I would probably inline it, rather than refer to it, for the same reason. Word formulae are binary. #2. XML has concepts such as validation and id-reference that depend on data being logically inline. #3. If we do not do this, I do not think that people will use subdocs. They will probably just abandon validation or use XML-Data. > Object-oriented programming has proven the value of > encapsulation, and the compound-document idiom is standard on millions > of desktops already, so we can hardly argue that subdocuments are an > unfamiliar approach. Not so. Word does not use externally embedded data by default. If you create a table, formula or a graphic, it is inlined by default. Typically you only externally link to a file if it already exists (e.g. it has some meaning independent of this document). I think Microsoft made the right choice there. > I am a big fan of pragmatism on the implementation side, as people > might have noticed from my postings on the design of AElfred; on the > standards side, though, I wouldn't want to cripple a spec just to work > around a temporary problem that will have to be solved anyway for > non-XML objects. SGML is 12 years old. We are only marginally closer to having decent tools that will manage this stuff for us. I personally have no faith that they will arrive soon. I also think that we have 10 years of good experience with what we need to guide our choices. Most major DTDs incorporate ad hoc DTD modularity features.
We know what they need to make these features robust -- just namespace protection. > SGML people will remember unfortunate features like > SHORTREF, DATATAG, and OMITTAG -- included a little over a decade ago, > likewise, for the sake of making things easy and working around > temporary deficiencies in the available tools. Well, I still use two of those three features, so obviously the problems with the tools have not sufficiently cleared up yet. It also isn't clear to me if those features have helped or hurt SGML's popularity. OMITTAG in particular is very widely used. Even HTML uses it. > > * element type constrainability (how do I specify a SUBDOC root element > > type in a content model?) > > Use HyTime (just joking). Seriously, I cannot see that this is a > worse case than not being able to use a DTD at all. It isn't. But in XML we do have DTDs and we want to use them for these heterogeneous (not "compound") documents. > The general idea > of compound documents (Netscape with plug-ins, OLE documents, Andrew > documents, or otherwise) is that you can plug in any object -- I had > imagined that this was the goal of namespaces as well. I don't think so. In my paper I quoted from the XML Namespaces spec: "We envision applications of XML in which a document instance may contain markup defined in multiple schemas. These schemas may have been authored independently. One motivation for this is that writing good schemas is hard, so it is beneficial to reuse parts from existing, well-designed schemas. Another is the advantage of allowing search engines or other tools to operate over a range of documents that vary in many respects but use common names for common element types. " The goal of combining schemas is central to the concept. > In XML you can > constrain the placement of pointers to external objects, at least. Cold comfort. :) > > * "content model communication" (how do I pass a %cell; content model > > into my table subdoc) > > You're thinking of CALS here.
I'd suggest that we move away from the > older SGML model of heavily parameterised DTDs (as from heavily > #IFDEF'ed C header files): remember that one of the arguments for the > namespace model is to reuse stylesheets and other processing > specifications -- if a table model can vary its content unpredictably, > then you will not be able to reuse stylesheets anyway. The formatting for the contents of table cells and for the shape of the table can be specified independently. In HTML, (for example) essentially anything can go in a table cell. The table formatter just figures it out. A good stylesheet language will provide quite a bit of independence between construction rules. Yes, we may need some conventions for more complex combinations (e.g. metadata formatting conventions), but most things will "just work." > > * ID linkage (even for simple links I must use some more advanced > > linking strategy) > > HREFs would work fine -- HTML people are already used to > > > > so we should have no confusion here. > > * semantics (i.e. SUBDOC has none...you need VALUEREF or something else > > on top of subdoc) > > I expect that XLL will provide mechanisms for expressing the 'embed' > semantic. Both of these proposals just add hassles to something that should be simple. > Furthermore, you have the > advantage that your document's validity does not depend on its child > objects (this is very important for document management in large, > multi-author systems -- if subdocuments are atomic, then a change by > one author to a table, for example, will not make the containing > chapter invalid). Again, as in programming, encapsulation will be a > big win in the medium term. Yes, there are occasions where this encapsulation is important and useful. There are also times where it is not. Let me put it this way: do you feel that the creators of DocBook, TEI and HTML were mistaken by including table models rather than forcing their users to use subdocs? 
If yes, then you have a very different idea of usable DTD design than I do. If no, then I cannot understand why you are opposed to making this process of including table models easier so that you do not need people with brains the size of planets and a serious commitment to DTD use to accomplish it. All I am asking is to make this common DTD fragment combination idiom simpler, more standard and more robust so that casual (and expert!) users can whip up their own DTDs by combining fragments instead of manually merging fragments, disambiguating names, adding architectural forms etc. etc. Paul Prescod -- http://itrc.uwaterloo.ca/~papresco From papresco at technologist.com Thu Feb 5 00:33:35 1998 From: papresco at technologist.com (Paul Prescod) Date: Mon Jun 7 17:00:05 2004 Subject: Namespaces, Architectural Forms, and Sub-Documents References: <34D8AE13.610ABC07@technologist.com> <34D82C2E.6C6B3AE7@technologist.com> <34ee5fa9.103270755@mail.alink.net> <34D87D02.14BA4B9C@technologist.com> <199802041506.KAA00956@unready.microstar.com> <34D8AE13.610ABC07@technologist.com> <3.0.5.32.19980204191534.009a4bc0@village.doctools.com> Message-ID: <34D90941.4777966F@technologist.com> Eve L. Maler wrote: > > Table models, even if they're not CALS, are going to vary their content > unpredictably, because cells typically need to contain markup *inside* them > that is specific to the information domain *outside* the table structure; > they're surrounded coming and going. There are many other situations where we have the same problem, but just don't recognize it.
Think about lists, bibliographies, cross references and so forth. We shouldn't have to reinvent these for each DTD. There is probably a short list of interesting parameterizations on them (for most apps) and we should just include and use them (after specifying the relevant parameterization options). Nobody has tried this (much) in the past because module usage in SGML is just too painful. So only CALS tables and a few other constructs are complex enough that the pain involved in reinventing them outweighs the pain involved in using them from a module. But if we massively reduce the pain in reusing element declarations, we will probably see people reusing them a lot more. That means that we need a convenient parameterization syntax and namespace management. Actual DTD fragment management would also be very useful. Perhaps the Web can start to serve that role (for those that can't afford full databases). Paul Prescod -- http://itrc.uwaterloo.ca/~papresco From ak117 at freenet.carleton.ca Thu Feb 5 02:32:56 1998 From: ak117 at freenet.carleton.ca (David Megginson) Date: Mon Jun 7 17:00:05 2004 Subject: Namespaces, Architectural Forms, and Sub-Documents In-Reply-To: <34D8FD5C.7445D1AE@technologist.com> References: <34D82C2E.6C6B3AE7@technologist.com> <34ee5fa9.103270755@mail.alink.net> <34D87D02.14BA4B9C@technologist.com> <199802041506.KAA00956@unready.microstar.com> <34D8AE13.610ABC07@technologist.com> <199802042253.RAA00485@unready.microstar.com> <34D8FD5C.7445D1AE@technologist.com> Message-ID: <199802050233.VAA00341@unready.microstar.com> Paul Prescod writes: > Not so.
Word does not use externally embedded data by default. If > you create a table, formula or a graphic, it is inlined by default. > Typically you only externally link to a file if it already exists > (e.g. it has some meaning independent of this document). I think > Microsoft made the right choice there. Here, perhaps, there is some miscommunication between us. As I understand it (and I am by no means a Microsoft guru, or even a regular user, so please read this with appropriate caution), all Word documents are actually OLE compound objects -- in other words, they consist of (possibly many) separate objects stored in the same physical disk file; a simpler example of the same thing is Java's JAR files. For XML to work on the desktop rather than just on the server, it will also need some kind of packaging standard -- a way for all of the entities (XML and non-XML) that make up a document to be edited, stored, and shipped together, but easily broken apart again when necessary. I'm suggesting that once such a standard exists, and once there are tools to use it, including subdocuments in XML will be as easy as (and hopefully, much less buggy than) including Excel spreadsheets in Word documents. > Let me put it this way: do you feel that the creators of DocBook, > TEI and HTML were mistaken by including table models rather than > forcing their users to use subdocs? Of course not. Different DTDs will include different levels of base markup, depending on their areas of application -- we're dealing only with the case when people want to use structures not defined in the DTD itself. All the best, David -- David Megginson ak117@freenet.carleton.ca Microstar Software Ltd. dmeggins@microstar.com http://home.sprynet.com/sprynet/dmeggins/
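To make the packaging idea above concrete, here is a rough Python sketch that bundles the entities of a document into one archive and breaks them apart again. It uses a ZIP container because JAR files, the example mentioned above, are ZIP archives; this is an assumption about what such a standard might look like, not a proposal from the thread. The entity names and contents are invented for illustration.

```python
import io
import zipfile


def pack(entities):
    """Bundle a {name: bytes} mapping into a single ZIP archive in memory."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        for name, data in entities.items():
            archive.writestr(name, data)
    return buffer.getvalue()


def unpack(package):
    """Split a packaged document back into its {name: bytes} entities."""
    with zipfile.ZipFile(io.BytesIO(package)) as archive:
        return {name: archive.read(name) for name in archive.namelist()}


# Hypothetical document: one XML entity pointing at one non-XML entity.
entities = {
    "chapter.xml": b'<chapter><object href="sales.csv"/></chapter>',
    "sales.csv": b"region,total\nnorth,42\n",
}
package = pack(entities)    # one blob to store or ship
restored = unpack(package)  # the separate entities again
```

The XML entity and the spreadsheet-like entity travel as one file, yet each comes back byte-for-byte when the package is opened, which is the "shipped together, but easily broken apart" property the message asks for.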
From papresco at technologist.com Thu Feb 5 03:49:30 1998 From: papresco at technologist.com (Paul Prescod) Date: Mon Jun 7 17:00:05 2004 Subject: Namespaces, Architectural Forms, and Sub-Documents References: <34D82C2E.6C6B3AE7@technologist.com> <34ee5fa9.103270755@mail.alink.net> <34D87D02.14BA4B9C@technologist.com> <199802041506.KAA00956@unready.microstar.com> <34D8AE13.610ABC07@technologist.com> <199802042253.RAA00485@unready.microstar.com> <34D8FD5C.7445D1AE@technologist.com> <199802050233.VAA00341@unready.microstar.com> Message-ID: <34D934FA.742DD860@technologist.com> David Megginson wrote: > > For XML to work on the desktop rather than just on the server, it will > also need some kind of packaging standard -- a way for all of the > entities (XML and non-XML) that make up a document to be edited, > stored, and shipped together, but easily broken apart again when > necessary. I'm suggesting that once such a standard exists, and once > there are tools to use it, including subdocuments in XML will be as > easy as (and hopefully, much less buggy than) including Excel > spreadsheets in Word documents. It is only easy to do this with Word because Word manages it for you. I don't intend to change to a dedicated XML editor, do you? > > Let me put it this way: do you feel that the creators of DocBook, > > TEI and HTML were mistaken by including table models rather than > > forcing their users to use subdocs? > > Of course not. Different DTDs will include different levels of base > markup, depending on their areas of application -- we're dealing only > with the case when people want to use structures not defined in the > DTD itself.
No, the question is *how do we construct DTDs*? Let me try that quote again: "We envision applications of XML in which a document instance may contain markup defined in multiple schemas. These schemas may have been authored independently. One motivation for this is that writing good schemas is hard, so it is beneficial to reuse parts from existing, well-designed schemas. Another is the advantage of allowing search engines or other tools to operate over a range of documents that vary in many respects but use common names for common element types. " Let me emphasize: "writing schemas is hard, so it is beneficial to reuse parts from existing schemas." The goal is thus to construct DTDs from smaller ones. (e.g. HTML + CALS + MATHML or TEILITE + JAVA + XLL or ...) Paul Prescod -- http://itrc.uwaterloo.ca/~papresco From ht at cogsci.ed.ac.uk Thu Feb 5 09:34:37 1998 From: ht at cogsci.ed.ac.uk (Henry S. Thompson) Date: Mon Jun 7 17:00:05 2004 Subject: FORTRAN namelist input - remember? Replace with XML! In-Reply-To: "Glenn R. Kronschnabl"'s message of Wed, 04 Feb 1998 09:56:58 -0600 References: <199802041559.JAA06936@mail-firewall.arlut.utexas.edu> Message-ID: Our XML tool suite provides an API for this for XML directly, without using SP. Our NSL tool suite does the same for full SGML, using SP. http://www.ltg.ed.ac.uk/software/xml/ and .../nsl/ ht -- Henry S.
Thompson, Human Communication Research Centre, University of Edinburgh 2 Buccleuch Place, Edinburgh EH8 9LW, SCOTLAND -- (44) 131 650-4440 Fax: (44) 131 650-4587, e-mail: ht@cogsci.ed.ac.uk URL: http://www.cogsci.ed.ac.uk/~ht/ From serres-doug at usa.net Thu Feb 5 11:56:51 1998 From: serres-doug at usa.net (Doug Serres) Date: Mon Jun 7 17:00:05 2004 Subject: recommendations on currently available streaming XML toolkits? References: <3.0.32.19980204104017.00aad5dc@pop.intergate.bc.ca> Message-ID: <34D9A901.A472FCD3@usa.net> Tim Bray wrote: > At 11:02 AM 03/02/98 -0500, Navdip Bhachech wrote: > >there have been a few discussions on streaming issues in this list > >lately, so I thought I'd ask: > >What are the recommended toolkits (currently available) that allow > >streaming XML, instead of a file based approach? > > Lark (http://www.textuality.com/Lark/) is happy to read a stream. > But as others have pointed out, relative URLs can be a real > problem. -Tim > I'm using MSXML (http://www.microsoft.com/xml/) for streaming too. -- Doug Serres Junior Developer - R&D Andyne Computing Ltd. e-mail: dserres@andyne.com
From gmckenzi at JetForm.com Thu Feb 5 13:47:06 1998 From: gmckenzi at JetForm.com (Gavin McKenzie) Date: Mon Jun 7 17:00:05 2004 Subject: Foreign object inclusion WAS: Namespaces, Architectural Forms, and Sub-Documents Message-ID: David Megginson wrote: > [snip] > XML documents may (and perhaps, usually will) contain non-XML objects > such as wordprocessor documents, spreadsheets, MPEG clips, Java > applets, audio sequences, and many others -- to date, thankfully, no > one has proposed uuencoding any of these and dumping them inline between > a start and end tag. > [snip] Am I to understand from this paragraph that there would be something wrong with uuencoded or base64'd resources, like audio clips or even a Java class, between a start and end tag? I thought this would be a given. Sure, using XLL or simple URL hrefs is great, but many times the requirement is for a single file with all resources literally included. This is similar conceptually to the intent of MIME, and MHTML, and OLE (at one time the E meant something -- embedding). Syntactically, MIME-derived methods aren't nearly as nice as stuffing the resource between a start and end tag. Take a look at the Internet Open Trading Protocol http://www.otp.org:8080/ It does this all over the place. A packaging standard to encapsulate all of the resources in the same file is nice, but why isn't it legitimate to place them all inline? Gavin. ======================================================== Gavin F.
McKenzie Vox:+1(613)230-3676 ext 5277 JetForm Corporation Fax:+1(613)594-8886 http://www.jetform.com mailto:gmckenzi@jetform.com ======================================================== From peter at ursus.demon.co.uk Thu Feb 5 13:50:11 1998 From: peter at ursus.demon.co.uk (Peter Murray-Rust) Date: Mon Jun 7 17:00:05 2004 Subject: Namespaces, etc. In-Reply-To: <34D82C2E.6C6B3AE7@technologist.com> Message-ID: <3.0.1.16.19980205102838.2e4712de@pop3.demon.co.uk> At 03:51 04/02/98 -0500, [many people] wrote [about namespaces, architectures, etc.]: I don't want to stifle discussion on XML-DEV, but suggest some guidelines: 1. There is a public draft of the Namespaces paper now, I believe. [Could someone please confirm this and give the location - I wouldn't like to refer to a private document]. My understanding is that the W3C is actively working on namespaces. For this reason I think it is appropriate that proposals for other ways of developing namespaces (especially those which require new syntax or semantics) be referred to the appropriate W3C body. If you aren't a member, but have something to propose I would hope that chairs will be sympathetic if you mail them. A major problem with discussing current W3C activity on this list is that most members/readers do not have up-to-date knowledge of the current W3C discussions. This can make for confusion, and it would break confidentiality for a W3C member to say "hang on, we are going down a different line".
The most reasonable thing to do is to discuss the last public draft of a spec (especially its implementation or experience of implementation :-) but NOT, IMO, to make suggestions for its revision. 2. I suggest that discussion is limited to *implementing* or *exploring* the Namespace proposal. The XML spec refers (I think) to "namespace experiments" and I think that this is the approach we should take - i.e. discuss experiments with *this* namespace proposal. My own approach has been: - to create a private namespace experiment - to approach WG members to see if it broke confidentiality - to wait until the spec was public - to distribute it, and a short explanatory note, with the current JUMBO release. (9801a1) So, rather than discuss my very simple namespace experiment on this list (since it has many demerits and will almost certainly be broken by future namespace developments) you can get it and read it with the distribution. Its sole merits are that it is actually implemented, works and does something useful for my applications. If others see it as a way forward I'd be interested. I hope to release JUMBO-PLAY shortly and this will optionally use the namespace proposal. P. Peter Murray-Rust, Director Virtual School of Molecular Sciences, domestic net connection VSMS http://www.nottingham.ac.uk/vsms, Virtual Hyperglossary http://www.venus.co.uk/vhg From peter at ursus.demon.co.uk Thu Feb 5 13:56:20 1998 From: peter at ursus.demon.co.uk (Peter Murray-Rust) Date: Mon Jun 7 17:00:05 2004 Subject: FORTRAN namelist input - remember? Replace with XML!
In-Reply-To: <199802041559.JAA06936@mail-firewall.arlut.utexas.edu> Message-ID: <3.0.1.16.19980205095841.2e471f34@pop3.demon.co.uk> At 09:56 04/02/98 -0600, Glenn R. Kronschnabl wrote: >I want to use XML as a general input mechanism for scientific programs. In Great idea! XML revolutionises program input and output. FORTRAN programmers spend half their life with: Column 61 (I2) the number of optional cards describing the FOO. This is an optional branch of a tree. With TEI processing it's marvellous. I am trying to convert the molecular community to use XML as standard for input and output to *existing* programs. If you can achieve it in your community - great. >the old days, say in FORTRAN, one used to use namelist input. In C/C++, one >usually wrote a custom driver. I want to use XML because it appears to make >sense. I have started using SP - and want to build a tree that I can query >(kind of like an xrdb interface) for my input parameters. But, before I >embark on this, I was wondering if 1) this makes sense, 2) someone surely has >a simple tree builder/query interface to SP already that I can use so I don't >have to write my own (none jumped out at me when I looked around). I imagine the simplest way to do this is to write an XML2F77input processor. This is really a stylesheet application. If you wait for XSL I suspect it will solve many of your problems. If you can't wait, then there may be facilities in JUMBO that could be useful. P. xml-dev: A list for W3C XML Developers. 
To post, mailto:xml-dev@ic.ac.uk Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To (un)subscribe, mailto:majordomo@ic.ac.uk the following message; (un)subscribe xml-dev To subscribe to the digests, mailto:majordomo@ic.ac.uk the following message; subscribe xml-dev-digest List coordinator, Henry Rzepa (mailto:rzepa@ic.ac.uk) From peter at ursus.demon.co.uk Thu Feb 5 14:07:30 1998 From: peter at ursus.demon.co.uk (Peter Murray-Rust) Date: Mon Jun 7 17:00:05 2004 Subject: LISTRIVIA: (was Re: Namespaces, modules and architectures paper available) In-Reply-To: <34ee5fa9.103270755@mail.alink.net> References: <34D82C2E.6C6B3AE7@technologist.com> <34D82C2E.6C6B3AE7@technologist.com> Message-ID: <3.0.1.16.19980205134243.2e470aec@pop3.demon.co.uk> At 12:46 04/02/98 GMT, Charles F. Goldfarb wrote: >As several postings have referred to module proposals that are being considered >for the SGML revision, I thought it might be helpful to post one here. >-- >Charles F. Goldfarb * Information Management Consulting * +1(408)867-5553 > 13075 Paramount Court * Saratoga CA 95070 * USA > International Standards Editor * ISO 8879 SGML * ISO/IEC 10744 HyTime > Prentice-Hall Series Editor * CFG Series on Open Information Management >-- > >Attachment Converted: "c:\eudora\attach\module.htm" Charles, We try to dissuade people from attachments to XML-DEV postings because: - some people cannot read them - they do not appear in the hypermailed version - there is no permanent record. - long attachments cost people (including me) money - they cannot be quoted easily Could you please repost. If it's short, please include it; if not please give a URL. TIA P. xml-dev: A list for W3C XML Developers. 
To post, mailto:xml-dev@ic.ac.uk Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To (un)subscribe, mailto:majordomo@ic.ac.uk the following message; (un)subscribe xml-dev To subscribe to the digests, mailto:majordomo@ic.ac.uk the following message; subscribe xml-dev-digest List coordinator, Henry Rzepa (mailto:rzepa@ic.ac.uk)

From ak117 at freenet.carleton.ca Thu Feb 5 14:09:49 1998
From: ak117 at freenet.carleton.ca (David Megginson)
Date: Mon Jun 7 17:00:05 2004
Subject: Foreign object inclusion WAS: Namespaces, Architectural Forms, and Sub-Documents
In-Reply-To:
References:
Message-ID: <199802051409.JAA00365@unready.microstar.com>

Gavin McKenzie writes:

>
> David Megginson wrote:
> > [snip]
> > XML documents may (and perhaps, usually will) contain non-XML objects
> > such as wordprocessor documents, spreadsheets, MPEG clips, Java
> > applets, audio sequences, and many others -- to date, thankfully, no
> > one has proposed uuencoding any of these and dumping them inline between
> > a start and end tag.
> > [snip]
>
> Am I to understand from this paragraph that there would be
> something wrong with uuencoded or base64'd resources, like audio
> clips or even a Java class, between a start and end tag?

You are quite right that this is legal XML or SGML -- that's one valid use of NOTATION attributes. Here's this paragraph UUENCODED:

begin 644 para
M66]U(&%R92!Q=6ET92!R:6=H="!T:&%T('1H:7,@:7,@;&5G86P@6$U,(&]R
M(%-'34P@+2T@=&AA="=S(&]N92!V86QI9`IU

It reflects well on XML that this is possible.

> I thought this would be a given. Sure using XLL or simple url
> hrefs are great, but many times the requirement is for a single
> file with all resources literally included.
I don't see that there is any long-term advantage to that -- in the short-term, it will work around some temporary short-comings in specs and implementations, but it's the equivalent of writing an entire C program in a single file to save time on linking (or even all in main(), to avoid the overhead of subroutines). Modularity and encapsulation have already proven their worth in the programming world, and they will prove their worth in XML as well. In other words, inlining uuencoded objects is a kludge: by all means, do it in your implementations if you plan to ship soon and need to work with the current generation of software and Internet protocols, but recognise that you are creating maintenance headaches for yourself later on (as I have for myself by forcing AElfred into a single Java class file), and **PLEASE** do not codify kludges in standards. All the best, David -- David Megginson ak117@freenet.carleton.ca Microstar Software Ltd. dmeggins@microstar.com http://home.sprynet.com/sprynet/dmeggins/ xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev@ic.ac.uk Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To (un)subscribe, mailto:majordomo@ic.ac.uk the following message; (un)subscribe xml-dev To subscribe to the digests, mailto:majordomo@ic.ac.uk the following message; subscribe xml-dev-digest List coordinator, Henry Rzepa (mailto:rzepa@ic.ac.uk) From ak117 at freenet.carleton.ca Thu Feb 5 14:14:31 1998 From: ak117 at freenet.carleton.ca (David Megginson) Date: Mon Jun 7 17:00:05 2004 Subject: Namespaces, etc. In-Reply-To: <3.0.1.16.19980205102838.2e4712de@pop3.demon.co.uk> References: <34D82C2E.6C6B3AE7@technologist.com> <3.0.1.16.19980205102838.2e4712de@pop3.demon.co.uk> Message-ID: <199802051413.JAA00386@unready.microstar.com> Peter Murray-Rust writes: > 2. I suggest that discussion is limited to *implementing* or *exploring* > the Namespace proposal. 
The XML spec refers (I think) to "namespace
> experiments" and I think that this is the approach we should take - i.e.
> discuss experiments with *this* namespace proposal.

I think to this point we have met the first part of this guideline at least -- the discussion has focussed very closely on implementation issues, and as implementors we have been discussing general approaches broadly (i.e. namespaces, architectural forms, and subdocuments) rather than dealing with details of a specific proposal.

In fact, architectural forms and subdocuments do not require any proposal at all -- they already exist, and can be used with the current XML spec.

All the best,

David

--
David Megginson ak117@freenet.carleton.ca
Microstar Software Ltd. dmeggins@microstar.com
http://home.sprynet.com/sprynet/dmeggins/

From gmckenzi at JetForm.com Thu Feb 5 14:54:41 1998
From: gmckenzi at JetForm.com (Gavin McKenzie)
Date: Mon Jun 7 17:00:05 2004
Subject: Foreign object inclusion WAS: Namespaces, Architectural Forms, and Sub-Documents
Message-ID:

David Megginson wrote:
> Gavin McKenzie wrote:
> >
> > I thought this would be a given. Sure using XLL or simple url
> > hrefs are great, but many times the requirement is for a single
> > file with all resources literally included.
>
> I don't see that there is any long-term advantage to that -- in the
> short-term, it will work around some temporary short-comings in specs
> and implementations...[snip]...
>
> In other words, inlining uuencoded objects is a kludge:
> ...[snip]...
> ...recognise that you are creating maintenance headaches for yourself > later on (as I have for myself by forcing AElfred into a single Java > class file), and **PLEASE** do not codify kludges in standards. This is *NOT* a kludge. Take archiving applications for instance. Ideally you want a single file that literally includes all of the resources that were part of the original document. No external linkages. If you go the MHTML route, which is really just extended MIME, it does a pretty good theoretical job of this. The resources are all contained in one file, and the interlinks between the resources are fixed up so that they can refer to each other. Any interlinks that aren't resolved inside the file can redirect out to the net. In fact these interlinks can't really be resolved by the MIME processor, because it is possible that a linkage may occur inside a script embedded in a resource that the MIME processor knows nothing about. If everything were in-situ XML, then it is already one file, easier to archive, and I can come up with conventions for interlinks easily. So....methinks this is not a kludge, rather a necessary, legitimate, and sometimes desirable thing to do. Gavin. xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev@ic.ac.uk Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To (un)subscribe, mailto:majordomo@ic.ac.uk the following message; (un)subscribe xml-dev To subscribe to the digests, mailto:majordomo@ic.ac.uk the following message; subscribe xml-dev-digest List coordinator, Henry Rzepa (mailto:rzepa@ic.ac.uk) From Jon.Bosak at eng.Sun.COM Thu Feb 5 17:47:03 1998 From: Jon.Bosak at eng.Sun.COM (Jon Bosak) Date: Mon Jun 7 17:00:05 2004 Subject: Namespaces, etc. In-Reply-To: <3.0.1.16.19980205102838.2e4712de@pop3.demon.co.uk> (message from Peter Murray-Rust on Thu, 05 Feb 1998 10:28:38) Message-ID: <199802051744.JAA19703@boethius.eng.sun.com> (I am replying separately to both lists because the message was cross-posted. 
PLEASE DO NOT CROSS-POST BETWEEN THE W3C-XML-SIG LIST AND THE XML-DEV LIST.) [Peter Murray-Rust:] | There is a public draft of the Namespaces paper now, I believe. [Could | someone please confirm this and give the location - I wouldn't like to | refer to a private document]. The Note on name spaces has been on the W3C site for several days, but for some reason wasn't visible from the TR page. That's been fixed now, and you can get the Note at http://www.w3.org/TR/1998/NOTE-xml-names The document is public. Jon xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev@ic.ac.uk Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To (un)subscribe, mailto:majordomo@ic.ac.uk the following message; (un)subscribe xml-dev To subscribe to the digests, mailto:majordomo@ic.ac.uk the following message; subscribe xml-dev-digest List coordinator, Henry Rzepa (mailto:rzepa@ic.ac.uk) From mrc at allette.com.au Thu Feb 5 21:24:15 1998 From: mrc at allette.com.au (Marcus Carr) Date: Mon Jun 7 17:00:06 2004 Subject: Foreign object inclusion WAS: Namespaces, Architectural Forms, and Sub-Documents References: <199802051409.JAA00365@unready.microstar.com> Message-ID: <34DA2DDA.5C725B41@allette.com.au> David Megginson wrote: > You are quite right that this is legal XML or SGML -- that's one valid use of > NOTATION attributes. Here's this paragraph UUENCODED: > > > begin 644 para > M66]U(&%R92!Q=6ET92!R:6=H="!T:&%T('1H:7,@:7,@;&5G86P@6$U,(&]R > M(%-'34P@+2T@=&AA="=S(&]N92!V86QI9`IU J ` > end > The real problem with included fragments (as I see it) is the fact that you need to understand the impact of the embedded fragment on structure. An SGML parser would try to fire the elements , and and expand the entity &AA in the above. Even if the element were declared as CDATA, the sequence " element. > In other words, inlining uuencoded objects is a kludge... 
Unless you plan to write an application to confirm that your embedded fragments aren't detrimental to your structure, I would advise against this. Even if the fragment wasn't detrimental to your structure, it may be to someone who wants to reuse a chunk of your data, adding a dangerous level of uncertainty to your documents. -- Regards Marcus Carr email: mrc@allette.com.au _______________________________________________________________ Allette Systems (Australia) email: info@allette.com.au Level 10, 91 York Street www: http://www.allette.com.au Sydney 2000 NSW Australia phone: +61 2 9262 4777 fax: +61 2 9262 4774 _______________________________________________________________ xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev@ic.ac.uk Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To (un)subscribe, mailto:majordomo@ic.ac.uk the following message; (un)subscribe xml-dev To subscribe to the digests, mailto:majordomo@ic.ac.uk the following message; subscribe xml-dev-digest List coordinator, Henry Rzepa (mailto:rzepa@ic.ac.uk) From tbray at textuality.com Thu Feb 5 21:38:55 1998 From: tbray at textuality.com (Tim Bray) Date: Mon Jun 7 17:00:06 2004 Subject: Foreign object inclusion WAS: Namespaces, Architectural Forms, and Sub-Documents Message-ID: <3.0.32.19980205133605.00a65ae4@pop.intergate.bc.ca> David Megginson wrote: > > begin 644 para > M66]U(&%R92!Q=6ET92!R:6=H="!T:&%T('1H:7,@:7,@;&5G86P@6$U,(&]R > M(%-'34P@+2T@=&AA="=S(&]N92!V86QI9`IU J ` > end > Don't want to be pedantic, but for this to work you need at least I'm sure you can see why. But in the general case not even that works, because uuencode will be sure to emit the occasional "]]>". Neither SGML nor XML really have any facilities designed to support in-line inclusion of foreign objects. Yes, this is irritating. -Tim xml-dev: A list for W3C XML Developers. 
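Tim's objection is that a CDATA section ends at the first "]]>", so encoded data that happens to contain that sequence cannot sit in a single section. A workaround often used for this, which nobody in the thread proposes, is to split every "]]>" across two adjacent CDATA sections. The Python sketch below is an editorial illustration under that assumption; the <object> element name and the sample string are hypothetical.

```python
import xml.etree.ElementTree as ET


def cdata_wrap(text: str) -> str:
    """Wrap text in CDATA, splitting any embedded "]]>" across two sections."""
    return "<![CDATA[" + text.replace("]]>", "]]]]><![CDATA[>") + "]]>"


# Hypothetical uuencode-like line that contains '<', '&' and the terminator "]]>".
encoded = "M<&%R83X]]>F]O(&)A<B8"
document = f"<object>{cdata_wrap(encoded)}</object>"

# The parser joins the adjacent sections back into the original text.
assert ET.fromstring(document).text == encoded
```

The cost is that the writer must know about the convention; a reader needs nothing special, because any conforming parser concatenates the sections.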
To post, mailto:xml-dev@ic.ac.uk Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To (un)subscribe, mailto:majordomo@ic.ac.uk the following message; (un)subscribe xml-dev To subscribe to the digests, mailto:majordomo@ic.ac.uk the following message; subscribe xml-dev-digest List coordinator, Henry Rzepa (mailto:rzepa@ic.ac.uk) From gmckenzi at JetForm.com Thu Feb 5 21:58:14 1998 From: gmckenzi at JetForm.com (Gavin McKenzie) Date: Mon Jun 7 17:00:06 2004 Subject: Foreign object inclusion WAS: Namespaces, ArchitecturalForms, and Sub-Documents Message-ID: Isn't Base64 the fix to any fears associated with uuencoded resources emitting an occasional ]]>? Gavin. >-----Original Message----- >From: Tim Bray [SMTP:tbray@textuality.com] >Sent: Thursday, February 05, 1998 4:39 PM >To: xml-dev@ic.ac.uk >Subject: Re: Foreign object inclusion WAS: Namespaces, ArchitecturalForms, >and Sub-Documents > >David Megginson wrote: > >> >> begin 644 para >> M66]U(&%R92!Q=6ET92!R:6=H="!T:&%T('1H:7,@:7,@;&5G86P@6$U,(&]R >> M(%-'34P@+2T@=&AA="=S(&]N92!V86QI9`IU> J> ` >> end >> > >Don't want to be pedantic, but for this to work you need at least > > > >I'm sure you can see why. But in the general case not even >that works, because uuencode will be sure to emit the occasional >"]]>". Neither SGML nor XML really have any facilities designed >to support in-line inclusion of foreign objects. Yes, this >is irritating. -Tim > > >xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev@ic.ac.uk >Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ >To (un)subscribe, mailto:majordomo@ic.ac.uk the following message; >(un)subscribe xml-dev >To subscribe to the digests, mailto:majordomo@ic.ac.uk the following message; >subscribe xml-dev-digest >List coordinator, Henry Rzepa (mailto:rzepa@ic.ac.uk) > << File: para >> xml-dev: A list for W3C XML Developers. 
To post, mailto:xml-dev@ic.ac.uk Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To (un)subscribe, mailto:majordomo@ic.ac.uk the following message; (un)subscribe xml-dev To subscribe to the digests, mailto:majordomo@ic.ac.uk the following message; subscribe xml-dev-digest List coordinator, Henry Rzepa (mailto:rzepa@ic.ac.uk) From rdaniel at lanl.gov Thu Feb 5 22:18:52 1998 From: rdaniel at lanl.gov (Ron Daniel Jr.) Date: Mon Jun 7 17:00:06 2004 Subject: Foreign object inclusion WAS: Namespaces, ArchitecturalForms, and Sub-Documents Message-ID: <3.0.32.19980205151412.009eb770@cic-mail.lanl.gov> At 04:53 PM 2/5/98 -0500, Gavin McKenzie wrote: > >Isn't Base64 the fix to any fears associated with uuencoded resources >emitting an occasional ]]>? I think so, the allowed characters in Base-64 are A-Za-z0-9+/=. There is a caveat, some base-64 encoders may assume they are only used in MIME contexts, and thus "help" the programmer by converting any text into MIME's "canonical form" (e.g. line ends are converted to CRLF). So, be careful about that whitespace! Ron Daniel Jr. voice:+1 505 665 0597 Advanced Computing Lab fax:+1 505 665 4939 MS B287 email:rdaniel@lanl.gov Los Alamos National Lab http://www.acl.lanl.gov/~rdaniel Los Alamos, NM, USA, 87545 xml-dev: A list for W3C XML Developers. 
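Ron Daniel's remark about the Base64 alphabet can be checked mechanically. The Python sketch below is an editorial illustration, not code from the thread: it compares the uuencode and Base64 alphabets against the characters that cause trouble in XML content, then round-trips a payload through a hypothetical <object encoding="base64"> element (both names are assumptions). It also strips whitespace before decoding, which is one way to sidestep the line-ending caveat he mentions.

```python
import base64
import string
import xml.etree.ElementTree as ET

# uuencode maps each 6-bit value to chr(32 + value); many encoders emit '`' for 0.
UU_ALPHABET = {chr(32 + v) for v in range(64)} | {"`"}
# Base64 uses only letters, digits, '+', '/', and '=' for padding.
B64_ALPHABET = set(string.ascii_letters + string.digits + "+/=")

XML_TROUBLE = set("<&>]")  # enough to form markup, entity references, or "]]>"


def embed(payload: bytes) -> str:
    """Wrap payload in a hypothetical <object> element as Base64 text."""
    text = base64.encodebytes(payload).decode("ascii")  # inserts line breaks
    return f'<object encoding="base64">\n{text}</object>'


def extract(document: str) -> bytes:
    """Parse the document and decode the element content, ignoring whitespace."""
    text = ET.fromstring(document).text or ""
    return base64.b64decode("".join(text.split()))


payload = bytes(range(256)) * 2
assert extract(embed(payload)) == payload
print(sorted(XML_TROUBLE & UU_ALPHABET))   # all four characters can occur
print(sorted(XML_TROUBLE & B64_ALPHABET))  # empty list
```

Because no Base64 character is significant to an XML parser, the encoded text needs neither escaping nor a CDATA section.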
To post, mailto:xml-dev@ic.ac.uk Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To (un)subscribe, mailto:majordomo@ic.ac.uk the following message; (un)subscribe xml-dev To subscribe to the digests, mailto:majordomo@ic.ac.uk the following message; subscribe xml-dev-digest List coordinator, Henry Rzepa (mailto:rzepa@ic.ac.uk) From peter at ursus.demon.co.uk Fri Feb 6 09:23:04 1998 From: peter at ursus.demon.co.uk (Peter Murray-Rust) Date: Mon Jun 7 17:00:06 2004 Subject: XML and the launch of Chemical Markup Language In-Reply-To: <3.0.1.16.19980128150718.36a792b8@pop3.demon.co.uk> Message-ID: <3.0.1.16.19980206092126.1eafbb32@pop3.demon.co.uk> At 15:07 28/01/98, Peter Murray-Rust wrote: >I have been invited to give a virtual lecture by VEI Ltd and Chemweb Ltd >and I have taken the opportunity to "launch" Chemical Markup Language and >also to promote the use of XML. Details are at: > >http://chemweb.vei.co.uk The transcript of this lecture is - or will be - publicly available at this address. Anyone registered is welcome to contribute to the discussion. I'd welcome any corrections [I have deliberately simplified XML in places]. [There were two server-side breaks in transmission but I hope that anyone who 'attended' was able to get all the material. The 26 slides are also available at: http://www.vsms.nottingham.ac.uk/vsms/talks/chemwebvei/001.html which is the TOC. [The slides are deliberately not interlinked because of the technology.] If you'd like to use material from these please let me know. In passing I prepared the slides using conventional HTML editing tools (Netscape). I kept thinking how it would have been preferable to use XML for this and I think I was close to the break-even point for tooling up and doing it in Java/XML. This would have solved renumbering problems, allowed redesigned layouts to be transmitted to every slide, etc. I would have still output the actual slides in HTML. 
I am a believer in using HTML for presentations (since I feel it's more flexible/portable/re-usable than other approaches). If other people feel the same way, perhaps we could create a collaborative approach to XML/HTML-slide generation? P. Peter Murray-Rust, Director Virtual School of Molecular Sciences, domestic net connection VSMS http://www.nottingham.ac.uk/vsms, Virtual Hyperglossary http://www.venus.co.uk/vhg xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev@ic.ac.uk Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To (un)subscribe, mailto:majordomo@ic.ac.uk the following message; (un)subscribe xml-dev To subscribe to the digests, mailto:majordomo@ic.ac.uk the following message; subscribe xml-dev-digest List coordinator, Henry Rzepa (mailto:rzepa@ic.ac.uk) From peter at ursus.demon.co.uk Fri Feb 6 10:18:17 1998 From: peter at ursus.demon.co.uk (Peter Murray-Rust) Date: Mon Jun 7 17:00:06 2004 Subject: Foreign object inclusion WAS: Namespaces, Architectural Forms, and Sub-Documents In-Reply-To: <34DA2DDA.5C725B41@allette.com.au> References: <199802051409.JAA00365@unready.microstar.com> Message-ID: <3.0.1.16.19980206084527.11573c56@pop3.demon.co.uk> I am still unclear how to tackle this (very real) problem. I have sympathy for people who wish to bundle everything into one document because I am not yet happy that we have a completely robust system for bundling together all components of a hyperdocument. [For example, how often do you "save HTML" and find the GIFs are not included?]. When I first started trying to learn SGML I developed a system (costwish) which UUENCODED gifs and other binaries into a single. Since I have no experience of SGML in practice I don't know whether that is the normal thing to do. When I came across something like the following: At 08:23 06/02/98 +1100, Marcus Carr wrote: >David Megginson wrote: > >> You are quite right that this is legal XML or SGML -- that's one valid use of >> NOTATION attributes. 
Here's this paragraph UUENCODED:
>>
>>
>> begin 644 para
>> M66]U(&%R92!Q=6ET92!R:6=H="!T:&%T('1H:7,@:7,@;&5G86P@6$U,(&]R
>> M(%-'34P@+2T@=&AA="=S(&]N92!V86QI9`IU> J> `
>> end
>>

I converted all the & to &amp; and the < to &lt;. I'm not clear why this isn't a useful method since the processor is required to convert them on reading.

I have a problem to know what to do with "save XML" on JUMBO. In the SAXDemo routine characters(), DavidM converts non printing chars to escaped variants (e.g. asc(10) -> &#10;), but does *not* convert & to &amp;. This means that any XML file that contains &amp; will produce invalid XML output.

What is the appropriate strategy? Should a "save XML" application convert all five chars (&, <, >, ', ") to their escaped equivalents? Or none? Or just the first two. [In my own community I don't think using <![CDATA[ ... have any idea what is going on and they will get it wrong. In any case - as pointed out - it doesn't overcome the random occurrence of ']]>' ].

P.

Peter Murray-Rust, Director Virtual School of Molecular Sciences, domestic net connection
VSMS http://www.nottingham.ac.uk/vsms, Virtual Hyperglossary http://www.venus.co.uk/vhg

From peter at ursus.demon.co.uk Fri Feb 6 10:38:51 1998
From: peter at ursus.demon.co.uk (Peter Murray-Rust)
Date: Mon Jun 7 17:00:06 2004
Subject: Namespaces, Architectural Forms, and Sub-Documents
In-Reply-To: <199802041733.MAA02120@unready.microstar.com>
References: <199802041632.LAA14809@geode.ora.com> <199802041506.KAA00956@unready.microstar.com> <199802041632.LAA14809@geode.ora.com>
Message-ID: <3.0.1.16.19980206082831.1157496c@pop3.demon.co.uk>

At 12:33 04/02/98 -0500, David Megginson wrote:
>
>Not at all -- you just need a single element type to hold references
>to other XML documents.
You could even (though this is disgusting)
>use
>
>
>

I hope that the "disgusting" refers to the use of 'img' and 'src' and the implied semantics rather than the mechanism :-). I am an advocate of the *mechanism* (e.g. http://www.vsms.nottingham.ac.uk/vsms/talks/chemwebvei/020.html) where I use XML-LINK explicitly to combine chemistry, maths and text. This has the advantage that it avoids namespace problems. It also allows me to process foreign files if certain assumptions are made.

P.

Peter Murray-Rust, Director Virtual School of Molecular Sciences, domestic net connection
VSMS http://www.nottingham.ac.uk/vsms, Virtual Hyperglossary http://www.venus.co.uk/vhg

From donpark at quake.net Fri Feb 6 11:33:41 1998
From: donpark at quake.net (Don Park)
Date: Mon Jun 7 17:00:06 2004
Subject: Foreign object inclusion WAS: Namespaces, Architectural Forms, and Sub-Documents
Message-ID: <002501bd32f2$507a2ae0$2ee044c6@donpark>

As far as I can see there are two problems:

1. Embedding Binary Data inside XML document

This problem is solved with BASE64. I wish we could specify it in the DTD, but it's workable now.

2. Binding XML and its related files into a single package

MHTML works pretty well in 'document' oriented problems and there is no reason why we cannot adopt it. Let's call it MXML and just go with it. In non-document oriented problems, MHTML does not work too well because data is laid out sequentially rather than multiplexed to reduce latency. I guess it will be a while before WebTV uses XML in a major way...

Don Park
http://www.quake.net/~donpark/index.html

xml-dev: A list for W3C XML Developers.
To post, mailto:xml-dev@ic.ac.uk Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To (un)subscribe, mailto:majordomo@ic.ac.uk the following message; (un)subscribe xml-dev To subscribe to the digests, mailto:majordomo@ic.ac.uk the following message; subscribe xml-dev-digest List coordinator, Henry Rzepa (mailto:rzepa@ic.ac.uk) From peter at ursus.demon.co.uk Fri Feb 6 11:52:51 1998 From: peter at ursus.demon.co.uk (Peter Murray-Rust) Date: Mon Jun 7 17:00:06 2004 Subject: LISTRIVIA: an apology In-Reply-To: <199802051744.JAA19700@boethius.eng.sun.com> References: <3.0.1.16.19980205102838.2e4712de@pop3.demon.co.uk> Message-ID: <3.0.1.16.19980206114841.2897d378@pop3.demon.co.uk> At 09:44 05/02/98 -0800, Jon Bosak wrote: >(I am replying separately to both lists because the message was >cross-posted. PLEASE DO NOT CROSS-POST BETWEEN THE W3C-XML-SIG LIST >AND THE XML-DEV LIST.) This was my fault - through sloppy replying to a multiple posting. Since I have stressed the importance of list behaviour I feel ashamed at having slipped from my own guidelines. P. Peter Murray-Rust, Director Virtual School of Molecular Sciences, domestic net connection VSMS http://www.nottingham.ac.uk/vsms, Virtual Hyperglossary http://www.venus.co.uk/vhg xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev@ic.ac.uk Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To (un)subscribe, mailto:majordomo@ic.ac.uk the following message; (un)subscribe xml-dev To subscribe to the digests, mailto:majordomo@ic.ac.uk the following message; subscribe xml-dev-digest List coordinator, Henry Rzepa (mailto:rzepa@ic.ac.uk) From ricko at allette.com.au Fri Feb 6 12:58:49 1998 From: ricko at allette.com.au (Rick Jelliffe) Date: Mon Jun 7 17:00:06 2004 Subject: Foreign object inclusion WAS: Namespaces, Architectural Forms, and Sub-Documents Message-ID: <199802061308.AAA20028@jawa.chilli.net.au> > From: Don Park > As far as I can see there are two problems: > > 1. 
Embedding Binary Data inside XML document
>
> This problem is solved with BASE64. I wish we could specify it in the DTD, but
> it's workable now.

You can. For example ... ...

An element can have one NOTATION attribute, which specifies how to interpret the element's data. Often this is used to restrict possible notations to lists of types, for example:

Developers of generic XML tools should make sure that their systems provide ways to interpret NOTATION attributes appropriately: it is a mechanism like MIME media-types, but may be on a finer grain. It is not a mechanism for multi-part documents (unless the DTD is a DTD for representing multipart documents of course) because the notation processor (which the SYSTEM identifier on the NOTATION declaration would identify) runs after the XML processor.

Rick Jelliffe

From peter at ursus.demon.co.uk Fri Feb 6 13:40:19 1998
From: peter at ursus.demon.co.uk (Peter Murray-Rust)
Date: Mon Jun 7 17:00:06 2004
Subject: Foreign object inclusion WAS: Namespaces, Architectural Forms, and Sub-Documents
Message-ID: <3.0.1.16.19980206132732.312792ae@pop3.demon.co.uk>

Posted on behalf of Ross Moore

>Return-Path:
>X-Sender: ross@zeus.mpce.mq.edu.au
>Date: Fri, 6 Feb 1998 21:43:43 +1100
>To: peter@ursus.demon.co.uk
>From: Ross Moore
>Subject: Re: Foreign object inclusion WAS: Namespaces, Architectural
> Forms, and Sub-Documents
>
[... request for posting ...]
>
>
>After receiving Tim's last posting I engaged in an email conversation
>with myself, attached here...
>
>At 10:57 AM +1100 2/6/98, Ross Moore wrote:
>>Hello Tim
>>
>>Is there any reason why a mailer produced 8 copies of the message + attachment
>>from you (appended below) ?
>>It was Eudora Pro for Macintosh PPC (which automatically decoded OK).
>>The Unix mail-server only received 1 copy.
>>
>>Could it be that the contents has triggered a side-effect,
>>detrimental to an external structure ...
>>
>>Marcus Carr wrote:
>>>> In other words, inlining uuencoded objects is a kludge...
>>>
>>>Unless you plan to write an application to confirm that your embedded fragments
>>>aren't detrimental to your structure, I would advise against this. Even if the
>>>fragment wasn't detrimental to your structure, it may be to someone who wants to
>>>reuse a chunk of your data, adding a dangerous level of uncertainty to your
>>>documents.
>>
>>
>>If this reply causes a similar repetition, then we'll know that such
>>problems indeed can exist. ;-)
>
>Yes indeed there is such a problem, because the quoted base 64 portion
>is being regarded as an attachment needing decoding.
>It no longer has the correct checksum, due to the quoting with `> '.
>
>The automatic POP retrieve from the Unix server was failing.
>Each 20 mins (or so) it tries again and also fails.
>The 8 copies simply count how many times it tried before I could
>address the problem manually.
>
>
>> `b`e`g`i`n 644 para
>> M66]U(&%R92!Q=6ET92!R:6=H="!T:&%T('1H:7,@:7,@;&5G86P@6$U,(&]R
>> M(%-'34P@+2T@=&AA="=S(&]N92!V86QI9`IU> J> `
>> end
>
>(Here I've doctored the `begin' into `b`e`g`i`n to prevent this
>happening again.)
>
>
>[added later]
>Doubled `>'s, as in Peter's last mail, do not cause this effect:
>
>>>
>>> begin 644 para
>>> M66]U(&%R92!Q=6ET92!R:6=H="!T:&%T('1H:7,@:7,@;&5G86P@6$U,(&]R
>>> M(%-'34P@+2T@=&AA="=S(&]N92!V86QI9`IU>> J>> `
>>> end
>>>
>
>
>
>Is there a lesson here ?
>
>
>Regards,
>
> Ross Moore
>
>
>~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>Ross Moore email: ross@mpce.mq.edu.au
>Mathematics Department phone: +612 9850 8955
>Macquarie University fax: +612 9850 8114
>Sydney, NSW 2109 Internet:
>Australia http://www-math.mpce.mq.edu.au/~ross/
>
> ***************************
>
>for the best in (La)TeX-nical typesetting and Web page production
>join the TeX Users Group (TUG) --- browse at http://www.tug.org
>
>
>~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>
>
>

Peter Murray-Rust, Director Virtual School of Molecular Sciences, domestic net connection
VSMS http://www.nottingham.ac.uk/vsms, Virtual Hyperglossary http://www.venus.co.uk/vhg

From PaquinM at novasys.qc.ca Fri Feb 6 14:34:38 1998
From: PaquinM at novasys.qc.ca (Paquin, Martin)
Date: Mon Jun 7 17:00:06 2004
Subject: XML and the launch of Chemical Markup Language
Message-ID: <7183BFDBEB50D111863C080009B453804C73@nemesis.novasys.qc.ca>

-----Original Message-----
From: Peter Murray-Rust [SMTP:peter@ursus.demon.co.uk]
Sent: Friday, February 06, 1998 4:21 AM
To: xml-dev@ic.ac.uk
Subject: Re: XML and the launch of Chemical Markup Language

>I am a believer in using HTML for
>presentations (since I feel it's more flexible/portable/re-usable than
>other approaches). If other people feel the same way, perhaps we could
>create a collaborative approach to XML/HTML-slide generation?

Microsoft announced its intention to have an XML export format for all its Office applications, including, I suppose, PowerPoint. Certainly a place to look.
The major piece that is missing for converting a graphic presentation to
XML is the ability to create graphics primitives in HTML. Other than that,
with dynamic HTML it is possible to have a presentation as good as one
made with a conventional presentation package.

_____________________________________________________________
Martin Paquin, Consultant
Novasys, Inc.
bureau 2624, Tour de la Bourse, 800, Place Victoria
Case postale 151
Montréal (Québec) CANADA H4Z 1C3
Tel.: (514) 875-7720
Fax.: (514) 874-9830
paquinm@novasys.qc.ca
http://www.novasys.qc.ca

From ak117 at freenet.carleton.ca Fri Feb 6 14:57:05 1998
From: ak117 at freenet.carleton.ca (David Megginson)
Date: Mon Jun 7 17:00:06 2004
Subject: "Save as XML"
In-Reply-To: <3.0.1.16.19980206084527.11573c56@pop3.demon.co.uk>
References: <199802051409.JAA00365@unready.microstar.com> <34DA2DDA.5C725B41@allette.com.au> <3.0.1.16.19980206084527.11573c56@pop3.demon.co.uk>
Message-ID: <199802061453.JAA00386@unready.microstar.com>

Peter Murray-Rust writes:

> I have a problem to know what to do with "save XML" on JUMBO. In
> the SAXDemo routine characters(), DavidM converts non printing
> chars to escaped variants *e.g. asc(10) -> &#10;, but does *not*
> convert & to &amp; This means that any XML file that contains &
> will produce invalid XML output.

Sorry for any confusion there -- I had originally used '\n' and '\r',
then decided to use character references to be more XML-like. I
realise, though, that that gives the unintended appearance of an
attempt to produce XML-parseable character data. Perhaps I should go
back to C-like escapes.

> What is the appropriate strategy?
> Should a "save XML" application
> convert all five chars (&, <, >, ', ") to their escaped
> equivalents? Or none? Or just the first two. [In my own community I
> don't think using have any idea what is going on and they will get it wrong. In any
> case - as pointed out - it doesn't overcome the random occurrence
> of ']]>' ].

This taps into an earlier discussion about what is and is not
significant information in an XML document. For example, if the
general entity &name; is set to "David Megginson", then the following
two fragments are exactly equivalent for many XML applications:

FRAGMENT 1:
My name is<!-- here's a comment --> &name;.

FRAGMENT 2:
My name is David Megginson.

Some authoring and repository tools, however, will want to preserve
the general entity reference, the comment, and the whitespace (even
inside the start tag).

In SGML, you can use grove plans to specify what information is and is
not significant to an application -- but there is still a lack of
detailed standards for the information set (or sets) returned by an
XML parser.

All the best,

David

--
David Megginson                 ak117@freenet.carleton.ca
Microstar Software Ltd.         dmeggins@microstar.com
http://home.sprynet.com/sprynet/dmeggins/

From trevort at za.ibm.com Fri Feb 6 15:15:52 1998
From: trevort at za.ibm.com (Trevor Turton)
Date: Mon Jun 7 17:00:06 2004
Subject: Meta data for XML editors
Message-ID: <5060200010923033000002L032*@MHS>

Development of XML editors is underway. Since XML is flexible/extensible,
XML editors will have to be so too.
The XML user will need to specify a list of the DTD schemas that will be
used in composing a particular document. The editor will need to fetch
and parse these schemas so it can validate the user's input. More
usefully, the editor could present the DTDs to the user as a series of
stacked palettes, each containing a list of the elements defined within
each DTD. When the user selects an element from a palette for inclusion
in the target document, the editor could present the tags and attributes
associated with the element, and hence guide the user in constructing a
syntactically correct document that conforms to the DTDs.

Let's up the ante a little. A palette of entities would be more useful if:

* Each entity were represented by an icon that suggests its function
* Each entity popped up a one-liner outlining its function whenever the
  mouse hovered over it for a while
* Each entity were backed by complete help documentation (in XML, of
  course)

To do this stuff well, the editor would need access to more than just the
plain unvarnished DTD. It would need extra meta data to be associated
with the DTD, but only used at document composition time. Browsers and
other rendering programs would not need to access this extra information
when they render the final document, and indeed it would slow them down
unnecessarily to do so.

It may make sense to exploit XML's (proposed) powerful hyperlink
facilities to associate compose-time meta data with DTDs. All of the
design-time meta data required to help the user understand and exploit
the DTD could be made available in this way.
If this meta data is made available through hyperlinks, then it may be a
good idea to establish a convention now, while it's still early enough,
as to how such compose-time meta data will be classified. It would also
help to encourage the builders of browsers and other rendering engines
to omit these designated hyperlinks from the popup menus they present
when the user clicks on the associated hot-spot, or at least to make
this omission the default action, overridable in the browser's option
settings.

It seems likely to me that a number of different software developers will
build XML editors that make use of associated compose-time meta data such
as I have described above, and that each will choose to format this
information in a different way. DTD builders will then face the dilemma
of which meta data format to use, and the value of all DTDs will be
diminished because different XML editors will work best with different
formats of meta data.

Can we try to pre-empt this problem before it hits us by debating and
proposing a standard format for compose-time meta data?

Trevor Turton

From tyler at infinet.com Fri Feb 6 15:21:41 1998
From: tyler at infinet.com (Tyler Baker)
Date: Mon Jun 7 17:00:06 2004
Subject: XML Java IO Writer...
Message-ID: <34DB2B73.50C6DD90@infinet.com>

A while back I brought up the idea of having an XML InputStream which
inherits from FilterInputStream and did not get much response. Anyway, it
would have been better for this to be an XMLReader, since the Reader
classes handle all of the nitty-gritty character conversion for you in
the first place.
Well, due to the demands of the application I am writing now, it needs to
format as well as parse XML data from a variety of streams, so for now I
am proposing that aside from just SAX, we have an IO package in the
org.xml domain which could possibly have parsers assigned to them.

I spent about half a day and wrote an XMLWriter class which is an
extension of FilterWriter for preparing XML documents from Java without
having to do a lot of OutputStream.write() calls manually from line to
line. Things are packaged right now under org.xml.io, but that is only
tentative as I do not have any real permission from the guy who owns the
rights to xml.org (I can't remember who you are so maybe this will get
your attention).

Right now I am using this in my own application and it works beautifully.
This is what I would call a 0.1 version since it does not handle all
sorts of things like Notations and lots of other stuff.

You can get the zip file with source code included at
http://www.infinet.com/~tyler/xml/xmlio01.zip

Here is a brief description of the classes included:

package org.xml.io;

import java.io.Writer;
import java.io.FilterWriter;
import java.io.IOException;
import java.util.Hashtable;

public class XMLWriter extends FilterWriter {

  public XMLWriter(Writer out, String padding) {}

  public void writeDocument(Element rootElement, String ID)
    throws IOException {}

  public void writeDocument(Element rootElement, String ID,
                            Entity[] entities)
    throws IOException {}

  public void writeDocument(Element rootElement, String ID,
                            Entity[] entities, boolean system)
    throws IOException {}

  private String replaceText(String content) {}
}

The constructor takes another Writer and a String which is essentially
used for padding the nested levels of your document. For example, you
could use two spaces as padding or else just a tab.
You would create this class by making a call like this:

XMLWriter writer = new XMLWriter(new OutputStreamWriter(out), " ");

where out is of type OutputStream.

To write a document you call writeDocument(), which takes three forms.

writeDocument(Element rootElement, String ID) is the same as
writeDocument(Element rootElement, String ID, null, true), and

writeDocument(Element rootElement, String ID, Entity[] entities) is the
same as
writeDocument(Element rootElement, String ID, Entity[] entities, true);

writeDocument(Element rootElement, String ID, Entity[] entities, boolean
system) is what is actually called.

The element is the root element you write, ID is the system or public ID
of the DTD, entities is an array of type Entity[] whose values are
replaced in the document, and system is a flag indicating whether ID
should be treated as a system ID or a public ID.

So in my code I call this (the class calling this is of type Element):

writer.writeDocument(this, "forumReference.dtd");

package org.xml.io;

public interface Element {

  String getName();

  // may return null
  String getContent();

  // may return null
  Attribute[] getAttributes();

  // may return null
  Element[] getChildren();

  // may return null
  String getComments();
}

This interface defines an element type. Usually you implement this for
each class which has data that can be mapped to an XML document. If on
the other hand you have a class which should not be inherited or is even
final, then use inner classes to solve your problem. For example, for
java.net.InetAddress I use this in my code to implement the Element[]
getChildren() method. I use AbstractElement (included in org.xml.io) so I
only have to redefine the methods that can return null anyway.
public Element[] getChildren() {
  Element[] children = new Element[4];
  children[0] = new AbstractElement() {
    public String getName() {
      return "id";
    }
    public String getContent() {
      return ID;
    }
  };
  children[1] = new AbstractElement() {
    public String getName() {
      return "host";
    }
    public String getContent() {
      return host.getHostAddress();
    }
  };
  children[2] = new AbstractElement() {
    public String getName() {
      return "port";
    }
    public String getContent() {
      return String.valueOf(port);
    }
  };
  children[3] = new AbstractElement() {
    public String getName() {
      return "ior";
    }
    public String getContent() {
      return IOR;
    }
  };
  return children;
}

package org.xml.io;

public interface Entity {

  String getName();

  String getValue();
}

The Entity interface is for replacement text, of course. The writer
replaces every occurrence of getValue() in the rest of the stream with
getName(), with a '&' prepended and a ';' appended. When you call
writeDocument(), every entity passed in will be checked against the
document. This is an expensive operation so use it wisely. If null is
passed as the Entity[] argument to writeDocument(), then no checks will
occur.

The XMLWriter class will recursively descend from the root element to all
sub-elements, get their content and write it out.

This package is something I spent half a day on to basically get the job
done for what I needed to do, and I would more than happily GPL it all
later on under the org.xml.io package if given permission and there is
interest in doing more (fixing bugs and adding complete XML
functionality). Right now it all works for what I am doing and chopped
about 300 lines of code (what I guess people call report writing) out of
my application. The less code in my app, the better. Which also makes me
ask: is it better to have a parser which may be large in code size, but
is easy to use so my production code is small, or a parser with little
functionality that makes my production code large?
Of course, you can sometimes have the best of both worlds.

Tyler

From papresco at technologist.com Fri Feb 6 21:55:48 1998
From: papresco at technologist.com (Paul Prescod)
Date: Mon Jun 7 17:00:06 2004
Subject: Meta data for XML editors
References: <5060200010923033000002L032*@MHS>
Message-ID: <34DB4F51.EEC875D1@technologist.com>

Trevor Turton wrote:
>
> Can we try to pre-empt this problem before it hits us by debating and
> proposing a standard format for compose-time meta data?

Such a standard should build upon RDF or XML-Data, so I would propose
that it is best to wait until those are a little more "real."

Paul Prescod

--
http://itrc.uwaterloo.ca/~papresco

From donpark at quake.net Sat Feb 7 02:30:56 1998
From: donpark at quake.net (Don Park)
Date: Mon Jun 7 17:00:06 2004
Subject: Last minute request for BASE64 section support in XML 1.0
Message-ID: <000401bd336f$c33133d0$2ee044c6@donpark>

It looks like XML is about to be approved as a standard by W3C. Could we
please have BASE64 sections as a part of XML standard 1.0?

Everyone who supports this idea, please reply to this message (short
replies please to avoid LISTRIVIA).
[to be inserted somewhere between 2.7 and 2.8 of XML spec]

Using BASE64 sections

I am not sure if this conflicts badly with SGML but, if not, it could be
immensely helpful for developers.

Regarding SAX, we could have a callback for binary data similar to the
characters() callback. As far as W3C DOM is concerned, we will need
another Node type (BINARY).

Sincerely,

Don Park
http://www.quake.net/~donpark/index.html

From jjc at jclark.com Sat Feb 7 04:14:58 1998
From: jjc at jclark.com (James Clark)
Date: Mon Jun 7 17:00:06 2004
Subject: Last minute request for BASE64 section support in XML 1.0
References: <000401bd336f$c33133d0$2ee044c6@donpark>
Message-ID: <34DBDF21.AE382FB4@jclark.com>

Don Park wrote:
>
> It looks like XML is about to be approved as standard by W3C. Could we
> please have BASE64 sections as a part of XML standard 1.0?

There is no chance of such a change being made at this stage.

James

From thillai at ix.netcom.com Sat Feb 7 04:28:49 1998
From: thillai at ix.netcom.com (Thillai)
Date: Mon Jun 7 17:00:06 2004
Subject: DOM for XML
Message-ID: <01BD3355.6B9B0480@nbw-nj9-40.ix.netcom.com>

Is there any DOM implementation for XML?

Thillai
From donpark at quake.net Sat Feb 7 08:16:43 1998
From: donpark at quake.net (Don Park)
Date: Mon Jun 7 17:00:06 2004
Subject: Last minute request for BASE64 section support in XML 1.0
Message-ID: <000901bd33a0$10e44060$2ee044c6@donpark>

James,

>> It looks like XML is about to be approved as standard by W3C. Could we
>> please have BASE64 sections as a part of XML standard 1.0?
>
>There is no chance of such a change being made at this stage.

Could you please elaborate on why there is no chance of the BASE64
section proposal being accepted? As far as I know, XML has not been
approved. Whether or not W3C has already written out the approval
announcement, the fact is that it is not approved yet. If your assessment
of the probability is based on the lack of time for the XML-WG to
consider such a proposal, I must beg to differ with you. As far as I
know, the WG serves the community and, while its activity must be
constrained by the schedule of its members, the need of the community
must be met if the need is worthy enough.

I am not sure if support for embedded binary data has been brought up in
the WG but I am, frankly, very disappointed with the lack of support.
CDATA is awfully inadequate. The Open Trading Protocol (OTP) proposal has
a need to embed signatures within OTP documents and it uses CDATA
sections. For occasional occurrence of ]]>, OTP states:

"Any CDATA end sequences ("]]>") within the data are replaced by
"]]]]>" in order to escape
the CDATA end sequence"

Am I the only one who thinks this is pure madness for a
yet-to-be-approved standard that proposes to be the next generation data
formatting language?
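The escaping rule quoted from OTP can be illustrated with a short Java
sketch. This is only a sketch under stated assumptions, not code from the
OTP draft: the class name CdataSections and the method wrap are made up
for illustration, and the replacement string used here is the usual way
of splitting a CDATA section (close it after "]]" and reopen it before
">"), since the replacement string in the archived quote appears to have
lost characters.

```java
/** Hypothetical helper for illustration; not taken from the OTP draft. */
final class CdataSections {
    private CdataSections() {}

    /**
     * Wraps text in a CDATA section. Every "]]>" inside the text is split
     * across two sections: "]]" ends one section and ">" starts the next,
     * so the end marker never appears inside a single section.
     */
    static String wrap(String text) {
        String safe = text.replace("]]>", "]]]]><![CDATA[>");
        return "<![CDATA[" + safe + "]]>";
    }

    public static void main(String[] args) {
        // The array expression contains a literal "]]>" that must be split.
        System.out.println(wrap("if (a[b[0]]> 1) return;"));
    }
}
```

A parser that concatenates adjacent CDATA sections hands the original
text back unchanged, which is why the trick works, but the writer has to
scan all of the data first. That cost is the complaint being made here.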
If I did not bring up the subject before, it is because I 'trusted' the
WG to find a solution to a problem which I assumed was too significant to
ignore. Perhaps the problems I am concerned about are not of major
concern for the WG members. Is it only me who worries about the problem
of handling endless data whose existence is defined only by its movement?
Is TV broadcast not a document?

Whether or not my proposal is accepted, I would like to know if others
feel that better support for binary data is needed. And I would like to
ask the members of the WG to consider the issue and the need some of us
have. A delay of one to two weeks is, IMHO, well worth it.

Sincerely,

Don Park
http://www.quake.net/~donpark/index.html

From jjc at jclark.com Sat Feb 7 08:43:37 1998
From: jjc at jclark.com (James Clark)
Date: Mon Jun 7 17:00:06 2004
Subject: Last minute request for BASE64 section support in XML 1.0
References: <000901bd33a0$10e44060$2ee044c6@donpark>
Message-ID: <34DC1DEC.A20A1993@jclark.com>

Don Park wrote:

> >> It looks like XML is about to be approved as standard by W3C. Could we
> >> please have BASE64 sections as a part of XML standard 1.0?
> >
> >There is no chance of such a change being made at this stage.
>
> Could you please elaborate on why there is no chance of BASE64 section
> proposal being accepted? As far as I know, XML has not been approved.

XML 1.0 is already a Proposed Recommendation. The W3C process does not
allow major new features to be added between Proposed Recommendation and
Recommendation.

James
From ak117 at freenet.carleton.ca Sat Feb 7 12:17:07 1998
From: ak117 at freenet.carleton.ca (David Megginson)
Date: Mon Jun 7 17:00:06 2004
Subject: DOM for XML
In-Reply-To: <01BD3355.6B9B0480@nbw-nj9-40.ix.netcom.com>
References: <01BD3355.6B9B0480@nbw-nj9-40.ix.netcom.com>
Message-ID: <199802071217.HAA00568@unready.microstar.com>

Thillai writes:

> Is there any DOM implementation for XML?

The DOM isn't finished, so any implementation is necessarily
tentative. With that warning, however, you can look at

http://www.quake.net/~donpark/saxdom.html

The nice thing about Don's work is that SAXDOM will run with any
SAX-conformant Java XML parser, so you can use NXP, Lark, MSXML,
AElfred, and/or XP, as you wish. Don also includes some information
about integrating the DOM with the new, standard Java Swing widgets.

All the best,

David

--
David Megginson                 ak117@freenet.carleton.ca
Microstar Software Ltd.         dmeggins@microstar.com
http://home.sprynet.com/sprynet/dmeggins/
From tbray at textuality.com Sat Feb 7 15:34:17 1998
From: tbray at textuality.com (Tim Bray)
Date: Mon Jun 7 17:00:06 2004
Subject: Last minute request for BASE64 section support in XML 1.0
Message-ID: <3.0.32.19980207072801.00ab7864@pop.intergate.bc.ca>

At 12:11 AM 07/02/98 -0800, Don Park wrote:
>Could you please elaborate on why there is no chance of BASE64 section
>proposal being accepted?

Simply put, we create standards following a set of formal rules, which
is strictly necessary if you are to have any hope of getting Netscape,
Microsoft, Sun, et al, to go into a room and come out with a real
result. The rules do not, at this point in time, leave room for the
introduction of major new features.

Having said that, I think that it would be a good idea for someone to
write up a proposal for the use of a reserved attribute or namespace to
signal, as a convention in XML 1.0, that the contents of an element are
base64 encoded. This could be destined for XML 1.1 or perhaps serve as a
standalone recommendation layered on top of XML.

Nobody disputes that there is a real need in this area.

-Tim
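A convention of the kind suggested here might look roughly like the
following Java sketch. Everything specific in it is an assumption made
for illustration: the attribute name content-encoding, its value base64,
and the class Base64Convention are hypothetical, and nothing in XML 1.0
reserves them. The sketch relies on the DOM and Base64 classes of later
JDKs (java.util.Base64 arrived with Java 8), which were not available
when this thread was written.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

/** Hypothetical sketch of an attribute that flags base64 element content. */
final class Base64Convention {
    // Hypothetical attribute name; XML 1.0 does not define or reserve it.
    static final String ATTR = "content-encoding";

    private Base64Convention() {}

    /** Returns the element's content as bytes, decoding it when flagged as base64. */
    static byte[] contentBytes(Element e) {
        String text = e.getTextContent();
        if ("base64".equals(e.getAttribute(ATTR))) {
            // The MIME decoder skips line breaks inside the encoded text.
            return Base64.getMimeDecoder().decode(text);
        }
        return text.getBytes(StandardCharsets.UTF_8);
    }

    /** Parses a small XML string and returns its root element. */
    static Element parseRoot(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            return doc.getDocumentElement();
        } catch (Exception ex) {
            throw new IllegalStateException("could not parse XML", ex);
        }
    }
}
```

Because the signal is an ordinary attribute, the document stays plain
XML 1.0 and any parser can read it. Only the application has to know
what the attribute means.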
From papresco at technologist.com Sat Feb 7 15:35:49 1998
From: papresco at technologist.com (Paul Prescod)
Date: Mon Jun 7 17:00:06 2004
Subject: Last minute request for BASE64 section support in XML 1.0
References: <000901bd33a0$10e44060$2ee044c6@donpark>
Message-ID: <34DC7FA0.481781EB@technologist.com>

Don Park wrote:
>
> James,
> Could you please elaborate on why there is no chance of BASE64 section
> proposal being accepted? As far as I know, XML has not been approved.
> Whether or not W3C has already written out the approval announcement, the
> fact is that it is not approved yet. If your assessment of the probability
> is based the lack of time for the XML-WG to consider such a proposal, I must
> beg to differ with you. As far as I know, the WG serves the community and,
> while its activity must be constrained by the schedule of its members, the
> need of the community must be met if the need is worthy enough.

The W3C only serves the needs of the community indirectly through
serving the needs of its members. The community has no official standing
in the process.

In this particular case, if we held up XML 1.0 for everything someone
considered important, it would never ship. That's what happened to HTML
3.0. I'm not denying that your complaint is important -- XML has many
large flaws. That's life in the standards process.

Paul Prescod

--
http://itrc.uwaterloo.ca/~papresco
From Jon.Bosak at eng.Sun.COM Sat Feb 7 16:45:38 1998
From: Jon.Bosak at eng.Sun.COM (Jon Bosak)
Date: Mon Jun 7 17:00:06 2004
Subject: Last minute request for BASE64 section support in XML 1.0
In-Reply-To: <000401bd336f$c33133d0$2ee044c6@donpark>
Message-ID: <199802071643.IAA21218@boethius.eng.sun.com>

[Don Park:]

| It looks like XML is about to be approved as standard by W3C. Could
| we please have BASE64 sections as a part of XML standard 1.0?
| Everyone who support this idea, please reply to this message (short
| replies please to avoid LISTRIVIA).

Don't bother. Under W3C procedure, XML 1.0 has been substantively
frozen since the Proposed Recommendation went out for member balloting
on December 8. Substantive changes will have to wait for XML 1.1.

Jon

From tbray at textuality.com Sat Feb 7 19:33:05 1998
From: tbray at textuality.com (Tim Bray)
Date: Mon Jun 7 17:00:06 2004
Subject: file URLs again
Message-ID: <3.0.32.19980207113030.00ab0c90@pop.intergate.bc.ca>

Hi, I've been getting a bit behind... did this group in its collective
wisdom come up with a snippet of Java that makes a really good and
sincere effort to open a URL that looks like "spec.dtd" and works
reliably on MS & other OS's, with more than one JVM?
I seem to recall seeing one go by, but can't find it. -Tim

From donpark at quake.net Sat Feb 7 22:10:06 1998
From: donpark at quake.net (Don Park)
Date: Mon Jun 7 17:00:06 2004
Subject: file URLs again
Message-ID: <000d01bd3414$7c720780$2ee044c6@donpark>

Tim,

Try this:

import java.io.File;
import java.net.MalformedURLException;
import java.net.URL;

// Build a file: URL from a (possibly relative) file name.
public URL createFileURL (String fileName) {
  File file = new File(fileName);
  try {
    String path = file.getAbsolutePath();
    // Normalise the platform separator to '/'.
    char sep = File.separatorChar;
    if (sep != '/')
      path = path.replace(sep, '/');
    // Paths that lack a leading '/' (e.g. "C:/...") need a third slash.
    if (path.charAt(0) == '/')
      path = "file://" + path;
    else
      path = "file:///" + path;
    return new URL(path);
  } catch (MalformedURLException e) {
    return null;
  }
}

I wish File.getCanonicalPath() could have been used instead of
getAbsolutePath(), but it throws an exception if the file does not exist.
If that is the behavior you want, replace getAbsolutePath() with
getCanonicalPath().

I have used File.separatorChar instead of File.separator or even
getProperty("file.separator") because I don't know of any system that
has multicharacter separators. It will be a lot more messy if you want
to handle that case as well.

Hope this helps,

Don Park
http://www.quake.net/~donpark/index.html
From ak117 at freenet.carleton.ca Sun Feb 8 12:38:59 1998
From: ak117 at freenet.carleton.ca (David Megginson)
Date: Mon Jun 7 17:00:06 2004
Subject: file URLs again
In-Reply-To: <3.0.32.19980207113030.00ab0c90@pop.intergate.bc.ca>
References: <3.0.32.19980207113030.00ab0c90@pop.intergate.bc.ca>
Message-ID: <199802072149.QAA00313@unready.microstar.com>

Tim Bray writes:

> Hi, I've been getting a bit behind... did this group in its
> collective wisdom come up with a snippet of Java that makes
> a really good and sincere effort to open a URL that looks
> like "spec.dtd" and works reliably on MS & other OS's, with
> more than one JVM? I seem to recall seeing one go by, but
> can't find it. -Tim

This one's from the latest SAXDemo.java, incorporating modifications
suggested by James Clark:

/**
 * If a URL is relative, make it absolute against the current directory.
 */
private static String makeAbsoluteURL (String url)
  throws java.net.MalformedURLException
{
  URL baseURL;

  String currentDirectory = System.getProperty("user.dir");
  String fileSep = System.getProperty("file.separator");
  String file = currentDirectory.replace(fileSep.charAt(0), '/') + '/';

  if (file.charAt(0) != '/') {
    file = "/" + file;
  }
  baseURL = new URL("file", null, file);
  return new URL(baseURL, url).toString();
}

All the best,

David

--
David Megginson                 ak117@freenet.carleton.ca
Microstar Software Ltd.         dmeggins@microstar.com
http://home.sprynet.com/sprynet/dmeggins/
From digitome at iol.ie Sun Feb 8 14:44:36 1998
From: digitome at iol.ie (Sean Mc Grath)
Date: Mon Jun 7 17:00:06 2004
Subject: GEDCOM - A Killer XML Application?
Message-ID: <199802081444.OAA27250@mail.iol.ie>

I have been wandering the Web searching for my wife's relatives (surname
Kilcawley. Know anyone?) and have learned very quickly that there is a
*huge* amount of genealogy stuff/activity on the Web.

Most of it revolves around a genealogy file format called GEDCOM that
apparently originated with the Church of the Latter Day Saints.

This is a snippet of GEDCOM:

1 NAME Archibald /BARD_(Beard)/
1 SEX M
1 BIRT
2 DATE SEENOTES
2 PLAC Antrim,Ireland
1 DEAT
2 DATE FEB 1765

Sure looks like a cool XML application to me! I mean, the whole point of
these GEDCOM files is publishing/interchange of genealogy data. Richly
structured hierarchies. Oodles of scope to show off spiffy XLL linking,
spiffy XSL rendering, intelligent search agents. The whole nine yards.

Has anyone looked into this? If not, anyone interested in helping to get
a ball rolling?
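To give a feel for how directly the level numbers in the snippet map onto
nested elements, here is a minimal Java sketch. It is an illustration
only: the class GedcomToXml and the wrapper element name gedcom are made
up, each GEDCOM tag is simply reused as an element name, and the code
handles only lines of the form shown in the snippet (level, tag, optional
value), not full GEDCOM with its record identifiers and cross-references.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical sketch: turns level-numbered GEDCOM lines into nested XML. */
final class GedcomToXml {
    private GedcomToXml() {}

    static String convert(String gedcom) {
        StringBuilder out = new StringBuilder("<gedcom>");
        Deque<String> openTags = new ArrayDeque<>();     // elements not yet closed
        Deque<Integer> openLevels = new ArrayDeque<>();  // their GEDCOM levels
        for (String raw : gedcom.split("\r?\n")) {
            String line = raw.trim();
            if (line.isEmpty()) continue;
            String[] parts = line.split(" ", 3);         // level, tag, optional value
            int level = Integer.parseInt(parts[0]);
            // A new line closes every open element at the same or a deeper level.
            while (!openLevels.isEmpty() && openLevels.peek() >= level) {
                openLevels.pop();
                out.append("</").append(openTags.pop()).append(">");
            }
            out.append("<").append(parts[1]).append(">");
            if (parts.length == 3) out.append(escape(parts[2]));
            openTags.push(parts[1]);
            openLevels.push(level);
        }
        while (!openTags.isEmpty()) out.append("</").append(openTags.pop()).append(">");
        return out.append("</gedcom>").toString();
    }

    /** Escapes the characters that may not appear literally in element content. */
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }
}
```

Fed the snippet above, the BIRT element ends up containing the DATE and
PLAC elements, which mirrors the hierarchy GEDCOM expresses with level
numbers.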

From smith at interlog.com Mon Feb 9 05:10:17 1998
From: smith at interlog.com (Chris Smith)
Date: Mon Jun 7 17:00:07 2004
Subject: Open Trading Protocol (CDATA, etc)(was BASE64 section support)
In-Reply-To: <000901bd33a0$10e44060$2ee044c6@donpark>
Message-ID:

On Sat, 7 Feb 1998, Don Park wrote:

> I am not sure if support for embedded binary data has been brought up in the
> WG but I am, frankly, very disappointed with the lack of support. CDATA is
> awfully inadequate. The Open Trading Protocol (OTP) proposal has a need to
> embed signature within OTP documents and it uses
> For occasional occurrence of ]]>, OTP states:
>
> "Any CDATA end sequences ("]]>") within the data are replaced by
> "]]]]>" in order to escape
> the CDATA end sequence"

I think you are lifting this a little out of context. The item you
referenced is from the specification on canonicalization. As well, it was
one of several design choices going into OTPv0.9, which are likely to be the
subject of cooler heads. More to the point, that item refers to *all* data
in elements.

A more relevant area is what the Open Trading Protocol does NOT handle.
The best example here is order description (often known as Invoice). We
felt that we could never handle all needs, and we needed to allow for both
simple and complex solutions, and both current and future solutions.

As a result, the element content is ANY, while we have a ContentFormat
attribute that lets you indicate the following: XML, PCDATA, BASE64, HTML,
MIME, plus a user-defined option. (There is a remaining discussion topic re
splitting into ContentFormat and ContentEncoding, which I hope is actually
accomplished.)
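
The quoted OTP sentence about escaping the CDATA end sequence has lost part of its text in this archive, so the exact replacement string OTP specifies is not visible here. The following Java sketch shows the technique commonly used for the same purpose, which may or may not match the OTP wording: every "]]>" in the data is split across two adjacent CDATA sections. The class and method names are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;

/** Sketch: put arbitrary text in a CDATA section even if it contains "]]>". */
final class CdataSketch {

    /** Wraps text in CDATA, splitting every "]]>" across two adjacent sections. */
    static String wrap(String text) {
        // "]]>" becomes "]]" + "]]>" (end section) + "<![CDATA[" (new section) + ">".
        return "<![CDATA[" + text.replace("]]>", "]]]]><![CDATA[>") + "]]>";
    }

    /** Parses <d>...</d> around the wrapped text and returns what a parser sees. */
    static String roundTrip(String text) {
        try {
            byte[] doc = ("<d>" + wrap(text) + "</d>").getBytes(StandardCharsets.UTF_8);
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(doc))
                    .getDocumentElement().getTextContent();
        } catch (Exception e) {
            // Checked parser exceptions are rethrown unchecked to keep the sketch short.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(wrap("a]]>b"));
        System.out.println(roundTrip("a]]>b"));
    }
}
```

The round trip through a standard parser is there only to show that the reader of the document gets the original text back without any special unescaping step.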

This, I think, is a reasonable compromise. Although it does not lock down
the protocol completely, making implementation more difficult, it allows
for XML/EDI, simple plain text, and HTML browser displayed text or graphic
offers (yes - you could essentially have an invoice that contained a picture
of the item you are purchasing).

(for more details, http://www.otp.org )

In our terms, all a

From donpark at quake.net Mon Feb 9 06:32:04 1998
From: donpark at quake.net (Don Park)
Date: Mon Jun 7 17:00:07 2004
Subject: Encoded XML Content -- was Re: Open Trading Protocol (CDATA, etc)(was BASE64 section support)
Message-ID: <001201bd3523$ba543450$2ee044c6@donpark>

Chris,

From the responses I have gotten from some of the members of the XML-WG, it
is clear that we can't add BASE64 section to the spec. As you pointed out,
BASE64 section is not helpful enough for XML applications.

Tim suggested that we write up a proposal for the use of a reserved
attribute or namespace to signal, as a convention in XML 1.0, that the
contents of an element are base64 encoded. Such a proposal would serve the
need right now and could be adopted by XML 1.1 in the future.

I would like to form a small team to write the proposal. Since we are
dealing with a focused subject, I would like to fasttrack this proposal.

Let me get the ball rolling with the following brief summary of the proposal:

1. Name

Names are important since they serve as mental hooks to hang knowledge.
The choices I can think of are:

a) XML-Binary
b) XML-Blob
c) Encoded XML Content

I would like to use a short easily understandable name like XML-Binary so
that vendors can say their product supports XML-Binary.

2. Mechanism

I tend to prefer the use of reserved attribute(s) to a namespace. I would
very much like to see something like the xml:space attribute used. For the
kind of applications I am familiar with, adding the following two special
attributes would be enough:

    xml:encoding="base64"
    xml:mimetype="image/gif"

Should we limit it to base64 and just have an xml:encoded attribute with
true and default as possible values? Should we be using some standard
encoding names? Frankly I am not aware of any such standard (duh!).

Do we need xml:mimetype? My application sure could use it since I can fire
up a content handler based on the mimetype and pass it the decoded data. The
content handler returns a component which is inserted into the tree to
display the content.

This should be enough to get the discussion going.

Sincerely,

Don Park

From M.H.Kay at eng.icl.co.uk Mon Feb 9 10:14:52 1998
From: M.H.Kay at eng.icl.co.uk (Michael Kay)
Date: Mon Jun 7 17:00:07 2004
Subject: GEDCOM - A Killer XML Application?
Message-ID: <01bd3543$743bd0c0$1e09e391@mhklaptop.bra01.icl.co.uk>

>I ... have learned very quickly that there
>is a *huge* amount of genealogy stuff/activity on the Web.
>
>Most of it revolves around a genealogy file format called GEDCOM
>that apparently originated with the Church of the Latter Day Saints.
>

Yes, I've done some work on this, and have been hoping to go public, but
it's come to a bit of a standstill while other activities more important to
my employers have taken over.

I agree with you that an XML encoding of GEDCOM (let's call it GedML?)
offers great potential benefits:

- solving GEDCOM's problems with character sets and binary objects
- allowing "rich text" in the textual fields
- providing a mechanism for cross-file linkage
- making it much easier to write GEDCOM applications
- allowing GEDCOM data to be published directly on the web, rather than
  being reformatted for publication on the web
- allowing web search engines to index GEDCOM files intelligently

I've got as far as

- writing a few notes on the design principles / rationale
- writing GEDCOM to GedML converters in both directions
- working out in principle how to enhance these to do ANSEL to UNICODE
  conversion
- writing a DTD for GedML
- writing an MSXML application that creates a (partial) Java
  representation of the GEDCOM object model for use by applications.

Since I'm stalled, any cooperation will be much appreciated!

regards,

Mike Kay

From alex.webb at staempfli.com Mon Feb 9 10:42:11 1998
From: alex.webb at staempfli.com (Webb Alex)
Date: Mon Jun 7 17:00:07 2004
Subject: GEDCOM - A Killer XML Application?
Message-ID:

A very interesting brochure "The Gedcom Standard Release 5.5" is
available from http://www.tiac.net/users/pmcbride/gedcom/55gctoc.htm

This details the philosophy and current (?) standard.

Does anyone have an alternative genealogy DTD?

Alex Webb

From mtbryan at sgml.u-net.com Mon Feb 9 11:05:21 1998
From: mtbryan at sgml.u-net.com (Martin Bryan)
Date: Mon Jun 7 17:00:07 2004
Subject: GEDCOM - A Killer XML Application?
Message-ID: <01bd3544$bdb5d2e0$LocalHost@sgml.u-net.com>

Sean

>I have been wandering the Web searching for my wife's relatives
>(surname Kilcawley. Know anyone?) and have learned very quickly that there
>is a *huge* amount of genealogy stuff/activity on the Web.
>
>Most of it revolves around a genealogy file format called GEDCOM
>that apparently originated with the Church of the Latter Day Saints.
>
>This is a snippet of GEDCOM:
>
> 1 NAME Archibald /BARD_(Beard)/
> 1 SEX M
> 1 BIRT
> 2 DATE SEENOTES
> 2 PLAC Antrim,Ireland
> 1 DEAT
> 2 DATE FEB 1765
>
>Sure looks like a cool XML application to me!

Defining an XML DTD for it is easy, but what is really interesting is how
you could use the data already out there in this format within XML
applications without having to recode it all.

Unfortunately the XML-Data proposal does not seem to provide sufficient
tools for mapping the existing schema to an XML equivalent without invoking
a specialist script. It would be nice if there were some generalized
mechanisms for doing this.

> I mean, the whole
>point of these GEDCOM files is publishing/interchange of
>genealogy data. Richly structured hierarchies.
>Oodles of
>scope to show off spiffy XLL linking, spiffy XSL rendering,
>intelligent search agents. The whole nine yards.
>
>Has anyone looked into this? If not, anyone interested
>in helping to get a ball rolling?

I am currently exploring how we could do a mapping between three file
formats, XML, CSV and GEDCOM, to provide an integrated set of resources for
tracing genealogical information through a HyTime-encoded Topic Navigation
Map. This goes a bit beyond what you are suggesting, but may be more
practical in the longer run.

Martin Bryan

From ht at cogsci.ed.ac.uk Mon Feb 9 12:45:25 1998
From: ht at cogsci.ed.ac.uk (Henry S. Thompson)
Date: Mon Jun 7 17:00:07 2004
Subject: XML-Data Questions
In-Reply-To: "Don Park"'s message of Fri, 30 Jan 1998 16:11:53 -0800
References: <000001bd2ddc$ec7641b0$2ee044c6@donpark>
Message-ID:

"Don Park" wrote on 30 Jan (sorry for late reply):

> I have some questions about the XML-Data spec which affects implementation:
>
> 1. How are the schemas referenced from XML documents?

Not clear. Given the NON-official status of XML-Data, a PI seems the
most likely route for now.

> 2. How does one validate XML documents which use XML-Data schema rather than
> DTD?

One doesn't :-) See previous discussion on this list about validation
-- 'valid' is a predicate over document instances and doctypes AS
SPECIFIED IN THE XML SPECIFICATION. The following extract from my
SGML97 paper (cf.
http://www.ltg.ed.ac.uk/~ht/B9H.html) is relevant:

"In our approach, we envisage

  a) the schema DTD, a definition of an XML representation of document
     structure, that is, an old-style DTD for schemata;

  b) a master XML application, the equivalent of the XML parser, which
     is capable of processing pairs of XML documents, where the first, a
     schema, is valid in terms of the schema DTD; the second, an
     instance, has no old-style DTD, but is both well-formed in the XML
     sense and meta-valid in terms of the schema expressed by the first.

Meta-validity is, of course, [conformance] to the document structure
constraints contained in the associated schema, which [itself is valid
per] the schema DTD."

> 3. Current XML-Data does not allow or rather make it easy for enumerated
> attribute values to contain spaces because space is used as delimiters.
>
> Why not use the following structure to define enumerated attribute values?
>
>
>
> children
> adult
> adult
>
>

Um, the Enumeration declared value for attributes must consist of
Nmtokens (production 59 in the Proposed Recommendation) so the issue
doesn't arise. Support for enumerated notation values isn't in
XML-Data yet (if I remember right) but the same constraint obtains
there.

Hope this helps.

ht
--
Henry S. Thompson, Human Communication Research Centre, University of Edinburgh
2 Buccleuch Place, Edinburgh EH8 9LW, SCOTLAND -- (44) 131 650-4440
Fax: (44) 131 650-4587, e-mail: ht@cogsci.ed.ac.uk
URL: http://www.cogsci.ed.ac.uk/~ht/
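
A small Java sketch of the Nmtoken point made in the reply above: because each enumerated value must be a single name token, a value containing a space can never be declared, so splitting on whitespace is unambiguous. The check below is an ASCII-only approximation of the XML name characters (the real production also admits many non-ASCII letters and digits), and the class name and sample values are hypothetical.

```java
/** Sketch: ASCII-only approximation of the XML Nmtoken check for enumerated values. */
final class NmtokenSketch {

    /** True if the value is one or more ASCII name characters (letters, digits, . _ : -). */
    static boolean isAsciiNmtoken(String value) {
        return value.matches("[A-Za-z0-9._:-]+");
    }

    public static void main(String[] args) {
        System.out.println(isAsciiNmtoken("adult"));        // a single token
        System.out.println(isAsciiNmtoken("young adult"));  // contains a space
        // A whitespace-delimited list can only ever yield space-free tokens.
        for (String token : "children young adult".trim().split("\\s+")) {
            System.out.println(token + " " + isAsciiNmtoken(token));
        }
    }
}
```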

From gmckenzi at JetForm.com Mon Feb 9 14:18:56 1998
From: gmckenzi at JetForm.com (Gavin McKenzie)
Date: Mon Jun 7 17:00:07 2004
Subject: Encoded XML Content -- was Re: Open Trading Protocol (CDATA, etc)(was BASE64 section support)
Message-ID:

Don,

Here are my suggestions...

Methinks xml:encoding is too close to the XML PI encoding for character
set encodings. I wish that the XML PI encoding had been called
text-encoding or char-encoding -- this would have made it easier to come
up with other 'encoding' attributes without ambiguity. *sigh*

How about:

1. xml:transfer-encoding
2. xml:content-encoding

Suggestion #1 is a little ugly because it has the word 'transfer', but
this is closer to the MIME heritage where base64 is primarily used for
packaging style encoding, as opposed to locale char-set encoding.

Suggestion #2 may seem redundant but at least doesn't conflict directly
with 'encoding' in the context of locale char-set encoding.

As for the mimetype attribute...I'd vote for something closer to IOTP,
such as:

	xml:content-format

where content-format can be one of:

- a mimetype that identifies the content format, e.g. "image/jpeg"
- a user-defined code of the form "x-ddd:nnn", where ddd is a domain and
  nnn is an arbitrary name for the format, e.g. "x-jetform:mdf"

However, IOTP includes other acceptable values for content-format such
as 'PCDATA' and 'XML'. I view this as duplication and believe that only
the two options above are necessary; i.e. XML content should be able to
be expressed as 'text/xml', ignoring the fact that this isn't a *real*
mimetype.
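
The two kinds of content-format value suggested above can be told apart mechanically. The Java sketch below is one way an application might do so; the class name, the Kind enum and the deliberately loose patterns are assumptions for illustration, and it does not check a value against any registry of real MIME types.

```java
/** Sketch: classify a content-format value as a MIME type or a user-defined code. */
final class ContentFormatSketch {

    enum Kind { MIME_TYPE, USER_DEFINED, UNKNOWN }

    static Kind classify(String value) {
        // User-defined codes have the form "x-ddd:nnn" (domain, colon, name).
        if (value.matches("x-[^:/\\s]+:[^:/\\s]+")) {
            return Kind.USER_DEFINED;
        }
        // Otherwise accept anything shaped like "type/subtype".
        if (value.matches("[^:/\\s]+/[^:/\\s]+")) {
            return Kind.MIME_TYPE;
        }
        return Kind.UNKNOWN;
    }

    public static void main(String[] args) {
        for (String value : new String[] {"image/jpeg", "x-jetform:mdf", "text/xml", "PCDATA"}) {
            System.out.println(value + " -> " + classify(value));
        }
    }
}
```

Under this scheme the IOTP keywords such as PCDATA fall into UNKNOWN, which matches the suggestion that they be replaced by MIME-style values.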

And I assume that the implication in all of this is that somebody could
include content that contains well-formed and valid XML that happens to
be base64'd? Hence it is necessary for the parser to unwrap such
sections, right?

Thoughts?

Gavin.

From serres-doug at usa.net Mon Feb 9 15:22:57 1998
From: serres-doug at usa.net (Doug Serres)
Date: Mon Jun 7 17:00:07 2004
Subject: GEDCOM - A Killer XML Application?
References: <199802081444.OAA27250@mail.iol.ie>
Message-ID: <34DF1F10.846364EF@usa.net>

Sean Mc Grath wrote:

> I have been wandering the Web searching for my wife's relatives
> (surname Kilcawley. Know anyone?) and have learned very quickly that there
> is a *huge* amount of genealogy stuff/activity on the Web.
>
> Most of it revolves around a genealogy file format called GEDCOM
> that apparently originated with the Church of the Latter Day Saints.

The Church of Jesus Christ of Latter-day Saints
(http://www.lds.org/Family_History/How_Do_I_Begin.html)

> Has anyone looked into this? If not, anyone interested
> in helping to get a ball rolling?

I'd be interested in this one too!

--Doug

From deke at tallent.com Mon Feb 9 15:54:42 1998
From: deke at tallent.com (Deke Smith)
Date: Mon Jun 7 17:00:07 2004
Subject: GEDCOM - A Killer XML Application?
Message-ID: <1325104436-519141701@tallent.com>

Martin Bryan, mtbryan@sgml.u-net.com said on 2/9/98 4:23 AM:

>Defining an XML DTD for it is easy, but what is really interesting is how
>you could use the data already out there in this format within XML
>applications without having to recode it all.
>
>Unfortunately the XML-Data proposal does not seem to provide sufficient
>tools for mapping the existing schema to an XML equivalent without invoking
>a specialist script. It would be nice if there were some generalized
>mechanisms for doing this.

There has existed a de-facto standard for conversion of GEDCOM to HTML
for a couple of years. Information about it can be found at: .

I have used GED2HTML (http://www.gendex.com/ged2html/) and it works VERY
well, even on large databases. The code is in Perl and gets you half-way
there on an XML conversion.

Deke

-----------------------------------------------------------------
Deke Smith
Tallent Communications Group, Brentwood TN
deke@tallent.com, 615-661-9878
-----------------------------------------------------------------

From deke at tallent.com Mon Feb 9 15:57:11 1998
From: deke at tallent.com (Deke Smith)
Date: Mon Jun 7 17:00:07 2004
Subject: Last minute request for BASE64 section support in XML 1.0
Message-ID: <1325104294-519150211@tallent.com>

Don Park, donpark@quake.net said on 2/6/98 8:26 PM:

>It looks like XML is about to be approved as standard by W3C. Could we
>please have BASE64 sections as a part of XML standard 1.0? Everyone who
>supports this idea, please reply to this message (short replies please to
>avoid LISTRIVIA).

Supported. If not officially accepted it WILL be used anyhow.

Deke

-----------------------------------------------------------------
Deke Smith
Tallent Communications Group, Brentwood TN
deke@tallent.com, 615-661-9878
-----------------------------------------------------------------

From crism at ora.com Mon Feb 9 17:08:34 1998
From: crism at ora.com (Chris Maden)
Date: Mon Jun 7 17:00:07 2004
Subject: Last minute request for BASE64 section support in XML 1.0
In-Reply-To: <1325104294-519150211@tallent.com> (message from Deke Smith on Mon, 9 Feb 98 09:56:39 -0600)
Message-ID: <199802091712.MAA05614@geode.ora.com>

[Deke Smith]
> Supported. If not officially accepted it WILL be used anyhow.

Then you will be using something other than XML, and good luck getting
any application to accept it. The string '
http://www.oreilly.com/people/staff/crism/ +1.617.499.7487
90 Sherman Street, Cambridge, MA 02140 USA" NDATA SGML.Geek>

From pazandak at OBJS.com Mon Feb 9 18:41:50 1998
From: pazandak at OBJS.com (Paul Pazandak)
Date: Mon Jun 7 17:00:07 2004
Subject: Type-specific class generation using XML parsers
Message-ID: <34DF4EBB.7B0EAC1B@OBJS.com>

I have finished modifications to an XML parser to support type-specific
class tree generation, as opposed to generic tree objects. This means,
for example, that the parser would generate a complex object of the form:

    BOOK (using book.java)
      - CHAPTER (using chap.java)
        - SECTION (using sect.java)
    etc.

for an xml document describing a book. The resulting tree is useable
immediately without further parsing or traversing of the tree, which
would be generally required if the tree was composed of generic XML
objects.

The class specifications are embedded in the accompanying DTD (which are
then consumed by the parser), but could as easily be embedded in the xml
document itself.

My question is what, if any, effort is there to standardize how
class-related metadata is defined within a DTD or XML specification? I'd
prefer to adopt an approach that is likely to be standardized.

In addition, what other approaches (excluding hard-coding classnames)
have been proposed to produce the same result as I have described?

Regards,

Paul.

p.s. This all came about because event-based parsing seems like quite a pain.
In addition, any changes to the XML structure can require many changes to
the event-handling code. Further, the generation of generic tree
structures is not very useful because one must traverse the tree and
basically parse (again!) the tree to generate application-specific
structures. So, why not have the correct structure be generated the first
time by the parser?

--
********************************************************************
Paul Pazandak, Ph.D                      pazandak@objs.com
Object Services and Consulting, Inc.     http://www.objs.com
Minneapolis, Minnesota 55420-5409        612-881-6498
********************************************************************

From kent at trl.ibm.co.jp Tue Feb 10 02:37:15 1998
From: kent at trl.ibm.co.jp (TAMURA Kent)
Date: Mon Jun 7 17:00:07 2004
Subject: IBM `XML for Java' has released.
Message-ID: <9802100236.AA46457@ns.trl.ibm.com>

`XML for Java' is a validating XML processor written in Java.

You can download from IBM alphaWorks:
	http://www.alphaworks.ibm.com/formula/xml
It requires Java 1.1.

--
TAMURA Kent @ Tokyo Research Laboratory, IBM Japan

From zwang at pstat.ucsb.edu Tue Feb 10 07:58:50 1998
From: zwang at pstat.ucsb.edu (Zheng Wang)
Date: Mon Jun 7 17:00:07 2004
Subject: IBM `XML for Java' has released.
In-Reply-To: <9802100236.AA46457@ns.trl.ibm.com>
Message-ID:

Thanks Tamura,

It does work for jdk1.1.5 or even jdk1.1.3. Since you did not give the
complete source code, I cannot figure out the reason that it does not
work for jdk1.2beta2.

Zheng Wang
Department of Statistics and Applied Probability
University of California, Santa Barbara
E-mail: zwang@pstat.ucsb.edu; http://www.pstat.ucsb.edu/~zwang

On Tue, 10 Feb 1998, TAMURA Kent wrote:

>
> `XML for Java' is a validating XML processor written in Java.
>
> You can download from IBM alphaWorks:
> 	http://www.alphaworks.ibm.com/formula/xml
> It requires Java 1.1.
>
> --
> TAMURA Kent @ Tokyo Research Laboratory, IBM Japan

From donpark at quake.net Tue Feb 10 09:51:13 1998
From: donpark at quake.net (Don Park)
Date: Mon Jun 7 17:00:07 2004
Subject: Online SAXDOM Demo Available
Message-ID: <000201bd3608$c0c7f4d0$2ee044c6@donpark>

I have just uploaded a browser based (currently limited to Internet Explorer
4.0) demo of SAXDOM being used from JavaScript. Although the demo is
somewhat sluggish due to Java/JavaScript synchronization problems, it shows
DOM being used by a scripting language just as it was designed for.
Exciting!

You can find the demo at:

http://www.quake.net/~donpark/SaxDomDemo/SaxDomDemo.html

Have fun,

Don Park
http://www.quake.net/~donpark/index.html

From cecile.baille-pierre at bull.net Tue Feb 10 15:31:36 1998
From: cecile.baille-pierre at bull.net (BAILLE-PIERRE Cécile)
Date: Mon Jun 7 17:00:07 2004
Subject: Object Hierarchie with XML
Message-ID: <01BD3640.DFB57560@belledonne.frcl.bull.fr>

As I'm just beginning to look at the XML specifications, my question may
be nonsense (in which case I promise it will be the first and last one!).

As far as I understand, an XML document has a tree-like structure, which is
perfect for reflecting composition/aggregation of entities ("my book is
composed of: a title, an author, one or more paragraphs, etc."),
where child elements represent parts of the element currently defined.
But how can one simply implement a class hierarchy, i.e. "element E is
derived from super-element S and inherits its attributes and properties"?

Cécile.

From jjc at jclark.com Tue Feb 10 15:32:18 1998
From: jjc at jclark.com (James Clark)
Date: Mon Jun 7 17:00:07 2004
Subject: XML resources updated
Message-ID: <34E07251.7D36673C@jclark.com>

I've updated my XML parsers and test suite to match the final XML
recommendation. See http://www.jclark.com/xml for more information.

The biggest change is that I've enhanced my XML implementation in C to
include a general purpose, non-validating XML parser layered on top of
the tokenizer.

James

From dgd at cs.bu.edu Tue Feb 10 16:12:15 1998
From: dgd at cs.bu.edu (David G. Durand)
Date: Mon Jun 7 17:00:07 2004
Subject: Last minute request for BASE64 section support in XML 1.0
Message-ID:

From: Deke Smith

Don Park, donpark@quake.net said on 2/6/98 8:26 PM:

>It looks like XML is about to be approved as standard by W3C. Could we
>please have BASE64 sections as a part of XML standard 1.0? Everyone who
>supports this idea, please reply to this message (short replies please to
>avoid LISTRIVIA).

Supported.
If not officially accepted it WILL be used anyhow.

Deke

This is silly. The specific proposal (a BASE64 marked section) _can't
be_ added at this point under the rules of the W3C. It's also unlikely
to fly in XML 1.1 for two reasons (which are more substantial
technical problems with the proposal as it stands):

1. The proposed syntax is not compatible with SGML syntax, and can't
be made compatible without changes in SGML (violating the goals of the
XML project).

2. The effect desired can be easily obtained in XML by the use of
NOTATION.

For example:

could be replaced by (in the instance):

	..base64data..

for a WF-checking application, the following DTD would be required:

For validation, you'd have to declare the notation (by adding this to
the DTD or the internal subset):

I may have made some detail mistakes, because I can't get to the
standard right now, but the basic point is that to handle base64
encoding (or any other encoding expressible in the XML character set)
you need only declare and attach a notation attribute.

If you don't like notation, you can even just use an attribute value
and keyword and skip the notation declaration. I don't remember the
character repertoire of BASE64, but the fact that it's email safe
means that the escaping issues are certainly no harder than those for
any XML text content.

If you really want to avoid escaping characters, you can use
references to external unparsed entities to avoid the problem
altogether.

For the above reasons I expect that it _won't_ be used anyhow, except
by people who don't mind their documents being rejected by conforming
parsers. Given the presence of a simple way to do this _inside_ XML,
the need is unlikely to be regarded as being so critical that
conformance is irrelevant.
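
The attribute-and-keyword variant mentioned above can be sketched in Java as follows. The markup examples in the message did not survive in this archive, so the attribute name encoding and its value base64 are assumptions made for illustration, not the names the message originally used, and the class and method names are hypothetical. The base64 alphabet is limited to letters, digits, '+', '/' and '=', so the encoded data needs no XML escaping.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;

/** Sketch: carry binary data as base64 text in an element flagged by an attribute. */
final class Base64ElementSketch {

    /** Builds <name encoding="base64">...</name>; the attribute name is an assumption. */
    static String encodeElement(String name, byte[] data) {
        return "<" + name + " encoding=\"base64\">"
                + Base64.getEncoder().encodeToString(data)
                + "</" + name + ">";
    }

    /** Parses the document with a standard parser and decodes the root element's text. */
    static byte[] decodeElement(String xml) {
        try {
            Element root = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
                    .getDocumentElement();
            if (!"base64".equals(root.getAttribute("encoding"))) {
                throw new IllegalArgumentException("element is not marked as base64");
            }
            // The MIME decoder ignores line breaks and other non-alphabet characters.
            return Base64.getMimeDecoder().decode(root.getTextContent());
        } catch (IllegalArgumentException e) {
            throw e;
        } catch (Exception e) {
            // Checked parser exceptions are rethrown unchecked to keep the sketch short.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        byte[] data = {0, (byte) 0xFF, '<', '&'};
        String xml = encodeElement("image", data);
        System.out.println(xml);
        System.out.println(decodeElement(xml).length + " bytes recovered");
    }
}
```

No DTD is involved, so the same document is readable by validating and non-validating processors alike; the cost is that the meaning of the attribute is a convention between applications rather than something the parser enforces.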

  -- David

------------------------------------------+----------------------------
  David Durand              dgd@cs.bu.edu | david@dynamicDiagrams.com
  Boston University Computer Science      | Dynamic Diagrams
  http://www.cs.bu.edu/students/grads/dgd/ | http://dynamicDiagrams.com/
                                          | MAPA: mapping for the WWW

From wilfr at mail.bc.rogers.wave.ca Tue Feb 10 18:14:53 1998
From: wilfr at mail.bc.rogers.wave.ca (Wilf Reedijk)
Date: Mon Jun 7 17:00:07 2004
Subject: IBM `XML for Java' has released.
References: <9802100236.AA46457@ns.trl.ibm.com>
Message-ID: <34E09932.EA59655E@rogers.wave.ca>

I just downloaded xml4j from the IBM site.
I tried to compile the trlx application but it seems that I am missing some
classes: org.xml.sax.EntityHandler etc. My classpath points to xml4j.jar. I don't
see these classes in there or anywhere else in the files that I downloaded. Am I
missing something?

Wilf Reedijk

TAMURA Kent wrote:

> `XML for Java' is a validating XML processor written in Java.
>
> You can download from IBM alphaWorks:
> 	http://www.alphaworks.ibm.com/formula/xml
> It requires Java 1.1.

From zwang at pstat.ucsb.edu Tue Feb 10 19:08:00 1998
From: zwang at pstat.ucsb.edu (Zheng Wang)
Date: Mon Jun 7 17:00:07 2004
Subject: IBM `XML for Java' has released.
In-Reply-To: <34E09932.EA59655E@rogers.wave.ca> Message-ID: That's exactly what I mentioned in my previous mail. The source code ibm released is not complete. The users have to depend on the xml4j.jar file. So users can not look at the source code. Zheng Wang Department of Statistics and Applied Probability University of California, Santa Barbara E-mail: zwang@pstat.ucsb.edu; http://www.pstat.ucsb.edu/~zwang On Tue, 10 Feb 1998, Wilf Reedijk wrote: > I just downloaded xml4j from the IBM site. > I tried to compile the trlx application but it seems that I am missing some > classes: org.xml.sax.EntityHandler etc. My classpath points to xml4j.jar. I don't > see these classes in there or anywhere else in the files that I downloaded. Am I > missing something? > > Wilf Reedijk > > > > TAMURA Kent wrote: > > > `XML for Java' is a validating XML processor written in Java. > > > > You can download from IBM alphaWorks: > > http://www.alphaworks.ibm.com/formula/xml > > It requires Java 1.1. > > > > > xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev@ic.ac.uk > Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ > To (un)subscribe, mailto:majordomo@ic.ac.uk the following message; > (un)subscribe xml-dev > To subscribe to the digests, mailto:majordomo@ic.ac.uk the following message; > subscribe xml-dev-digest > List coordinator, Henry Rzepa (mailto:rzepa@ic.ac.uk) > > xml-dev: A list for W3C XML Developers. 
To post, mailto:xml-dev@ic.ac.uk Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To (un)subscribe, mailto:majordomo@ic.ac.uk the following message; (un)subscribe xml-dev To subscribe to the digests, mailto:majordomo@ic.ac.uk the following message; subscribe xml-dev-digest List coordinator, Henry Rzepa (mailto:rzepa@ic.ac.uk)

From donpark at quake.net Tue Feb 10 20:50:29 1998
From: donpark at quake.net (Don Park)
Date: Mon Jun 7 17:00:07 2004
Subject: Last minute request for BASE64 section support in XML 1.0
Message-ID: <001c01bd3664$cd9c1630$2ee044c6@donpark>

David,

>This is silly. The specific proposal (a BASE64 marked section) _can't
>be_ added at this point under the rules of the W3C. It's also unlikely
>to fly in XML 1.1 for two reasons (which are more substantial
>technical problems with the proposal as it stands):

The form and timing of the proposal might be silly; the need is not.

> 1. The proposed syntax is not compatible with SGML syntax, and can't
>be made compatible without changes in SGML (violating the goals of the
>XML project).

Agreed. I am not an SGML whiz and I count on folks like you to point out problems.

> 2. The effect desired can be easily obtained in XML by the use of
>NOTATION.

Notation declarations have no use for non-validating applications. IMHO, most applications will validate only at design time and never at runtime. Thus, some means independent of the DTD must be used to indicate that the content is an encoded form of some binary data.

>If you don't like notation, you can even just use an attribute value
>and keyword and skip the notation declaration. I don't remember the
>character repertoire of BASE64, but the fact that it's email safe
>means that the escaping issues are certainly no harder than those for
>any XML text content.

I am not really concerned about how binary data is encoded in an individual XML format. I am concerned about the lack of support in the standard.
As Tim Bray suggests, I am trying to put in place a recommended convention for embedding encoded data so we can all readily store and retrieve binary data. Currently, I am proposing to add two reserved attributes:

    xml:content-encoding="base64;second-encoding-layer;third-encoding-layer"
    xml:content-type="mime/type"

Multiple names in the encoding attribute might be going overboard, but I am just thinking ahead to multilayer encoding. Such a scheme could be used to embed a compressed XML document within another XML document. Should the compressed XML document be expanded in place and fed into the parser? Hmm. Looks like there will be two levels to the proposal.

Your mention of notation brings up a possible need for an xml:content-notation attribute which could be used by other elements to reference the binary data. Since referenced embedded data must be defined before the first reference, placement becomes rather restrictive, especially if the embedded data element is not significant at the point of definition (where an icon is stored inside an XML file is not important, but where it is referenced is).

I appreciate your comments.

Regards,

Don Park
http://www.quake.net/~donpark/index.html

From crism at ora.com Tue Feb 10 21:26:19 1998
From: crism at ora.com (Chris Maden)
Date: Mon Jun 7 17:00:07 2004
Subject: Last minute request for BASE64 section support in XML 1.0
In-Reply-To: <001c01bd3664$cd9c1630$2ee044c6@donpark>
Message-ID: <199802102130.QAA19592@geode.ora.com>

[Don Park]
> Notation declarations have no use for non-validating applications.
> IMHO, most applications will validate only during design time and > never during runtime. Unless some means independent of DTD must be > used to indicate that content is encoded form of some binary data. The notation mechanism is provided for exactly this purpose. I'm not sure why it's unacceptable to you, but I don't think that developing a secondary means of providing the same information is preferable. I'm not very thrilled with the way notation works, but given Dan Connolly's comments about moving MIME towards a URL-based mechanism, then MIME types can be used as notation system identifiers. You can not expect to process XML documents in total ignorance of the DTD. You can expect to process many XML documents with only the internal subset, and you can mandate for your application that notation declarations be in the internal subset. I don't see why ... ]> ... is unacceptable, but is acceptable. The first even provides for a measure of extensibility (!) that the second lacks. This discussion should probably be moved to the XML SIG, as it involves the design of XML, not its implementation. -Chris -- http://www.oreilly.com/people/staff/crism/ +1.617.499.7487 90 Sherman Street, Cambridge, MA 02140 USA" NDATA SGML.Geek> xml-dev: A list for W3C XML Developers. 
To post, mailto:xml-dev@ic.ac.uk Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To (un)subscribe, mailto:majordomo@ic.ac.uk the following message; (un)subscribe xml-dev To subscribe to the digests, mailto:majordomo@ic.ac.uk the following message; subscribe xml-dev-digest List coordinator, Henry Rzepa (mailto:rzepa@ic.ac.uk)

From donpark at quake.net Tue Feb 10 22:09:06 1998
From: donpark at quake.net (Don Park)
Date: Mon Jun 7 17:00:07 2004
Subject: Last minute request for BASE64 section support in XML 1.0
Message-ID: <001001bd366f$d29b1310$2ee044c6@donpark>

Chris,

>I'm not very thrilled with the way notation works, but given Dan
>Connolly's comments about moving MIME towards a URL-based mechanism,
>then MIME types can be used as notation system identifiers.

That still leaves the encoding format to be specified. While I have focused on BASE64, I would prefer to leave the door open for other encoding formats.

>You can not expect to process XML documents in total ignorance of the
>DTD. You can expect to process many XML documents with only the
>internal subset, and you can mandate for your application that
>notation declarations be in the internal subset. I don't see why
>
>
>...
>]>
>...

I was not aware that non-validating XML parsers are required to process the internal DTD subset. Is this true? Even if it were true, how could an application tell that a notation="base64" attribute indicates that the content is binary data? Should we treat "base64" as a special notation name?

>...
>]>
>

Perhaps I did not make it clear. I have already given up on the idea of using a BASE64 section after realizing that it will conflict with SGML. Please read the description of my latest proposal in my last message. It does look similar to your "notation='base64'" idea without requiring the use of notation. It allows a non-validating parser to detect whether an element's content is binary data and, if so, determine its encoding format and its MIME type.
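
The detection step described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not Don Park's implementation: the class name BinaryContentSketch and its fields are hypothetical, it uses the later SAX DefaultHandler API and java.util.Base64 from a modern JDK rather than the 1998 draft interfaces, it handles a single "base64" layer only, and it assumes the element holding the data has no child elements.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler (not from the original thread): collects the text of an
// element that carries xml:content-encoding and decodes it when the element ends.
final class BinaryContentSketch extends DefaultHandler {
    private final StringBuilder text = new StringBuilder();
    private String encoding;     // xml:content-encoding of the element being read, or null
    private String pendingType;  // xml:content-type of that element, or null

    byte[] decoded;              // decoded bytes of the last binary element seen
    String decodedType;          // the MIME type declared for those bytes

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        // No DTD is consulted: the attributes alone mark the content as encoded data.
        String enc = atts.getValue("xml:content-encoding");
        if (enc != null) {
            encoding = enc;
            pendingType = atts.getValue("xml:content-type");
            text.setLength(0);
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        if (encoding != null) {
            text.append(ch, start, length);
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        // Assumption: only a single "base64" layer is supported in this sketch.
        if ("base64".equals(encoding)) {
            // The MIME decoder skips line breaks inside the encoded text.
            decoded = Base64.getMimeDecoder().decode(text.toString());
            decodedType = pendingType;
        }
        encoding = null;
    }

    // Parses a document held in a string with the default (non-validating) parser.
    static BinaryContentSketch parse(String xml) {
        try {
            BinaryContentSketch handler = new BinaryContentSketch();
            SAXParserFactory.newInstance().newSAXParser()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)), handler);
            return handler;
        } catch (Exception e) {
            throw new IllegalStateException("could not parse sample document", e);
        }
    }
}
```

An application would then hand decoded and decodedType to whatever type-specific handler it uses; how that dispatch is done is outside this sketch.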
A very friendly parser could take that information and return an object which could be an image, sound, or even a Java object if the data is Java serialization data. What I just described is already working in my application. I simply pass the info to Java Activation Framework (JAF) to get mimetype specific handler for the decoded data. I am hoping to provide some of the code as reference implementation for the upcoming XML-Binary proposal. Regards, Don Park http://www.quake.net/~donpark/index.html -----Original Message----- From: Chris Maden To: xml-dev@ic.ac.uk Date: Tuesday, February 10, 1998 1:28 PM Subject: Re: Last minute request for BASE64 section support in XML 1.0 >[Don Park] > >is unacceptable, but > > >is acceptable. The first even provides for a measure of extensibility >(!) that the second lacks. > >This discussion should probably be moved to the XML SIG, as it >involves the design of XML, not its implementation. > >-Chris >-- > >"http://www.oreilly.com/people/staff/crism/ +1.617.499.7487 >90 Sherman Street, Cambridge, MA 02140 USA" NDATA SGML.Geek> > >xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev@ic.ac.uk >Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ >To (un)subscribe, mailto:majordomo@ic.ac.uk the following message; >(un)subscribe xml-dev >To subscribe to the digests, mailto:majordomo@ic.ac.uk the following message; >subscribe xml-dev-digest >List coordinator, Henry Rzepa (mailto:rzepa@ic.ac.uk) > > xml-dev: A list for W3C XML Developers. 
To post, mailto:xml-dev@ic.ac.uk Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To (un)subscribe, mailto:majordomo@ic.ac.uk the following message; (un)subscribe xml-dev To subscribe to the digests, mailto:majordomo@ic.ac.uk the following message; subscribe xml-dev-digest List coordinator, Henry Rzepa (mailto:rzepa@ic.ac.uk) From dcarlson at ontogenics.com Tue Feb 10 23:08:43 1998 From: dcarlson at ontogenics.com (Dave Carlson) Date: Mon Jun 7 17:00:08 2004 Subject: IBM `XML for Java' has released. Message-ID: <2.2.32.19980210230321.00f87ca0@pop.dimensional.com> I have not looked at the IBM package, but the org.xml.sax.EntityHandler class is in the SAX distribution. See: http://www.microstar.com/XML/SAX/ I assume that IBM implemented a SAX driver for their parser. Sounds good! Dave At 10:15 AM 2/10/98 -0800, you wrote: >I just downloaded xml4j from the IBM site. >I tried to compile the trlx application but it seems that I am missing some >classes: org.xml.sax.EntityHandler etc. My classpath points to xml4j.jar. I don't >see these classes in there or anywhere else in the files that I downloaded. Am I >missing something? > >Wilf Reedijk > > > >TAMURA Kent wrote: > >> `XML for Java' is a validating XML processor written in Java. >> >> You can download from IBM alphaWorks: >> http://www.alphaworks.ibm.com/formula/xml >> It requires Java 1.1. > > > > >xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev@ic.ac.uk >Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ >To (un)subscribe, mailto:majordomo@ic.ac.uk the following message; >(un)subscribe xml-dev >To subscribe to the digests, mailto:majordomo@ic.ac.uk the following message; >subscribe xml-dev-digest >List coordinator, Henry Rzepa (mailto:rzepa@ic.ac.uk) > > xml-dev: A list for W3C XML Developers. 
To post, mailto:xml-dev@ic.ac.uk Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To (un)subscribe, mailto:majordomo@ic.ac.uk the following message; (un)subscribe xml-dev To subscribe to the digests, mailto:majordomo@ic.ac.uk the following message; subscribe xml-dev-digest List coordinator, Henry Rzepa (mailto:rzepa@ic.ac.uk) From norbert at datachannel.com Tue Feb 10 23:36:37 1998 From: norbert at datachannel.com (Norbert Mikula) Date: Mon Jun 7 17:00:08 2004 Subject: DXP - DataChannel XML Parser 1.0 Beta available Message-ID: <066401bd367c$94a3a880$830a1bac@norbert.datachannel.com> Skipped content of type multipart/alternative-------------- next part -------------- A non-text attachment was scrubbed... Name: Norbert H. Mikula.vcf Type: text/x-vcard Size: 492 bytes Desc: not available Url : http://mailman.ic.ac.uk/pipermail/xml-dev/attachments/19980210/93cd749b/NorbertH.Mikula.vcf From peter at ursus.demon.co.uk Wed Feb 11 00:52:08 1998 From: peter at ursus.demon.co.uk (Peter Murray-Rust) Date: Mon Jun 7 17:00:08 2004 Subject: XML as W3C Recommendation Message-ID: <3.0.1.16.19980211004432.224717fc@pop3.demon.co.uk> I am sure that most of you know that the W3C announced today that XML 1.0 was a Recommendation. The details are at: http://www.w3.org/XML This is a milestone in a very exciting quest, and there are many people and organisations who deserve credit. In my experience it is one of the best decision-making processes I have been acquainted with. Note that there are many issues still actively under consideration. It is important that XML-DEV members are aware that there are active working groups on these issues - these are listed on the W3 site. They include further developments in XML itself, XLL, XSL, namespaces, RDF, etc. I know it is frustrating for those 'not in the club', but much of the current formal discussion is confidential. The various WGs release information here as soon as it is reasonable. 
We have to accept, therefore, that it is not useful to discuss possible revisions of the drafts in this forum. The members of the SIG and the WGs have agreed to tight communal procedures, which at times require saying nothing :-) and it will help if we do the same. Please, therefore, accept the Recommendations and drafts as they are published and try to work with them. By all means report *implementation* problems and concerns here, but be assured that all 'vibes' will get back to the various groups. The biggest contributions will come from showing how the spec can be used to solve problems in practice. P. Peter Murray-Rust, Director Virtual School of Molecular Sciences, domestic net connection VSMS http://www.nottingham.ac.uk/vsms, Virtual Hyperglossary http://www.venus.co.uk/vhg xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev@ic.ac.uk Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To (un)subscribe, mailto:majordomo@ic.ac.uk the following message; (un)subscribe xml-dev To subscribe to the digests, mailto:majordomo@ic.ac.uk the following message; subscribe xml-dev-digest List coordinator, Henry Rzepa (mailto:rzepa@ic.ac.uk) From zwang at pstat.ucsb.edu Wed Feb 11 02:02:00 1998 From: zwang at pstat.ucsb.edu (Zheng Wang) Date: Mon Jun 7 17:00:08 2004 Subject: IBM `XML for Java' has released. 
In-Reply-To: <9802100236.AA46457@ns.trl.ibm.com>
Message-ID:

Hi, Tamura,

When I ran the parser with jdk 1.1.3, it gave me the following error message:

java trlx -d personal.xml
java.lang.InternalError: Converter malfunction(UTF8) -- please send a bug report to java-io@java.sun.com
    at java.io.InputStreamReader.malfunction(InputStreamReader.java:119)
    at java.io.InputStreamReader.convertInto(InputStreamReader.java:133)
    at java.io.InputStreamReader.fill(InputStreamReader.java:177)
    at java.io.InputStreamReader.read(InputStreamReader.java:235)
    at java.io.BufferedReader.fill(BufferedReader.java:144)
    at java.io.BufferedReader.read(BufferedReader.java:161)
    at com.ibm.xml.parser.XMLReader.read(XMLReader.java:292)
    at com.ibm.xml.parser.FileReading.getChar(FileReading.java:29)
    at com.ibm.xml.parser.Token.getChar(Token.java:34)
    at com.ibm.xml.parser.Parser.readStream(Parser.java:419)
    at com.ibm.xml.parser.trlx.main(trlx.java:143)
    at trlx.main(trlx.java:19)

From smith at interlog.com Wed Feb 11 08:00:40 1998
From: smith at interlog.com (Chris Smith)
Date: Mon Jun 7 17:00:08 2004
Subject: Encoded XML Content
In-Reply-To:
Message-ID:

The discussion has covered some good points up to now. I'll try to build on it, and move forward.

Let's be clear about what we're trying to solve here. Unicode has essentially solved the text problem. This note focuses on non-textual data, or places where a different character encoding is required inside your document.

For some applications, base64 will be easy to use.
Binary data will be present in particular locations in the XML tree, and the applications will simply know to decode it. These don't really need anything new, but will benefit if there is a common technique for handling it.

I think the real target is 'container' elements, where the designer needs to allow for flexibility in content at runtime. It is possible to do part of this with elements, but you run into two difficulties. First, you eventually hit your non-text data, and you have to provide some indication of the content and format. Second, you may have a real need to allow for formats that have never been foreseen.

What we don't need to do is provide another mechanism for managing XML markup and structure. XML parsers will not be asked to do anything different. This is entirely about how developers will use XML's features to resolve an often-encountered problem. (That's why this still belongs on xml-dev.)

That said, the moment you move away from Unicode data content, you face a number of issues. You will probably have to specify a wrapper layer used to make the data XML-friendly. If that is removed, then you will have to note what format or conventions apply to the next layer. Ultimately you will reach either a text layer or a binary data layer, which cannot be further unwrapped. That layer may need a descriptor, to specify what type of data was carried with all this effort.

The question I still haven't completely resolved is - is there a need for allowing an arbitrary number of layers, or is three sufficient? That is, the 'content encoding', 'content format', and 'content type'? I'm not certain it's sufficient, but I can't see a use for much more at the moment. (I'm not tightly attached to the labels, but I think they work, and at least they're a start.)

The most likely implementations seem to be with these as attributes. Attributes that are not present would have a default of a zero-length string.
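
As a rough illustration of the three layers, the sketch below removes the wrapper named by the content encoding and then, if a content format is given, interprets the bytes as text in that character set. Everything in it is an assumption made for illustration: the class and method names are hypothetical, the format is treated as a Java charset name, "base64" and "hex" are simply the wrapper names used elsewhere in this message, and java.util.Base64 comes from a modern JDK. A zero-length string stands for an absent attribute, as suggested above.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical helper: peels the "content encoding" wrapper, then applies
// the "content format" (a character set) if one was declared.
final class LayeredContentSketch {

    // Layer 1: undo the XML-friendly wrapper. "" means no wrapper was used.
    static byte[] unwrap(String contentEncoding, String data) {
        switch (contentEncoding) {
            case "":
                // Assumption: unwrapped content is ordinary XML character data.
                return data.getBytes(StandardCharsets.UTF_8);
            case "base64":
                return Base64.getMimeDecoder().decode(data);
            case "hex":
                return decodeHex(data.replaceAll("\\s+", ""));
            default:
                throw new IllegalArgumentException("unknown content-encoding: " + contentEncoding);
        }
    }

    // Layer 2: a non-empty format names the character set of a text layer;
    // an empty format means the bytes are binary and null is returned.
    static String asText(byte[] raw, String contentFormat) {
        return contentFormat.isEmpty() ? null : new String(raw, Charset.forName(contentFormat));
    }

    private static byte[] decodeHex(String hex) {
        if (hex.length() % 2 != 0) {
            throw new IllegalArgumentException("odd number of hex digits");
        }
        byte[] out = new byte[hex.length() / 2];
        for (int i = 0; i < out.length; i++) {
            int hi = Character.digit(hex.charAt(2 * i), 16);
            int lo = Character.digit(hex.charAt(2 * i + 1), 16);
            if (hi < 0 || lo < 0) {
                throw new IllegalArgumentException("not a hex digit");
            }
            out[i] = (byte) ((hi << 4) | lo);
        }
        return out;
    }
}
```

The third layer, the content type, is deliberately left as an opaque string here; it only tells the application what to do with the result once the first two layers are removed.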
Below, I've listed a number of items, in the interests of ensuring that any proposed solution can handle them all. (Ultimately, such a table would be useful to developers.)

What Is It?      Content    Content           Content
                 Encoding   Format            Type
--------------   --------   ---------------   ----------------------
JPEG image       base64                       mime:image/jpeg
ASCII text       base64     ISO-8859-1        mime:text/plain
HTML text        base64     ISO-8859-1        mime:text/html
XML content
XML carried                                   xml:
XML carried      base64     ISO-10646-UCS-2   mime:text/xml
XML data only                                 xml:pcdata
private data     hex                          x-private:somedata
private text     base64     Commodore64       x-private:sometext
embedded item    base64     ISO-8859-1        rfc:822
embedded item    base64                       mime:application/x-zip

I thought about separating content-type from the content-domain, but I can't see that you would specify them separately all that often.

The above seems to support several required ideas:

1) Standard XML content requires no settings at all. This is the degenerate case, and it is good that it works this way.

2) Standard XML content could be structured using a DTD specified using namespace techniques. This appears to be an available option without changing any of the infrastructure around encoding.

3) It supports MIME types, but does not require them. Other domains can be used besides MIME, including completely private or proprietary formats.

4) There is some consistency. Notice that whenever you specify a text type, you must provide a content-format. Otherwise, the text is the same as the surrounding XML. Whenever you specify any content-format that is different than the surrounding XML, you must use a content-encoding to restore XML friendliness.

5) So far, just about anything you can throw in there that has any current structure looks to be workable.

An example element using these, called 'container' could be defined as shown below. I've limited the strings in content-encoding. Is this a good idea?
There would be some structure applied to the content-format and content-type, but I don't think it would be effectively captured in the DTD.

Comments aren't just welcome - they're essential!

---------------------------------------------------------------------------
Chris Smith

From donpark at quake.net Wed Feb 11 11:16:29 1998
From: donpark at quake.net (Don Park)
Date: Mon Jun 7 17:00:08 2004
Subject: Preliminary XML-Binary prototype demo
Message-ID: <000e01bd36dd$ce0924e0$2ee044c6@donpark>

I am a hands-on guy, so I have already put together a version of the XML-Binary implementation in my own application. It works pretty well, although I am starting to see some limitations as I find new ways of using it.

Since it worked so well, I thought you guys might want to see something working too, so I changed the SAXDOM demo to handle XML-Binary elements. Just go to the SAXDOM demo and parse the Binary.xml file to see an image appear in the midst of a colorized XML document. I am still having problems with Navigator, so use IE 4.0 if you've got it. If not, don't sweat it. Just examine the JavaScript code to see what is going on, and then check out the Binary.xml file to see how XML-Binary is expressed in XML.

The demo uses xml:content-encoding and xml:content-type to indicate the encoding type and the content type in MIME. There are lots of issues, but I really want to get the initial version (level 1) out there rather quickly to address the basic needs first. I will post the list of issues real soon now.
I do appreciate the comments regarding XML-Binary, but I sure could use more, particularly from the XML-WG members and application developers in need of embedded binary data. I realize that the XML-Binary activity is somewhat on the border between the design and implementation domains, but since I am not on the XML-SIG mailing list, I have no choice but to grill the shrimps on the sidewalk. Yummy, that smells good!;-)

Don Park
http://www.quake.net/~donpark/index.html

PS: demo is at http://www.quake.net/~donpark/SaxDomDemo/SaxDomDemo.html
PS: Robin, this is NOT an announcement!

From grove at infotek.no Wed Feb 11 18:02:05 1998
From: grove at infotek.no (Geir Ove Gronmo)
Date: Mon Jun 7 17:00:08 2004
Subject: SAX: Empty elements
Message-ID: <3.0.2.32.19980211185910.009da370@jenufa.infotek.no>

While working on implementations using SAX I've noticed that there is no way to know if an element is an empty element or not (e.g ). This could perhaps be done using some kind of lookahead, but should that be necessary?

Perhaps a change to the startElement method in the DocumentHandler interface could fix this. This is how the method is defined in the Draft Specification (1998-01-12):

    public void startElement (String name, AttributeMap attributes) throws Exception

Perhaps this should be something like:

    public void startElement (String name, AttributeMap attributes, boolean isempty) throws Exception

Best regards,
Geir O.

xml-dev: A list for W3C XML Developers.
To post, mailto:xml-dev@ic.ac.uk Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To (un)subscribe, mailto:majordomo@ic.ac.uk the following message; (un)subscribe xml-dev To subscribe to the digests, mailto:majordomo@ic.ac.uk the following message; subscribe xml-dev-digest List coordinator, Henry Rzepa (mailto:rzepa@ic.ac.uk) From Jon.Bosak at eng.Sun.COM Wed Feb 11 19:44:57 1998 From: Jon.Bosak at eng.Sun.COM (Jon Bosak) Date: Mon Jun 7 17:00:08 2004 Subject: Call for presentations: XML Dev Day 3/27 Message-ID: <199802111942.LAA23645@boethius.eng.sun.com> CALL FOR PRESENTATIONS: XML DEVELOPERS' DAY 1998.03.27 A one-day technical conference for XML developers will be held Friday, March 27, in Seattle, Washington. The event constitutes the last day of the GCA XML Conference (http://www.gca.org/conf/xmlcon98/). XML Developers' Day is a single-track event devoted entirely to technical reports on the latest developments in XML implementation. If you are engaged in the construction of any software that works with XML -- converters, parsers, servers, browsers, editors, or XML-based vertical applications -- here is your chance to share your work with an audience that can understand and appreciate it. Since stylesheet-based rendering is part of XML publishing, developers of tools that support XSL or DSSSL are invited to show their latest offerings as well. We're also open to presentations on XML-based languages (CML, OFX, etc.) and related efforts that might have a significant impact on the future of XML (RDF, XML-Data, etc.) if they are of particular interest to XML developers. Vendors of commercial tools can participate, but they must confine their presentations to the technical aspects of current XML products in development. Table space will be made available for the distribution of product announcements and commercial literature. 
REGISTRATION The registration fee for XML Developers' Day is $275 for GCA members and $390 for non-GCA members (see the registration page below for conference and tutorial rates). This is mighty inexpensive for an inside update on the very latest activity in this field. You can register at http://www.gca.org/conf/xmlcon98/registra.htm N.B.: Presenters get in free. CALL FOR PRESENTATIONS If you would like to give a report at this event, send a paragraph or two describing your presentation, based on a conservative estimate of the status of your project as it will stand on March 27, to Jon Bosak (bosak@eng.sun.com). Also include a description of the audio-visual equipment you will need for your presentation and an estimate of its duration. Please include the phrase "XML Dev Day" somewhere in the subject line of your message. Since we want up-to-the-minute reports on activities in progress, there will be no published proceedings, and therefore you need not submit your entire presentation in advance. But please try to make your forecasted description as accurate as possible so that we can choose the most interesting and relevant submissions. The deadline for submissions is Friday, February 27. Jon ---------------------------------------------------------------------- Jon Bosak, Online Information Technology Architect, Sun Microsystems 901 San Antonio Road, MPK17-101, Palo Alto, California 94043 ---------------------------------------------------------------------- If a man look sharply and attentively, he shall see Fortune; for though she be blind, yet she is not invisible. -- Francis Bacon ---------------------------------------------------------------------- xml-dev: A list for W3C XML Developers. 
To post, mailto:xml-dev@ic.ac.uk Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To (un)subscribe, mailto:majordomo@ic.ac.uk the following message; (un)subscribe xml-dev To subscribe to the digests, mailto:majordomo@ic.ac.uk the following message; subscribe xml-dev-digest List coordinator, Henry Rzepa (mailto:rzepa@ic.ac.uk)

From donpark at quake.net Wed Feb 11 20:09:52 1998
From: donpark at quake.net (Don Park)
Date: Mon Jun 7 17:00:08 2004
Subject: Empty elements
Message-ID: <003d01bd3728$549089a0$2ee044c6@donpark>

Geir,

>While working on implementations using SAX I've noticed that there is no
>way to know if an element is an empty element or not (e.g ). This
>could perhaps be done using some kind of lookahead, but should that be
>necessary?

Are you unable to process the element in the endElement() callback? A typical DocumentHandler implementation must keep track of the current element, so all you have to do inside endElement() is check that the current element has no children and no attributes.

Perhaps you have a different need, but it seems like an implementation strategy issue.

Best regards,

Don Park
http://www.quake.net/~donpark/index.html

From andrewl at microsoft.com Thu Feb 12 00:55:52 1998
From: andrewl at microsoft.com (Andrew Layman)
Date: Mon Jun 7 17:00:08 2004
Subject: Object Hierarchie with XML
Message-ID: <5BF896CAFE8DD111812400805F1991F7DCC998@red-msg-08.dns.microsoft.com>

A type hierarchy would use a vocabulary (schema) designed for that purpose.
Such vocabularies are not presently part of XML per se, though you can find type-hierarchy concepts discussed in several papers, such as those at the W3C RDF site and in a paper that I co-authored, http://www.w3.org/TR/1998/NOTE-XML-data-0105/Overview.html. > -----Original Message----- > From: BAILLE-PIERRE Cécile [SMTP:cecile.baille-pierre@bull.net] > Sent: Tuesday, February 10, 1998 7:28 AM > To: Mailing Liste XML-DEV/Messages (Adresse de messagerie) > Cc: Cécile Baille-Pierre (Adresse de messagerie) > Subject: Object Hierarchie with XML > > As I'm just begin looking at XML specifications , my question will be > perhaps a nonsense (In this case I promise this question will be the first > and last one!). > As far as I understand, XML document has a tree-like structure which is > perfect to reflect composition /aggregation entities ("my book is composed > of : a title, an author, one po more paragraphs, etc ..). where child > elements represent parts of the element currently defined. > But how simply implement a class hierarchy, i.e "element E is derived from > super-Element S and inherit attributes and properties"? > > Cécile. > > xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev@ic.ac.uk > Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ > To (un)subscribe, mailto:majordomo@ic.ac.uk the following message; > (un)subscribe xml-dev > To subscribe to the digests, mailto:majordomo@ic.ac.uk the following > message; > subscribe xml-dev-digest > List coordinator, Henry Rzepa (mailto:rzepa@ic.ac.uk) xml-dev: A list for W3C XML Developers. 
To post, mailto:xml-dev@ic.ac.uk Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To (un)subscribe, mailto:majordomo@ic.ac.uk the following message; (un)subscribe xml-dev To subscribe to the digests, mailto:majordomo@ic.ac.uk the following message; subscribe xml-dev-digest List coordinator, Henry Rzepa (mailto:rzepa@ic.ac.uk) From jjaakkol at cs.Helsinki.FI Thu Feb 12 13:06:10 1998 From: jjaakkol at cs.Helsinki.FI (Jani Jaakkola) Date: Mon Jun 7 17:00:08 2004 Subject: Empty elements In-Reply-To: <003d01bd3728$549089a0$2ee044c6@donpark> Message-ID: On Wed, 11 Feb 1998, Don Park wrote: > Geir, > > >While working on implementations using SAX I've noticed that there is no > >way to know if an element is an empty element or not (e.g ). This > >could perhaps be done using some kind of lookahead, but should that be > >necessary? > > Are you unable to process the element in endElement() callback? Typical > DocumentHandler implementation must keep track of current element so all you > have to inside endElement() is check to see the current element has no > children and no attributes. Yes, but in SGML and XML element type which has been declared empty in the DTD and therefore is marked with tag in XML is different thing from element which just happens to be empty (e.g ). In SP:s generic interface StartElementEvent events have ContentType property which can have value empty. Without this property it would be impossible to produce valid SGML-instance using the event stream because element types which are declared empty are marked up differently from element types which just happen to be empty sometimes. I'd say that an parser API which does not provide information about empty declared elements is broken and should be fixed. (i haven't looked at SAX:s API though) - Jani xml-dev: A list for W3C XML Developers. 
To post, mailto:xml-dev@ic.ac.uk Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To (un)subscribe, mailto:majordomo@ic.ac.uk the following message; (un)subscribe xml-dev To subscribe to the digests, mailto:majordomo@ic.ac.uk the following message; subscribe xml-dev-digest List coordinator, Henry Rzepa (mailto:rzepa@ic.ac.uk)

From cecile.baille-pierre at bull.net Thu Feb 12 13:46:25 1998
From: cecile.baille-pierre at bull.net (BAILLE-PIERRE Cécile)
Date: Mon Jun 7 17:00:08 2004
Subject: No subject
Message-ID: <01BD37C4.792CD1A0@belledonne.frcl.bull.fr>

HELP!!!

I am desperately searching for a Web site that is clear, didactic and complete about XML syntax, something easier to understand than reading the productions directly:

[70] EntityDecl ::= GEDecl | PEDecl
[71] GEDecl ::= '<!ENTITY' S Name S EntityDef S? '>'
[72] PEDecl ::= '<!ENTITY' S '%' S Name S PEDef S? '>'
[73] EntityDef ::= EntityValue | (ExternalID NDataDecl?)
[74] PEDef ::= EntityValue | ExternalID

http://www.w3.org/TR/1998/REC-xml-19980210 doesn't give enough examples (from my point of view).

I have already found some very interesting XML sites: some describe XML implementations and others give an overview but, like the XML FAQ, are too general. I would like to find a sort of "Reader's Digest" that explains and illustrates XML terminology point by point: parameter and general entities, notations, attribute lists, and so on.

Book references would be welcome too.

Thanks.
Cecile.

xml-dev: A list for W3C XML Developers.
To post, mailto:xml-dev@ic.ac.uk Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To (un)subscribe, mailto:majordomo@ic.ac.uk the following message; (un)subscribe xml-dev To subscribe to the digests, mailto:majordomo@ic.ac.uk the following message; subscribe xml-dev-digest List coordinator, Henry Rzepa (mailto:rzepa@ic.ac.uk) From cecile.baille-pierre at bull.net Thu Feb 12 13:56:46 1998 From: cecile.baille-pierre at bull.net (BAILLE-PIERRE Cécile) Date: Mon Jun 7 17:00:08 2004 Subject: No subject Message-ID: <01BD37C5.E564DEC0@belledonne.frcl.bull.fr> In my previous message, I said that http://www.w3.org/TR/1998/REC-xml-19980210 doesn't give enough examples From k_coffin at conknet.com Thu Feb 12 14:37:37 1998 From: k_coffin at conknet.com (Kerry Coffin) Date: Mon Jun 7 17:00:08 2004 Subject: Message-ID: <01bd37c3$9563ad90$f00620ce@lbynum.esri.com> I'd like this same help. Thanks Kerry Coffin -----Original Message----- From: BAILLE-PIERRE C?cile To: Mailing Liste XML-DEV/Messages (Adresse de messagerie) Cc: C?cile Baille-Pierre (Adresse de messagerie) Date: Thursday, February 12, 1998 9:04 AM HELP!!! I desperately search some Web site : clear, dicdactic, complete about XML syntax, something which will be more understanding than directly: [70] EntityDecl ::= GEDecl | PEDecl [71] GEDecl ::= '' [72] PEDecl ::= '' [73] EntityDef ::= EntityValue | (ExternalID NDataDecl?) [74] PEDef ::= EntityValue | ExternalID http://www.w3.org/TR/1998/REC-xml-19980210 doesn't give enough examples (from my point of view). I've already found very interesting XML sites, which describe some XML implementations, some others give an overview, but -like XML FAQ- are too general. I would like to find a sort of "Reader Digest", which will explain and illustrate point by point XML terminoloy: parameter/general entities, notations, Attributee lists .. and so on. Book's reference will be welcome too. Thanks. Cecile. xml-dev: A list for W3C XML Developers. 
To post, mailto:xml-dev@ic.ac.uk Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To (un)subscribe, mailto:majordomo@ic.ac.uk the following message; (un)subscribe xml-dev To subscribe to the digests, mailto:majordomo@ic.ac.uk the following message; subscribe xml-dev-digest List coordinator, Henry Rzepa (mailto:rzepa@ic.ac.uk) xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev@ic.ac.uk Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To (un)subscribe, mailto:majordomo@ic.ac.uk the following message; (un)subscribe xml-dev To subscribe to the digests, mailto:majordomo@ic.ac.uk the following message; subscribe xml-dev-digest List coordinator, Henry Rzepa (mailto:rzepa@ic.ac.uk) From ak117 at freenet.carleton.ca Thu Feb 12 14:38:40 1998 From: ak117 at freenet.carleton.ca (David Megginson) Date: Mon Jun 7 17:00:08 2004 Subject: Empty elements In-Reply-To: References: <003d01bd3728$549089a0$2ee044c6@donpark> Message-ID: <199802121436.GAA00313@unready.microstar.com> Jani Jaakkola writes: > Yes, but in SGML and XML element type which has been declared empty > in the DTD and therefore is marked with tag in XML > is different thing from element which just happens to be > empty (e.g ). Actually, that's not generally the case in XML. Here's what the REC says (Section 3.1 "Start-Tags, End-Tags, and Empty-Element Tags"): Empty-element tags may be used for any element which has no content, whether or not it is declared using the keyword EMPTY. For interoperability, the empty-element tag must be used, and can only be used, for elements which are declared EMPTY. Examples of empty elements:
<IMG align="left" src="http://www.w3.org/Icons/WWW/w3c_home" />
<br></br>
<br/>
Here's the definition of "for interoperability": for interoperability A non-binding recommendation included to increase the chances that XML documents can be processed by the existing installed base of SGML processors which predate the WebSGML Adaptations Annex to ISO 8879. In other words, XML processors may (and should) treat

and
as equivalent, but document authors might want to make the distinction so that pre-WebSGML SGML parsers can handle their documents. That begs the question of the processor's information set, however -- a processor designed for use with repositories or with editors, for example, needs to preserve lexical as well as structural information about the XML document, such as comments, general entity references (even within attribute values), specified vs. defaulted attribute values, CDATA sections, whitespace within tags, etc. SAX as it currently stands is not designed to preserve most lexical information; in the future, we may devise a SAX level-2 to return this information, but since most applications that need it will probably use a DOM anyway, the demand may not be strong enough. All the best, David -- David Megginson ak117@freenet.carleton.ca Microstar Software Ltd. dmeggins@microstar.com http://home.sprynet.com/sprynet/dmeggins/ xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev@ic.ac.uk Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To (un)subscribe, mailto:majordomo@ic.ac.uk the following message; (un)subscribe xml-dev To subscribe to the digests, mailto:majordomo@ic.ac.uk the following message; subscribe xml-dev-digest List coordinator, Henry Rzepa (mailto:rzepa@ic.ac.uk) From jjaakkol at cs.Helsinki.FI Thu Feb 12 15:28:53 1998 From: jjaakkol at cs.Helsinki.FI (Jani Jaakkola) Date: Mon Jun 7 17:00:08 2004 Subject: Empty elements In-Reply-To: <199802121436.GAA00313@unready.microstar.com> Message-ID: On Thu, 12 Feb 1998, David Megginson wrote: > In other words, XML processors may (and should) treat > >

> > and > >
> > as equivalent, but document authors might want to make the distinction > so that pre-WebSGML SGML parsers can handle their documents. Ah. Pardon me my ignorance. Different syntax for empty elements in XML or SGML was a nuisance anyway, so this seems to be a one more thing fixed. > SAX as it currently stands is not designed to preserve most lexical > information; in the future, we may devise a SAX level-2 to return this > information, but since most applications that need it will probably > use a DOM anyway, the demand may not be strong enough. If i understood this correctly, SAX is also not designed for interoperatibility. If you want to generate pre-WebSGML from XML using SAX (and accept that lexical information is not preserved), you still would need the ability to detect empty declared elements. - Jani xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev@ic.ac.uk Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To (un)subscribe, mailto:majordomo@ic.ac.uk the following message; (un)subscribe xml-dev To subscribe to the digests, mailto:majordomo@ic.ac.uk the following message; subscribe xml-dev-digest List coordinator, Henry Rzepa (mailto:rzepa@ic.ac.uk) From srn at techno.com Thu Feb 12 16:15:28 1998 From: srn at techno.com (Steven R. Newcomb) Date: Mon Jun 7 17:00:08 2004 Subject: Object Hierarchie with XML In-Reply-To: <5BF896CAFE8DD111812400805F1991F7DCC998@red-msg-08.dns.microsoft.com> (message from Andrew Layman on Wed, 11 Feb 1998 16:55:40 -0800) Message-ID: <199802121510.KAA00885@bruno.techno.com> [Cécile Baille-Pierre (cecile.baille-pierre@bull.net):] > > But how simply implement a class hierarchy, i.e "element E is > > derived from super-Element S and inherit attributes and properties"? [Andrew Layman (andrewl@microsoft.com):] > A type hierarchy would use a vocabulary (schema) designed for that > purpose. 
Such vocabularies are not presently part of XML per se, > though you can find type-hierarchy concepts discussed in several > papers, such as those at the W3C RDF site and in a paper that I > co-authored, > http://www.w3.org/TR/1998/NOTE-XML-data-0105/Overview.html. In fact, this capability is already available to XML users, by virtue of the fact that the derivation of object types from one another is provided by ISO/IEC 10744:1997 for SGML in general, and this standard has been amended specifically to allow XML's use of these concepts by means of an XML-legal PI-based declaration syntax. There is literally nothing to prevent the adoption and use of this facility by anyone, regardless of whether W3C chooses to acknowledge that this internationally standardized facility exists. The idea of object type inheritance is far too useful for XML users to ignore it forever. As the ISO 10744 "enabling architectures" facility demonstrates, it is not necessary to create a special DTD syntax or a special kind of schema to support hierarchies of element type inheritance. What is needed is a way to inherit the semantics and structure of any element types of any DTDs (schemas), regardless of whether they were intended to be inherited. That kind of functionality (among others) is supported by this facility. There is a pointer to the relevant standard at http://www.hytime.org. When you get there, look in the table of contents for Annex A. A.3 ("Architectural Form Definition Requirements [AFDR]") is where the "enabling architectures" facility is described. -Steve -- Steven R. Newcomb, President, TechnoTeacher, Inc. srn@techno.com http://www.techno.com ftp.techno.com voice: +1 972 231 4098 (at ISOGEN: +1 214 953 0004 x137) fax +1 972 994 0087 (at ISOGEN: +1 214 953 3152) 3615 Tanner Lane Richardson, Texas 75082-2618 USA xml-dev: A list for W3C XML Developers. 
To post, mailto:xml-dev@ic.ac.uk Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To (un)subscribe, mailto:majordomo@ic.ac.uk the following message; (un)subscribe xml-dev To subscribe to the digests, mailto:majordomo@ic.ac.uk the following message; subscribe xml-dev-digest List coordinator, Henry Rzepa (mailto:rzepa@ic.ac.uk) From ak117 at freenet.carleton.ca Thu Feb 12 16:20:58 1998 From: ak117 at freenet.carleton.ca (David Megginson) Date: Mon Jun 7 17:00:08 2004 Subject: Empty elements In-Reply-To: References: <199802121436.GAA00313@unready.microstar.com> Message-ID: <199802121620.IAA00662@unready.microstar.com> Jani Jaakkola writes: > If i understood this correctly, SAX is also not designed for > interoperatibility. If you want to generate pre-WebSGML from XML > using SAX (and accept that lexical information is not preserved), > you still would need the ability to detect empty declared elements. SAX is an XML processing interface rather than an authoring interface, so interoperability is not exactly an applicable concept (though I do understand what you mean). That said, some XML tools that use SAX also have their own interfaces that can provide you with DTD information -- for one example, see AElfred at http://www.microstar.com/XML/ All the best, David -- David Megginson ak117@freenet.carleton.ca Microstar Software Ltd. dmeggins@microstar.com http://home.sprynet.com/sprynet/dmeggins/ xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev@ic.ac.uk Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To (un)subscribe, mailto:majordomo@ic.ac.uk the following message; (un)subscribe xml-dev To subscribe to the digests, mailto:majordomo@ic.ac.uk the following message; subscribe xml-dev-digest List coordinator, Henry Rzepa (mailto:rzepa@ic.ac.uk) From grk at arlut.utexas.edu Thu Feb 12 20:37:59 1998 From: grk at arlut.utexas.edu (Glenn R. 
Kronschnabl)
Date: Mon Jun 7 17:00:08 2004
Subject: SAX/DOM IDL -> C++ Mapping / Confused
Message-ID: <199802122037.OAA06856@mail-firewall.arlut.utexas.edu>

Hi,

I was trying to duplicate the SAX/DOM Java stuff in C++ (and interface with SP). Now, I am an IDL newbie, but according to the DOM spec, Node is defined to be just an interface. In the SAXDOM code that's straightforward, since Java understands interfaces. HOWEVER, in C++, according to the IDL -> C++ mapping, an interface is supposed to be constructed as an ABSTRACT base class using pure virtual functions. The problem is that when I try to enumerate over them, I get a 'can't cast up from a virtual base class' error. Here is the abbreviated source. Obviously, I am making a fundamental mistake. Can some kind person out there clue me in? Thanks.

grk$ g++ n.cc
n.cc: In function `int main()':
n.cc:82: cannot cast up from virtual baseclass `Node'

----- cut here ---
#include <string>
#include <list>

class NodeList;
class NodeEnumerator;

class Node {
    enum NodeType {DOCUMENT, ELEMENT};
public:
    virtual NodeType getNodeType() = 0;
    virtual Node* getParentNode() = 0;
    virtual NodeList* getChildren() = 0;
};

class Element : public virtual Node {
public:
    virtual string getTagName() = 0;
    virtual NodeEnumerator* getElementsByTagName() = 0;
};

class SaxNode : public virtual Node {
public:
    NodeType type;
    Node* parent;
    NodeList* children;
    virtual NodeType getNodeType() { return type; }
    virtual Node* getParentNode() { return parent; }
    virtual NodeList* getChildren() { return children; }
};

class SaxElement : public virtual Node, public Element, public SaxNode {
public:
    string tagName;
    virtual NodeEnumerator* getElementsByTagName() { }
    virtual string getTagName() { return string("SaxElement"); }
};

class NodeList {
public:
    virtual NodeEnumerator* getEnumerator() = 0;
};

class SaxNodeEnumerator;

class SaxNodeList: public list<Node*>, public NodeList {
public:
    virtual NodeEnumerator* getEnumerator() { }
};

class NodeEnumerator {
public:
    virtual Node* getFirst() = 0;
};

class SaxNodeEnumerator : public NodeEnumerator {
public:
    Node* getFirst() { }
};

main()
{
    SaxElement se;
    SaxNodeList* list = (SaxNodeList*) se.getChildren();
    SaxNodeList::iterator snode = list->begin();
    for (; snode != list->end(); ++snode) {
        (*snode)->getNodeType();
        // C-style down-cast from the virtual base class Node
        SaxElement* elem = (SaxElement*) (*snode);
        elem->getTagName();
        SaxNodeEnumerator* e2 = (SaxNodeEnumerator*) elem->getElementsByTagName();
        // C-style down-cast from the virtual base class Node
        SaxNode* s2 = (SaxNode*) e2->getFirst();
        // SaxNode snode = (SaxNode*) (*node);
        // cout << node->getNodeType() << endl;
    }
}
--- cut here ----

Cheers,
Glenn

--------------------
Glenn R. Kronschnabl
Applied Research Laboratories        | grk@arlut.utexas.edu (PGP/MIME ok)
The University of Texas at Austin    | http://www.arlut.utexas.edu/~grk
PO Box 8029, Austin, TX 78713-8029   | (Ph) 512.835.3642 (FAX) 512.835.3808
10,000 Burnet Road, Austin, TX 78758 | ... but an Aggie at heart!

From fasthand at bigfoot.com Thu Feb 12 22:05:30 1998
From: fasthand at bigfoot.com (fasthand@bigfoot.com)
Date: Mon Jun 7 17:00:08 2004
Subject: ANN: ezDTD 1.1 DTD editor/Generator/Formatter
In-Reply-To: <199802111942.LAA23645@boethius.eng.sun.com>
Message-ID: <199802122202.QAA18814@cotton.vislab.olemiss.edu>

ezDTD v1.1 DTD Editor/Generator/Formatter
----------------------------------------------------------------

First of all, please forgive me if you receive this mail more than once.

I started ezDTD a month ago. Some of you have tried it and gave me very valuable suggestions. The latest ezDTD is v1.1. You can find it at http://www.geocities.com/SiliconValley/Haven/2638/ezDTD.htm

o Why create ezDTD?

  ezDTD is a handy tool that helps you:
  1. Jump quickly from one element to another.
  2. Complete your typing by filling in keywords such as ANY, EMPTY, #IMPLIED, etc.
  3. Export an HTML-format DTD file that has internal links among elements. Since this version of ezDTD can import an existing DTD, you can also use it to create an HTML-format document for an existing DTD.

o What's new?

  Version 1.1 (1998-02-12)
  - Modified parts of the interface.
  - You can import a DTD file, as long as its comment structure is not too complex.
  - Supports Start Tag and End Tag definition.
  - Exports the DTD in either SGML or XML fashion (with or without the minimization).
  - Corrected the included example file appraisal.edz, which did not explain itself clearly enough.

o Download

  Please check out http://www.geocities.com/SiliconValley/Haven/2638/ezDTD.htm

Thanks for your time
Duncan Chen
fasthand@bigfoot.com
___________________________________
Duncan Chen
fasthand@bigfoot.com
FNC, Inc.

From hcheung at parc.xerox.com Thu Feb 12 23:50:42 1998
From: hcheung at parc.xerox.com (Harry Cheung)
Date: Mon Jun 7 17:00:08 2004
Subject: Microsoft's XML parser...
Message-ID: <01BD37CD.E9A0E120.hcheung@parc.xerox.com>

I'm using Microsoft's Java XML parser (1.8) to generate XML, and I've run into a hitch. To build an XML document, I construct it using the object model, adding child elements, etc. Now, I need to grab an XMLOutputStream from it so that I may send it on a FileInputStream. However, when I call the "save" method of Document, the XMLOutputStream returned doesn't deal with the namespaces and, as a result, causes a parse failure when I try to parse the generated file.
Here's the main section of the code:

    ByteArrayOutputStream outputStream = new ByteArrayOutputStream();
    XMLOutputStream xmlstream = doc.createOutputStream(outputStream);
    doc.save(xmlstream);
    System.err.println("XMLDocument:\n" + new String(outputStream.toByteArray()));

doc is an instance of Document, and the output that I print doesn't have the namespaces substituted.

Now, am I going about this all wrong? Am I missing something?

Harry Cheung
hcheung@parc.xerox.com

From pierlou at CAM.ORG Fri Feb 13 00:33:14 1998
From: pierlou at CAM.ORG (Pierre)
Date: Mon Jun 7 17:00:08 2004
Subject: ANN: Database and EcmaScript support
Message-ID: <01bd3816$1b3228f0$02dcdcdc@pierre>

Prototype now supports access to JDBC databases and scripting with an EcmaScript interpreter. Load a JFC table or tree with a simple XML declaration. Look at the Database page and the Scripting page.

http://www.cam.org/~pierlou/prototype

Thanks
Pierre Morel
pierlou@cam.org

From dgd at cs.bu.edu Fri Feb 13 19:16:52 1998
From: dgd at cs.bu.edu (David G. Durand)
Date: Mon Jun 7 17:00:08 2004
Subject: Empty elements
In-Reply-To: <199802121436.GAA00313@unready.microstar.com>
References: <003d01bd3728$549089a0$2ee044c6@donpark>
Message-ID:

> [snip]
>In other words, XML processors may (and should) treat
>
>

> >and > >
> >as equivalent, but document authors might want to make the distinction >so that pre-WebSGML SGML parsers can handle their documents. Some of us think this was a significant reduction in the power of XML to represent useful information (the difference between an element that marks a point phenomenon, and one that is empty because it just doesn't have any content). >That begs the question of the processor's information set, however -- >a processor designed for use with repositories or with editors, for >example, needs to preserve lexical as well as structural information >about the XML document, such as comments, general entity references >(even within attribute values), specified vs. defaulted attribute >values, CDATA sections, whitespace within tags, etc. It's not possible to write _valid_ SGML document instances without this information, something not true of comments, DTD, info, or the other lexical information. I think the EMPTY declaration status, and the lexical form of the element occurrence are useful for that practical reason alone. >SAX as it currently stands is not designed to preserve most lexical >information; in the future, we may devise a SAX level-2 to return this >information, but since most applications that need it will probably >use a DOM anyway, the demand may not be strong enough. This information is more than purely lexical, which is why it should be in there... -- David _________________________________________ David Durand dgd@cs.bu.edu \ david@dynamicDiagrams.com Boston University Computer Science \ Sr. Analyst http://www.cs.bu.edu/students/grads/dgd/ \ Dynamic Diagrams --------------------------------------------\ http://www.dynamicDiagrams.com/ MAPA: mapping for the WWW \__________________________ xml-dev: A list for W3C XML Developers. 
To post, mailto:xml-dev@ic.ac.uk Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To (un)subscribe, mailto:majordomo@ic.ac.uk the following message; (un)subscribe xml-dev To subscribe to the digests, mailto:majordomo@ic.ac.uk the following message; subscribe xml-dev-digest List coordinator, Henry Rzepa (mailto:rzepa@ic.ac.uk) From lex at www.copsol.com Fri Feb 13 23:07:56 1998 From: lex at www.copsol.com (Alex Milowski) Date: Mon Jun 7 17:00:08 2004 Subject: ANNOUNCE: DAE SDK Beta 2 Released Message-ID: <199802132304.RAA25660@copsol.com> PRESS RELEASE: Copernican Solutions Releases the beta 2 of the DAE SDK and DAE Server Software. NEW IN THIS RELEASE * A new XML 1.0 Well-formed processor. * Faster style application once the style is loaded. * Faster SDQL execution. * An update to the Scheme environment supporting JIT compilers and Java 1.1 readers. * Updates for using the DAE SDK with a JIT compiler. * Element ID attribute support for SGML. * All demos are now XML-based (see the JSPI for SGML demos). * A new package (COM.copsol.tools.html) was added for writing HTML groves. * Updates for XMLWriter to support writing valid XML. PRODUCT DESCRIPTION: DAE (Document Application Environment) is a Java-based SDK for processing XML documents. The foundations of this SDK is the DSSSL Developer's Toolkit developed at Copernican Solutions. This toolkit is based on a componentized design allowing different technology components to be substituted in the DAE environment without affecting the other components. The DAE currently supports: DSSSL SDQL, DSSSL Style Language, DSSSL Groves, XML processing, and Scheme or Java Programming. In addition, the DAE has been integrated into a Java-based web server product called the DAE Server. This product allows development web-based DAE applications. An add-on component called the JSPI (Java SGML Parsing Interface) provides parsing and grove generation for SGML documents using a native component. 
LICENSING

We strongly believe that DSSSL, SGML, and XML technology in Java is fundamental technology. In light of this, we have restructured our licensing policy so that we can ensure that the right kind of technology--especially Java-based technology--is available for use and experimentation as well as for developing commercial products.

This policy also allows us to work with "Development Partners", ensuring that our technology, or what results from working with these Development Partners, is available in a majority of web application environments. Development Partners benefit from a close development relationship, support, and immediate access to technology updates.

PRICE

Non-Commercial Use - Free
Internal Commercial Use - Free
Commercial Re-distribution - Requires a Development Partner Agreement.

DOWNLOAD

The DAE SDK and DAE Server software are now available for download at:

http://www.copsol.com/

CONTACT

Copernican Solutions Incorporated
http://www.copsol.com
sales@copsol.com

==============================================================================
R. Alexander Milowski http://www.copsol.com/ alex@copsol.com
Copernican Solutions Incorporated (612) 379 - 3608

From peter at ursus.demon.co.uk Sat Feb 14 14:06:21 1998
From: peter at ursus.demon.co.uk (Peter Murray-Rust)
Date: Mon Jun 7 17:00:08 2004
Subject: JUMBO-PLAY et. al.
Message-ID: <3.0.1.16.19980214135027.63dfb77c@pop3.demon.co.uk>

I have prepared an *alpha* version of JUMBO-PLAY (for browsing, navigating and transforming Shakespeare PLAYs conforming to Jon Bosak's PLAY.dtd) at:

http://www.nottingham.ac.uk/~pazpmr/jpl9802a.zip

and the latest version of JUMBO-CORE at:

http://www.nottingham.ac.uk/~pazpmr/jum9802.zip

Installation instructions are inside the can. This is *not* a permanent URL; the release is alpha and I'd be grateful for installation feedback in the first instance. [I am assuming that most people who are interested have been able to *load* the latest version of JUMBO-CORE since I have had little negative feedback.]

JUMBO-PLAY is the first of a series of JUMBO-* extensions to JUMBO-CORE. JUMBO-* extensions allow people to customise their own application round JUMBO by subclassing JUMBO elements/classes. A more detailed account will follow in the final release but in general:

- JUMBO-FOO allows you to write per-element classes. An element FOO:BAR in XML can have a class jumbo.foo.BAR.java
- JUMBO-FOO maps elements onto Java using namespaces and schemas. (JUMBO-PLAY transforms the original documents into a PLAY: namespace.) Each element *may*, but need not, be mapped to a Java class.
- unmapped elements inherit 'reasonable' behaviour from JUMBO-CORE. Thus an element with a single PCDATA element as content will display this as a name-value pair. An element with element content will display this as a tree. An element with mixed content will display this as a tagged or untagged event stream. An element which contains little chunks of whitespace will do interesting things.
- mapped elements can be customised for display, data entry, and real-time interaction, limited only by your programming ability and imagination
- in its simplest form the namespace schema maps elements onto classes, but it may also customise the semantics of those elements through additional information in the XML-based schema. Each element can have an XML file customising its semantics. (JUMBO-PLAY does not use this facility.)
- JUMBO-FOO allows a document to be broken up (through a SAX-based parser) into entities. JUMBO-PLAY shows two examples of this.

** PLEASE NOTE THAT JON BOSAK ASKS THAT THE SHAKESPEARE DISTRIBUTION BE KEPT INTACT, SO NO PLAY FILES ARE INCLUDED IN THE DISTRIBUTION. YOU WILL NEED TO DOWNLOAD THE DISTRIBUTION YOURSELF AND RUN jumbo.play.SAXSplit ON IT TO PRODUCE INPUT FOR JUMBO-PLAY. FILES TRANSFORMED BY JUMBO-PLAY SHOULD NOT BE REDISTRIBUTED. **

Like everyone else I thank Jon for this resource. It's worth noting that the markup in PLAY is so useful as it stands that there is little point in using XML tools simply to re-render it :-). JUMBO-PLAY adds the ability to run TEI-like queries and to write indexing and analysis code - I shall add some amateurish attempts at the latter in later versions.

P.

[Please note that you should be able to run JUMBO-CORE before moving to JUMBO-PLAY. I intend to use this modular form of distribution since large files often break on download.]

Peter Murray-Rust, Director Virtual School of Molecular Sciences, domestic net connection
VSMS http://www.nottingham.ac.uk/vsms, Virtual Hyperglossary http://www.venus.co.uk/vhg

xml-dev: A list for W3C XML Developers.
To post, mailto:xml-dev@ic.ac.uk Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To (un)subscribe, mailto:majordomo@ic.ac.uk the following message; (un)subscribe xml-dev To subscribe to the digests, mailto:majordomo@ic.ac.uk the following message; subscribe xml-dev-digest List coordinator, Henry Rzepa (mailto:rzepa@ic.ac.uk) From jjc at jclark.com Sun Feb 15 04:09:20 1998 From: jjc at jclark.com (James Clark) Date: Mon Jun 7 17:00:08 2004 Subject: New SP/Jade test release Message-ID: <34E6699E.F79CEAF@jclark.com> A new test release of SP and Jade is now available from: ftp://ftp.jclark.com/pub/test/jade.zip Win32 binaries are available from: ftp://ftp.jclark.com/pub/test/jadew.zip This is SP version 1.2.92 and Jade version 1.0.93. In SP the main change since 1.2.91 is better support for XML based on the final WebSGML Adaptations Annex. There's documentation on this is xml.htm. Also the SX application has been merged in. In Jade the main change since 1.0.92 is in the FOT backend. The FOT file is now well-formed XML. It has also been changed to make it closer to the action part of an XSL style-sheet. The hyperlinking information is also represented in a more straightforward way. The idea is to make it practical both to have new backends that work from the FOT file and to have other programs that generate an FOT file. (Eventually I would like to make the existing backends be able to take input from an FOT file as well as directly from Jade.) Note that I'm discontinuing distributing non-Unicode Win32 binaries, and so the SP Win32 executables do have Unicode support but no longer a "u" suffix. James xml-dev: A list for W3C XML Developers. 
To post, mailto:xml-dev@ic.ac.uk Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To (un)subscribe, mailto:majordomo@ic.ac.uk the following message; (un)subscribe xml-dev To subscribe to the digests, mailto:majordomo@ic.ac.uk the following message; subscribe xml-dev-digest List coordinator, Henry Rzepa (mailto:rzepa@ic.ac.uk) From trevort at za.ibm.com Mon Feb 16 12:23:21 1998 From: trevort at za.ibm.com (Trevor Turton) Date: Mon Jun 7 17:00:09 2004 Subject: DTD meta data for XML viewers Message-ID: <5060200011317159000002L092*@MHS> (this is a retransmit - the previous version was truncated) In an earlier note I spoke about the need to associate compose-time meta data with DTDs to allow XML editors to assist with the process of document composition. Another major class of meta data that needs to be associated with DTDs is information on how the associated XML may be rendered - most usefully, by identifying programs which can perform the required rendering.
The classic browser renders HTML on computer screens, and also on the printed page. The same will be required of XML browsers, and some will also render XML documents to voice for the visually impaired, to braille for the more profoundly impaired, and to other media as new needs and technologies arise. Hence we may need to associate a list of rendering programs with any given DTD, covering the various media types supported. Current browsers attempt to render "all" HTML tags, but HTML is a moving target. Browsers are already very large, and need to be supplemented by plug-ins to handle various MIME types. Once XML is generally available, we must expect a proliferation of DTDs by various parties for varied purposes. While a single generic XML editor may do a good job of checking the syntax of documents developed against these DTDs, it cannot render them. Realistically, creators of novel DTDs will have to create code that renders them. And once a useful body of DTDs has been developed, authors of XML documents will want to use DTDs from many different independent sources in a single document. No browser manufacturer will be able to bundle rendering code for all DTDs into a single product. We need to define a standard way in which browsers can obtain and use code to render DTDs that were not even invented when the browser was created. This code distribution mechanism may be something between a plug-in, which requires manual intervention to install and persists after use, and a Java applet, which installs automatically and is discarded after use. For the purpose of this note I will call them "renderlets". We need to arrive at a standard way of associating renderlets with DTDs as meta data, so that browsers that encounter a DTD for the first time can find and obtain the code required to render the XML defined by the DTD. 
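
One way to picture the association described above is a lookup table from a DTD identifier to the renderlets offered for it. The following is only a minimal sketch under stated assumptions: the record layout, the `REGISTRY` table, the `find_renderlet` function, the public identifier, and the example.org URLs are all invented for illustration and do not correspond to any existing standard or product.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Renderlet:
    """One rendering program advertised for a DTD (hypothetical record layout)."""
    media: str     # output medium, e.g. "screen", "print", "voice", "braille"
    platform: str  # "any" for a portable renderlet, otherwise a platform name
    url: str       # where a browser could fetch the code (invented URLs below)


# Hypothetical meta data: DTD public identifier -> renderlets offered for it.
REGISTRY = {
    "-//Example//DTD Memo//EN": [
        Renderlet("screen", "any", "http://example.org/renderlets/memo-screen.jar"),
        Renderlet("screen", "win32", "http://example.org/renderlets/memo-win32.dll"),
        Renderlet("voice", "any", "http://example.org/renderlets/memo-voice.jar"),
    ],
}


def find_renderlet(dtd_id: str, media: str, platform: str) -> Optional[Renderlet]:
    """Pick a renderlet for the DTD and medium, preferring a platform-specific one."""
    candidates = [r for r in REGISTRY.get(dtd_id, []) if r.media == media]
    for wanted in (platform, "any"):
        for r in candidates:
            if r.platform == wanted:
                return r
    return None  # nothing registered: the browser cannot render this DTD here
```

A browser meeting a DTD for the first time would call something like `find_renderlet(dtd_id, "screen", "win32")` and fall back to the portable entry when no platform-specific renderlet is listed.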
Some vendors may wish to create platform-specific and even browser-specific renderlets, giving rise to the need to associate a list of different renderlets with a given DTD. Most authors of DTDs will want to implement only a single renderlet to save effort. This would have to be platform and browser independent, and Java is the obvious choice. We need a standard way for Java renderlets to interface with the browser that invokes them. And since XML entities may be embedded in other independently created XML entities, renderlets must also implement the same standard interface when they invoke embedded renderlets.

This interface will have to be richer than the current spartan tag, which makes an unconditional demand for display space. Any given XML document may in the future be rendered on anything from an IMAX screen to a Dick Tracy style watchtop display. Renderlets will have to share the space available. The browser will have to sum the space demanded by the renderlets it hosts and compute an overall compression factor. It will have to communicate the compression factor back to the various renderlets so they can make an informed decision about the level of detail that they display - if any. The renderlet interface will need to include a specialised Java LayoutManager to facilitate layout and space negotiation.

Given this approach, the browser itself can become a fairly small and simple shell, with all XML elements implemented by downloadable renderlets. A cottage industry in renderlets may emerge, paralleling the VBX and OCX industry that Visual Basic spawned. Competing versions of renderlets for commonly used DTDs will arise, and browser owners will be able to shop around for the renderlets that best meet their needs.

If there are concerns that fetching renderlets will generate excessive network activity and make XML browsers too slow to use, browsers (and http proxy servers) could be enhanced to allow priming of their cache directories.
Pointers to local copies of often-used renderlets (and DTDs for that matter) could be loaded into the browser's (proxy server's) cache directory upon initialisation.

Trevor Turton

From richard at cogsci.ed.ac.uk Mon Feb 16 16:20:58 1998
From: richard at cogsci.ed.ac.uk (Richard Tobin)
Date: Mon Jun 7 17:00:09 2004
Subject: New version of RXP
Message-ID: <29876.199802161620@cockburn.cogsci.ed.ac.uk>

There is a new (still not for public consumption) version of RXP at

ftp://ftp.cogsci.ed.ac.uk/pub/richard/rxp.tar.gz

RXP is a non-validating XML parser in C, with support for UTF-8, UTF-16, and ISO-8859-1 character encodings. It now (when run in strict XML-checking mode "rxp -x") finds all the well-formedness errors in James Clark's test suite, except those where the error is in the content model of an element declaration (this is a bug, and will eventually be fixed).

Please report bugs to me.

-- Richard

xml-dev: A list for W3C XML Developers.
To post, mailto:xml-dev@ic.ac.uk Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To (un)subscribe, mailto:majordomo@ic.ac.uk the following message; (un)subscribe xml-dev To subscribe to the digests, mailto:majordomo@ic.ac.uk the following message; subscribe xml-dev-digest List coordinator, Henry Rzepa (mailto:rzepa@ic.ac.uk) From Jon.Bosak at eng.Sun.COM Mon Feb 16 19:32:11 1998 From: Jon.Bosak at eng.Sun.COM (Jon Bosak) Date: Mon Jun 7 17:00:09 2004 Subject: Fwd: ANN: New XSL Mailing List Message-ID: <199802161930.LAA25676@boethius.eng.sun.com> From: tgraham@mulberrytech.com (Tony Graham) Date: Mon, 16 Feb 1998 06:09:17 GMT Newsgroups: comp.text.sgml Subject: ANN: New XSL Mailing List Mulberry Technologies announces the availability of XSL-List, the open forum for discussion of XSL (Extensible Style Language). To subscribe to XSL-List, send mail to majordomo@mulberrytech.com with "subscribe xsl-list" as the body of your message. For more information, see http://www.mulberrytech.com/xsl/xsl-list. XSL-List will host discussion of XSL itself, XSL applications and implementation, and XSL user questions. XSL-List is open to everyone, users and developers, experts and novices alike. There is no restriction to what may be posted on the XSL-List provided it is related to XSL. XSL-List is not a W3C mailing list nor is it affiliated with W3C or any other organization. XSL-List has no official standing with any organization and XSL-List subscribers do not constitute a Special Interest Group. However, XSL-List was established with the encouragement of members of the W3C XSL Working Group, and members of the Working Group will be among the subscribers to the list. XSL-List is provided by Mulberry Technologies as a service to the XSL user community and the XSL standardization effort. Only subscribers can post to XSL-List, but since the goal is to increase the level of XSL knowledge, XSL-List is being archived on Mulberry's web site for everybody to view. 
The topics being discussed on the XSL-List change as new ideas arise or existing problems are dealt with, but the archive contains all of the ideas and solutions that have been discussed on the list.

Regards,

Tony Graham
=======================================================================
Tony Graham
Mulberry Technologies, Inc.                Phone: 301-315-9632
17 West Jefferson Street, Suite 207        Fax: 301-315-8285
Rockville, MD USA 20850                    email: tgraham@mulberrytech.com
=======================================================================

From crism at ora.com Tue Feb 17 15:20:00 1998
From: crism at ora.com (Chris Maden)
Date: Mon Jun 7 17:00:09 2004
Subject: DTD meta data for XML viewers
In-Reply-To: <5060200011317159000002L092*@MHS> (message from Trevor Turton on Mon, 16 Feb 1998 12:21:55 +0000)
Message-ID: <199802171524.KAA03175@geode.ora.com>

[Trevor Turton]

> Another major class of meta data that needs to be associated with
> DTDs is information on how the associated XML may be rendered - most
> usefully, by identifying programs which can perform the required
> rendering.

One word: Stylesheets.

One URL: .

HTH; HAND.

-Chris
--
http://www.oreilly.com/people/staff/crism/ +1.617.499.7487 90 Sherman Street, Cambridge, MA 02140 USA" NDATA SGML.Geek>

xml-dev: A list for W3C XML Developers.
To post, mailto:xml-dev@ic.ac.uk Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To (un)subscribe, mailto:majordomo@ic.ac.uk the following message; (un)subscribe xml-dev To subscribe to the digests, mailto:majordomo@ic.ac.uk the following message; subscribe xml-dev-digest List coordinator, Henry Rzepa (mailto:rzepa@ic.ac.uk) From tony.stewart at rivcom.com Tue Feb 17 15:28:58 1998 From: tony.stewart at rivcom.com (Tony Stewart) Date: Mon Jun 7 17:00:09 2004 Subject: DTD meta data for XML viewers Message-ID: <4955E202FE46D11195C500609712EB6B06AA2F@FLPS-NTSERVER1> Trevor Turton wrote: >>Another major class of meta data that needs to be associated with DTDs is information on how the associated XML may be rendered - most usefully, by identifying programs which can perform the required rendering. The classic browser renders HTML on computer screens, and also on the printed page. The same will be required of XML browsers, and some will also render XML documents to voice for the visually impaired, to braille for the more profoundly impaired, and to other media as new needs and technologies arise. Hence we may need to associate a list of rendering programs with any given DTD, covering the various media types supported. >>Once XML is generally available, we must expect a proliferation of DTDs by various parties for varied purposes. While a single generic XML editor may do a good job of checking the syntax of documents developed against these DTDs, it cannot render them. Realistically, creators of novel DTDs will have to create code that renders them. And once a useful body of DTDs has been developed, authors of XML documents will want to use DTDs from many different independent sources in a single document. I think these issues are real, but belong in the domain of the XSL discussion. Style, presentation and behavior are all aspects of the same thing: the transformation of the data/information into another, usually human-understandable form. 
(I say "usually" because you could use a style sheet to transform the information into a different computer-interpretable form without ever presenting it to a human being.) XSL can and should provide the tools that allow us to specify what transformation should be applied to our XML data, including taking into account issues like user impairment, new media, etc. It is the means by which we associate multiple possible presentations--and the mechanisms for choosing between them in a given presentational instance--with data encoded according to a single DTD or schema. Having said that, yes, we will need to implement robust presentational mechanisms in (preferably) thin clients, so all of these technical issues need to be addressed. But let's make sure that XSL stays in the loop. Tony =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=- Tony Stewart Director of Consulting, RivCom "Publishing Structured Information" New York, NY, USA and Swindon, UK Direct: +1 (212) 222-4332 Office: +1 (212) 662-6800 Fax: +1 (212) 662-6900 UK Tel: +44 1793 790 802 UK Fax: +44 1793 790 812 Email: tony.stewart@rivcom.com Web: www.rivcom.com =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=- xml-dev: A list for W3C XML Developers. 
To post, mailto:xml-dev@ic.ac.uk Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To (un)subscribe, mailto:majordomo@ic.ac.uk the following message; (un)subscribe xml-dev To subscribe to the digests, mailto:majordomo@ic.ac.uk the following message; subscribe xml-dev-digest List coordinator, Henry Rzepa (mailto:rzepa@ic.ac.uk) From peter at ursus.demon.co.uk Tue Feb 17 23:06:37 1998 From: peter at ursus.demon.co.uk (Peter Murray-Rust) Date: Mon Jun 7 17:00:09 2004 Subject: Automating Search Interfaces Message-ID: <3.0.1.16.19980217215940.0c6f0018@pop3.demon.co.uk> Posted on behalf of John Petit > >------------------------------------------ >This is a question about how the search scenario will play out on the >web once XML becomes widely implemented. I have not seen this >articulated in any of the specifications or articles on the web thus >far. In lieu of that, I have imagined how it might work. I would like >some feedback. Am I way off base? Naturally the answer will have a big >impact on the design of search engines and other services that I am >creating. > >As particular industries and special interests standardize on their >respective DTDs, Internet search engines will have to allow users to >search by specific elements contained in those documents. In the typical > >search scenario, a user would use one of the major search services such >as AltaVista or Yahoo. Lets say the user wanted to search across real >estate listings, and these listings all used the same DTD. It seems that > >independent search engines need to interpret the DTD for a class of >documents and present a query interface based on that DTD. The question >is: how is the search engine to interpret the DTD and build an >intelligent interface based on that DTD? Simply listing every element in > >the DTD is one approach, but an ugly one. Many DTDs will contain >numerous elements which would only clutter and confuse a search >interface. 
>
>One solution may be to use DTD attributes to cue the search engines.
>Perhaps a "LEVEL" attribute could cue the searchers to display
>interfaces to predefined levels. The example below shows that the
>"LEVEL" attribute means that the "numbeds" element should always appear
>in a search query, or at the top level of searches. Any elements that
>did not have this level 1 attribute would not be shown in the search
>interface. If the "LEVEL" attribute was not found in the DTD, the
>default would show all of the elements with search fields next to them.
>
>
>        XML-SQLTYPE INTEGER #FIXED
>        SNAME CDATA #FIXED "Number of beds"
>        LEVEL CDATA #FIXED "1">
>
>Search engines, upon seeing the "LEVEL" attribute, would configure their
>interface to have an "Additional Elements" button that would show the
>next level of elements. This would have the effect of shielding the user
>from an overwhelming mass of searchable elements. Perhaps these
>mechanisms are in place, but I just do not see them.
>
>Another useful attribute would describe the "shown name" for a
>particular element. Element tags may not have as descriptive a name as
>they should in the DTD itself. For example, having "numbeds" appear in
>the user search interface would not be very user friendly. A much more
>descriptive string would be "Number of beds."
>
>The "XML-SQLTYPE" attribute indicates that "numbeds" is an integer. This
>is a form of strong typing that was described at one time by Tim Bray. I
>also do not know the status of strong typing in XML, but strong typing
>would sure be useful in this situation. If a search engine knows that a
>field is going to be a number, then the engine can provide optional
>number manipulations. Such useful operations may be determining price
>ranges, or in this case, a range for the number of bedrooms. Otherwise,
>how will an independent search engine or agent know that a particular
>field can be ranged and mathematically manipulated?
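
The three hints proposed above (LEVEL, SNAME, XML-SQLTYPE) could drive a search form roughly as follows. This is a minimal sketch, assuming the hints have already been read from the DTD into plain records; the `ElementHints` class, the `build_form` function, and the operator names are hypothetical and are not part of any XML specification or existing search engine.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class ElementHints:
    """Search hints for one element type, as they might be read from #FIXED attributes."""
    name: str                         # element type name, e.g. "numbeds"
    level: Optional[int] = None       # LEVEL hint; None when the DTD gives none
    shown_name: Optional[str] = None  # SNAME hint: label shown instead of the tag name
    sql_type: Optional[str] = None    # XML-SQLTYPE hint, e.g. "INTEGER"


def build_form(hints: List[ElementHints],
               max_level: int = 1) -> List[Tuple[str, str, List[str]]]:
    """Return (label, element name, operators) for each field to display."""
    # If no element carries a LEVEL hint, fall back to showing every element.
    any_level = any(h.level is not None for h in hints)
    fields = []
    for h in hints:
        if any_level and (h.level is None or h.level > max_level):
            continue  # hidden until the user asks for additional elements
        # Typed numeric fields can offer range searches; others get plain text search.
        ops = ["=", "<", ">", "between"] if h.sql_type == "INTEGER" else ["contains"]
        fields.append((h.shown_name or h.name, h.name, ops))
    return fields
```

Calling `build_form(hints)` would give the top-level form, and `build_form(hints, max_level=2)` would correspond to pressing the "Additional Elements" button.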
>
>I certainly do not think that these attributes should be mandatory, but
>it seems that there should be an agreed upon method of DTD construction
>that would give clues to search engines. I am clearly not an expert in
>this area, but I have not seen a solution to this in the XML proposals
>published thus far. Does anyone have an answer for this?
>

Peter Murray-Rust, Director Virtual School of Molecular Sciences, domestic net connection
VSMS http://www.nottingham.ac.uk/vsms, Virtual Hyperglossary http://www.venus.co.uk/vhg

From ak117 at freenet.carleton.ca Wed Feb 18 01:02:06 1998
From: ak117 at freenet.carleton.ca (David Megginson)
Date: Mon Jun 7 17:00:09 2004
Subject: Automating Search Interfaces
In-Reply-To: <3.0.1.16.19980217215940.0c6f0018@pop3.demon.co.uk>
References: <3.0.1.16.19980217215940.0c6f0018@pop3.demon.co.uk>
Message-ID: <199802180100.UAA00316@unready.microstar.com>

Peter Murray-Rust writes:

> Posted on behalf of John Petit

> >One solution may be to use DTD attributes to cue the search engines.
> >Perhaps a "LEVEL" attribute could cue the searchers to display
> >interfaces to predefined levels. The example below shows that the
> >"LEVEL" attribute means that the "numbeds" element should always appear
> >in a search query, or at the top level of searches. Any elements that
> >did not have this level 1 attribute would not be shown in the search
> >interface. If the "LEVEL" attribute was not found in the DTD, the
> >default would show all of the elements with search fields next to them.
> > > > > > > XML-SQLTYPE INTEGER #FIXED > > SNAME CDATA #FIXED "Number of beds" > > LEVEL CDATA #FIXED "1"> You could generalise this idea so that, instead of giving the level, you gave the element type name in a different (real or hypothetical DTD). In other words, In other words, you're saying that the 'numbeds' element corresponds to 'div1' (first-level division) in the other DTD. This is more useful, because you can express richer relationships than simply the level. For example, you could specify that 'expletive-deleted' is a type of emphasised phrase and that 'city' is a type of name: That way, a user could search for any type of emphasised phrase or proper noun, no matter what the element type was named. All the best, David -- David Megginson ak117@freenet.carleton.ca Microstar Software Ltd. dmeggins@microstar.com http://home.sprynet.com/sprynet/dmeggins/ xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev@ic.ac.uk Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To (un)subscribe, mailto:majordomo@ic.ac.uk the following message; (un)subscribe xml-dev To subscribe to the digests, mailto:majordomo@ic.ac.uk the following message; subscribe xml-dev-digest List coordinator, Henry Rzepa (mailto:rzepa@ic.ac.uk) From sudar at pspl.co.in Wed Feb 18 04:43:27 1998 From: sudar at pspl.co.in (Sudarshan Purohit) Date: Mon Jun 7 17:00:09 2004 Subject: Automating Search Interfaces Message-ID: <34EA6699.38FF@pspl.co.in> Peter Murray-Rust wrote : > >..... > >One solution may be to use DTD attributes to cue the search engines. > >Perhaps a "LEVEL" attribute could cue the searchers to display > >interfaces to predefined levels. The example below shows that the > >"LEVEL" attribute means that the "numbeds" element should always ..... I'm rather new to this, so it's possible that i'm thinking wrongly... 
Anyhow, I'd like to add one more point to this idea: When we say that the hotel is making its data available as XML on the web, what it will actually be doing is translating the data in its hotel management database into XML, almost certainly through some Database-XML interface. This will have to be done at very frequent intervals, in both directions, in order to show the latest status of, say, bookings. But doing this requires a standardised mechanism to delineate the XML elements as tables/columns/keys/other entities according to their DBMS. I feel that this mechanism (say, having an attribute listing this 'level') should be worked out so as to facilitate web search engines as well. XML-Data gives the basic features in this respect, by allowing keys, etc. This could be built upon.

Any such XML should also be readable by other similar programs (say by the software used by a travel agent who adds this into his own database) besides casual browsers.

Does this sound reasonable?

Sudarshan Purohit
18th feb 98 PSPL,Pune, India 1010hrs.

From donpark at quake.net Wed Feb 18 07:29:53 1998
From: donpark at quake.net (Don Park)
Date: Mon Jun 7 17:00:09 2004
Subject: Automating Search Interfaces
Message-ID: <002b01bd3c3e$4ca4ce70$2ee044c6@donpark>

I have a tendency to talk about things yet to happen as if I saw them happen, so I must first beg the reader to understand that what follows is just one man's opinion.
>>As particular industries and special interests standardize on their >>respective DTDs, Internet search engines will have to allow users to >>search by specific elements contained in those documents. In the typical >>search scenario, a user would use one of the major search services such >>as AltaVista or Yahoo. Lets say the user wanted to search across real >>estate listings, and these listings all used the same DTD. It seems that >> >>independent search engines need to interpret the DTD for a class of >>documents and present a query interface based on that DTD. The question >>is: how is the search engine to interpret the DTD and build an >>intelligent interface based on that DTD? Simply listing every element in >> >>the DTD is one approach, but an ugly one. Many DTDs will contain >>numerous elements which would only clutter and confuse a search >>interface. Standardized schemas will not be there for some time. Effects of XML will be felt by all major industries in the near future, and while there will be sincere efforts to standardize DTDs in most of the markets, fiercely competitive markets like the search service market will be slow in standardizing schemas. I expect another round of tag wars waged this time by Yahoo, Excite, AltaVista, MS, etc. The result will be different this time in that everyone will agree to disagree in the end and move on to building tools to bridge the differences in structures of contents which would have accumulated beyond the point of standardizing. Schema-based universal search interface will be dead upon arrival. While it is possible to build such clients, search services that use them will lose everytime to services offering hand-crafted search interfaces designed to be easy to use, relevantly flexible, and visually appealing. Improved accuracy of search results, brought on by wide availability of XML-based contents, will be lost to most users. 
Consumers simply do not care as long as they can find what they want among the first 100 items returned by a search. Search services are free after all, so consumers do not place high expectations on them.

What consumers will care mostly about is the 'freshness' of search results. All of the widely used search services are currently selling stale information, a lot of it damaged goods. There is not much demand for freshness now but the need will rise dramatically along with the growth of e-commerce.

XML will bring on new search services which broadcast search requests to hundreds or thousands of 'datasites' to get the freshest goods. It will take tools to build datasites and applications to create contents for the datasites. It is not hard to guess who will be the major player in the next generation of search services.

What I see happening is a proliferation of custom DTDs designed around the contents. Amazon will not want to throw out some information just so they can use some standard DTD. It is like saying that they will chop your arms off just so they can use the standard-size coffin. Amazon will use a custom DTD designed to hold all of their valuable contents including book reviews. They will offer some, and definitely not all, layers of the contents to search services by dynamically mapping its DTD to the search service's DTD. In other words, the DTD used to store content will not necessarily be the same as the DTD used to transfer it.

It is sad to think so but we will also see more and more contents moving behind protection. XML makes 'data-spies', 'data-pirates', and 'data-chop-shops' possible. You will see 'hot-data' detective robots roaming the net to see if any piece of a site's data is based on its clients' data based on some intentional mangling of words and images with hidden signatures.

I hope I did not upset everyone with my 'it sure is obvious to me' attitude. My sole intention is to help the XML community. If I make some money along the way, I can live with it. I think .
Sincerely, Don Park http://www.quake.net/~donpark/index.html xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev@ic.ac.uk Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To (un)subscribe, mailto:majordomo@ic.ac.uk the following message; (un)subscribe xml-dev To subscribe to the digests, mailto:majordomo@ic.ac.uk the following message; subscribe xml-dev-digest List coordinator, Henry Rzepa (mailto:rzepa@ic.ac.uk) From richard at light.demon.co.uk Wed Feb 18 09:23:48 1998 From: richard at light.demon.co.uk (Richard Light) Date: Mon Jun 7 17:00:09 2004 Subject: Automating Search Interfaces In-Reply-To: <34E871B8.CF85BB5F@4thworldtele.com> Message-ID: In message <34E871B8.CF85BB5F@4thworldtele.com>, John Petit writes >By the way, I read your book and found it very informative. Thank you! >This is a question about how the search scenario will play out on the >web once XML becomes widely implemented. I have not seen this >articulated in any of the specifications or articles on the web thus >far. In lieu of that, I have imagined how it might work. I would like >some feedback. Am I way off base? Naturally the answer will have a big >impact on the design of search engines and other services that I am >creating. John, You have already had replies that: - comment on the potential use of 'architecture'-type techniques for harmonising the semantics of element types in different DTDs, and - point out that a suitably designed representation of relational data in XML will allow SQL-type queries on data that is really a relational wolf (?!) in XML sheep's clothing The only thing I would add is that neither approach gives us a query language for searching information sources that are genuinely XML, not 'relational-in-disguise'. Peter M-R mentioned on XML-dev a couple of months ago that he uses XLL expressions as a query language - this is the only approach that is currently possible within the 'official' XML world-view. 
The SGML world has invented a very exhaustive query language (SDQL, which lurks within the DSSSL standard) for full SGML documents, but this is probably inappropriate for the XML world. (One weakness of SDQL is that it has a 'read-only' model of the document, whereas SQL supports table creation and updating. Depends what you want from a query language.) Richard Light. Richard Light SGML/XML and Museum Information Consultancy richard@light.demon.co.uk xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev@ic.ac.uk Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To (un)subscribe, mailto:majordomo@ic.ac.uk the following message; (un)subscribe xml-dev To subscribe to the digests, mailto:majordomo@ic.ac.uk the following message; subscribe xml-dev-digest List coordinator, Henry Rzepa (mailto:rzepa@ic.ac.uk) From M.H.Kay at eng.icl.co.uk Wed Feb 18 14:17:18 1998 From: M.H.Kay at eng.icl.co.uk (Michael Kay) Date: Mon Jun 7 17:00:09 2004 Subject: Automating Search Interfaces Message-ID: <01bd3c78$12313a00$1e09e391@mhklaptop.bra01.icl.co.uk> >>This is a question about how the search scenario will play out on the >>web once XML becomes widely implemented Some suggestions & predictions: 1. The "whole web" search services are not keeping pace with the growth of the web; they are having to index more selectively and less often. There is therefore increasing room for more specialised search services. There will certainly be some that concentrate on a particular domain (say sports results) and that get to understand the DTDs that are widespread in that domain. This may in turn act as an incentive to the standardisation of domain DTDs. 2. Search engines will probably start applying heuristics to the XML structure even if they don't know the semantics of the DTD. This comes naturally to software trying to extract information from raw text. 
For example, tags with recognised names such as may raise the weighting of the text contained therein; tags that contain small amounts of text may be ranked more highly than tags containing most of the document.

3. Some conventional tags such as <META> may emerge and be used in a wide range of DTDs if the search engines are known to apply special heuristics to them. Other conventional tags, e.g. for personal names or places, may also emerge.

4. The general public is only interested in doing simple searches. In more specialist communities, query languages that allow the tagging to be exploited will become available. Many search engines already have languages that support "field-sensitive" searching and I think these can largely be applied to XML without extension. Such queries only make sense within the context of a single DTD or a family of closely-related DTDs. The "navigational" query languages such as the XLL syntax or SDQL are too precise and too complex for free text searching.

5. XML may start to become a vehicle for a site to publish an abstract of itself. Search services, rather than indexing all the content of a site (which is becoming unviable) will start to index the published abstracts of sites, and having directed the enquirer towards a site, will then delegate the within-site searching to a search engine at the site itself.

========================================================
By the way, does anyone know of a search engine (I mean software, not a web service) that understands XML? I have been looking at writing an IFilter interface for Microsoft's Index Server and it's rather daunting, especially as MS will presumably produce one themselves within a year.
========================================================

Regards,
Mike Kay, ICL

xml-dev: A list for W3C XML Developers.
To post, mailto:xml-dev@ic.ac.uk Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To (un)subscribe, mailto:majordomo@ic.ac.uk the following message; (un)subscribe xml-dev To subscribe to the digests, mailto:majordomo@ic.ac.uk the following message; subscribe xml-dev-digest List coordinator, Henry Rzepa (mailto:rzepa@ic.ac.uk) From ak117 at freenet.carleton.ca Wed Feb 18 15:40:53 1998 From: ak117 at freenet.carleton.ca (David Megginson) Date: Mon Jun 7 17:00:09 2004 Subject: Automating Search Interfaces In-Reply-To: <01bd3c78$12313a00$1e09e391@mhklaptop.bra01.icl.co.uk> References: <01bd3c78$12313a00$1e09e391@mhklaptop.bra01.icl.co.uk> Message-ID: <199802181539.KAA00290@unready.microstar.com> Michael Kay writes: > By the way, does anyone know of a search engine (I mean software, > not a web service) that understands XML? I have been looking at > writing an IFilter interface for Microsoft's Index Server and it's > rather daunting, especially as MS will presumably produce one > themselves within a year. You could customise OpenText's LiveLink search to handle XML using ranges -- the level of effort would depend on the skill and experience of your programmers (anywhere from a couple of days to a couple of months). All the best, David -- David Megginson ak117@freenet.carleton.ca Microstar Software Ltd. dmeggins@microstar.com http://home.sprynet.com/sprynet/dmeggins/ xml-dev: A list for W3C XML Developers. 
From murata at apsdc.ksp.fujixerox.co.jp Thu Feb 19 01:39:08 1998
From: murata at apsdc.ksp.fujixerox.co.jp (MURATA Makoto)
Date: Mon Jun 7 17:00:09 2004
Subject: Namespaces, Architectural Forms, and Sub-Documents
In-Reply-To: <3.0.1.16.19980206082831.1157496c@pop3.demon.co.uk>
Message-ID: <199802190139.AA00252@murata.apsdc.ksp.fujixerox.co.jp>

In message "Re: Namespaces, Architectural Forms, and Sub-Documents", Peter Murray-Rust wrote...
> I hope that the "disgusting" refers to the use of 'img' and 'src' and the
> implied semantics rather than the mechanism :-). I am an advocate of the
> *mechanism* (e.g
> http://www.vsms.nottingham.ac.uk/vsms/talks/chemwebvei/020.html) where I
> use XML-LINK explicitly to combine chemistry, maths and text. This has the
> advantage that it avoids namespace problems. It also allows me to process
> foreign files if certain assumptions are made.

I think that your approach works. Do you think that this is the way to go? I.e., no namespace mechanisms but links only? Or, do you think that it should be possible to convert the link-based representation to the namespace-based representation and vice versa?

Cheers,

Makoto

Fuji Xerox Information Systems
Tel: +81-44-812-7230 Fax: +81-44-812-7231
E-mail: murata@apsdc.ksp.fujixerox.co.jp
From eliot at isogen.com Thu Feb 19 02:04:33 1998
From: eliot at isogen.com (W. Eliot Kimber)
Date: Mon Jun 7 17:00:09 2004
Subject: Namespaces, Architectural Forms, and Sub-Documents
Message-ID: <3.0.32.19980218195752.00be8650@swbell.net>

At 10:39 AM 2/19/98 +0900, MURATA Makoto wrote:
>
>In message "Re: Namespaces, Architectural Forms, and Sub-Documents", Peter Murray-Rust
>wrote...
>> I hope that the "disgusting" refers to the use of 'img' and 'src' and the
>> implied semantics rather than the mechanism :-). I am an advocate of the
>> *mechanism* (e.g
>> http://www.vsms.nottingham.ac.uk/vsms/talks/chemwebvei/020.html) where I
>> use XML-LINK explicitly to combine chemistry, maths and text. This has the
>> advantage that it avoids namespace problems. It also allows me to process
>> foreign files if certain assumptions are made.
>
>I think that your approach works. Do you think that this is the way
>to go? I.e., no namespace mechanisms but links only? Or, do you think
>that it should be possible to convert the link-based representation to
>the namespace-based representation and vice versa?

My vote is for the link-based approach (which in HyTime is provided by the value reference facility, which lets you distinguish simple use-by-reference from true hyperlinks). A processor can always generate new combined instances using whatever approach it cares to use to disambiguate name clashes, including name spaces. Syntactic combination is ultimately limiting and largely unnecessary if you can do your combining at the semantic level.
However, semantic-level combination does have a cost because you can't necessarily depend on the limitations of syntactic constraints to keep things simple.

Cheers,

E.

-- 
<Address HyTime=bibloc>
W. Eliot Kimber, Senior Consulting SGML Engineer
Highland Consulting, a division of ISOGEN International Corp.
2200 N. Lamar St., Suite 230, Dallas, TX 95202. 214.953.0004
www.isogen.com
</Address>

From jeremie at netins.net Thu Feb 19 06:44:13 1998
From: jeremie at netins.net (Jeremie Miller)
Date: Mon Jun 7 17:00:09 2004
Subject: Update: Xparse(JavaScript XML Parser)
Message-ID: <008a01bd3d01$9d82d260$2801a8c0@jeremie.dbqglass.com>

I've updated my JavaScript based XML parser at:

http://www.jeremie.com/Dev/XML/

I added lots of little support things and it now correctly supports all of the goofy formatting things like:

    <tag name = "value"
         id = "abc123" >
    the tags contents
    </tag >

It's basically done with the exception of full error reporting and processing any DTD related information. I'm waiting to add DOM support before I attempt to tackle either of those, and the DOM support is waiting for the release of the DOM ECMAScript Core API definitions (Appendix C in the current WD).

So in the meantime I'm going to concentrate on my JavaScript based XSL parser :)

If you're willing to wrap your XML data in a <TEXTAREA> or escape it into a JavaScript string variable, Xparse is a slick and fast way to manipulate it on the client end.

Comments/ideas welcome!

Jeremie Miller
jer@jeremie.com
http://www.jeremie.com/
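The "goofy formatting" mentioned above is legal XML: whitespace is allowed around the `=` in an attribute and before the closing `>` of both start and end tags. As a minimal sketch (in Python with the standard-library `xml.etree.ElementTree`, not Xparse itself, whose API is not shown in this message), a conforming parser should read such markup the same way as the tidy form:

```python
import xml.etree.ElementTree as ET

# The loosely formatted sample from the message, spread over several lines.
loose = (
    '<tag name = "value"\n'
    '     id   =   "abc123"   >\n'
    'the tags contents\n'
    '</tag   >'
)

# The same element written compactly.
tidy = '<tag name="value" id="abc123">the tags contents</tag>'

loose_elem = ET.fromstring(loose)
tidy_elem = ET.fromstring(tidy)

# Names and attributes are identical; only surrounding whitespace in the
# text content differs, so it is stripped before comparing.
same_name = loose_elem.tag == tidy_elem.tag
same_attrs = loose_elem.attrib == tidy_elem.attrib
same_text = loose_elem.text.strip() == tidy_elem.text.strip()
```

Whitespace inside the element content is still significant to the parser, which is why the comparison strips it explicitly.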
From terje at in-progress.com Thu Feb 19 08:14:56 1998
From: terje at in-progress.com (terje@in-progress.com)
Date: Mon Jun 7 17:00:09 2004
Subject: XPublish 1.0 candidate (XML website publishing system on Mac)
Message-ID: <b111961b02021004e8cc@[199.106.6.97]>

XPublish (read: "CrossPublish") is a complete Macintosh XML based website publishing system that automatically generates HTML websites from XML documents.

The application is currently being fine-tuned to verify that it follows the final XML 1.0 specification. We now solicit feedback from the XML-savvy, and offer a solid pre-release discount (in addition to our appreciation) to those who take the time to check out the application and report any inconsistencies.

You are invited to download the candidate of XPublish 1.0 from:

http://interaction.in-progress.com/xpublish

XPublish supports efficient development and maintenance of websites with XML. The built-in Cascading StyleSheets designer fosters a consistent look & feel of the sites. The application's capabilities include rendering XML into HTML with a markup-emulated style sheet for older browsers that don't support CSS, facilitating faster deployment of XML among webmasters and demonstrating the processing power of XPublish. The distribution comes with a tutorial that gives HTML authors a gentle introduction to XML markup.

Subscribe to the XPublish mailing list to receive updated information about XML, XPublish and website publishing with XML. Send the subscription request to <xpublish-request@in-progress.com>, including your name and email address, to join the mailing list.
-- 
Terje <Terje@in-progress.com> | Media Design in*Progress

C a s c a d e... a comprehensive Cascading Style Sheets editor for Mac
XPublish - for efficient website publishing with XML
Make your Web Site a Social Place with Interaction!
Check out our web tools at <http://interaction.in-progress.com>

From ak117 at freenet.carleton.ca Thu Feb 19 20:26:20 1998
From: ak117 at freenet.carleton.ca (David Megginson)
Date: Mon Jun 7 17:00:09 2004
Subject: Announcement: New PSGML-XML Additions
Message-ID: <199802192025.PAA00328@unready.microstar.com>

I've updated my XML patches for PSGML. I've had very little time to devote to this, but I've managed to make two important changes, at least (the others are still in the queue):

1) Fixed a highlight-related PSGML bug that caused errors when there was a processing instruction before the DOCTYPE declaration (this is a big problem in XML, for obvious reasons).

2) Fixed PSGML's support for the `sgml-system-path' variable, and set the initial value of the variable automatically from the environment variable SGML_SEARCH_PATH (as used by NSGMLS), if present.

The second one turns out to be a very useful change. If you do something like

    (setq sgml-system-path '("." "/usr/local/lib/sgml/global"))

or (for NSGMLS support as well)

    export SGML_SEARCH_PATH
    SGML_SEARCH_PATH=".:/usr/local/lib/sgml/global"

and then put the file `spec.dtd' in /usr/local/lib/sgml/global, then you can always reference that DTD with a relative URL as if it were in the current directory (NSGMLS has always allowed this, but it wasn't fully implemented in PSGML).
That means that

    <!DOCTYPE spec SYSTEM "spec.dtd">

works, and you no longer have to copy the DTD file into every directory that uses it.

I've also fixed the parsing of environment variables so that ';' can be the separator in DOS/Windows, though I haven't tested that part yet.

You can download the patches from my home page,

http://home.sprynet.com/sprynet/dmeggins/

Have fun!

David

-- 
David Megginson ak117@freenet.carleton.ca
Microstar Software Ltd. dmeggins@microstar.com
http://home.sprynet.com/sprynet/dmeggins/

From Jon.Bosak at eng.Sun.COM Fri Feb 20 00:33:50 1998
From: Jon.Bosak at eng.Sun.COM (Jon Bosak)
Date: Mon Jun 7 17:00:09 2004
Subject: Final XML conference schedule
Message-ID: <199802200031.QAA27394@boethius.eng.sun.com>

I've been asked to note that the final agenda for the XML Conference is now available on the GCA Web site:

www.gca.org/conf/xmlcon98/

Jon

From peter at ursus.demon.co.uk Fri Feb 20 01:11:06 1998
From: peter at ursus.demon.co.uk (Peter Murray-Rust)
Date: Mon Jun 7 17:00:09 2004
Subject: Automating Search Interfaces"
Message-ID: <3.0.1.16.19980220010616.44af540a@pop3.demon.co.uk>

Forwarded from John Petit...
>Cheers, John Petit
>
>Title "Re:Automating Search Interfaces""
>-------------------------------------------------------------------------------------------
>
> Don Park writes:
>
>>>Standardized schemas will not be there for some time. Effects of XML will
>>>be felt by all major industries in the near future, and while there will be
>>>sincere efforts to standardize DTDs in most of the markets, fiercely
>>>competitive markets like the search service market will be slow in
>>>standardizing schemas. I expect another round of tag wars waged this time
>>>by Yahoo, Excite, AltaVista, MS, etc. The result will be different this
>>>time in that everyone will agree to disagree in the end and move on to
>>>building tools to bridge the differences in structures of contents which
>>>would have accumulated beyond the point of standardizing.
>
>I agree that this disheartening scenario is quite possible. But what a
>shame! It seems that one of XML's major strengths is its ability to
>search heterogeneous databases. Independent sellers large and small
>would benefit from heterogeneous searches for it would allow super
>accurate marketing. Mom and pop producers should be able to sell their
>boutique goods to the special set of consumers that would be interested.
>A real estate agent in Backwater USA with a unique property should be
>able to sell that product in an industry standard search engine.
>Without accurate, industry specific search interfaces, consumers will
>not easily find these sites. Otherwise we are no better off search-wise
>than we are today, wallowing in inaccurate searches. It would be a real
>shame if the ultimate promises of XML were hindered by lack of
>planning. Laissez-faire is not always the best way.
>
>Perhaps what would help is to create a central repository for major
>industry DTDs. Such a repository may reduce the effects of splintering,
>and accelerate development.
DTD authors could see what has come before
>them and either borrow from it or at least learn from it. I have always
>felt that such a site would be useful in DTD development. There are
>probably dozens of nascent DTD efforts going on in various industries,
>each one reinventing the wheel. In many cases these authors are describing
>the same element with different names when they could just as easily use
>the same name.
>
>Taking biological evolution as an analogy, putting the DTDs in one small
>pool will encourage faster and more sympathetic development. Otherwise,
>isolated cyber ecosystems will encourage divergent DTD evolution and
>this will lead to a long and vicious "survival of the fittest" scenario
>that will not benefit anyone.
>
>I cannot speak for Robin Cover but the SGML/XML Web Page seems like a
>good candidate for such a DTD repository.
>
>>>Schema-based universal search interface will be dead upon arrival. While it
>>>is possible to build such clients, search services that use them will lose
>>>every time to services offering hand-crafted search interfaces designed to be
>>>easy to use, relevantly flexible, and visually appealing.
>
>It is true that hand crafted search interfaces would be more polished,
>but who should be responsible for their creation? Is there some
>designated Java developer in the hotel industry that will make a search
>engine selflessly for the entire industry? No. If such work is relegated
>to the private companies then such search engines will not represent the
>entire industry in an unbiased way. This leaves nice, but proprietary
>search engines, and we are right back to where we started from: searches
>of privately selected databases rather than searches of heterogeneous,
>industry representative databases.
>
>>>Improved accuracy of search results, brought on by wide availability of
>>>XML-based contents, will be lost to most users.
Consumers simply do not
>>>care as long as they can find what they want among first 100 items returned
>>>by a search. Search services are free after all and therefore do not place
>>>high expectations.
>
>I do not feel that consumers will not care about search accuracy. When a
>customer is looking for variations of Ginkgo Biloba (an over-the-counter
>drug) they want to see all the sites that sell it and for what price.
>The same is true for travelers looking for room availability at their
>travel destinations. No one wants to wade through a hundred tangentially
>related sites. Without accurate search interfaces, consumers will not
>get this sort of accurate response. The RDF is an important part of
>describing the web, but I have not seen how it would be the right way to
>address automating search interfaces.
>
>

Peter Murray-Rust, Director Virtual School of Molecular Sciences, domestic net connection
VSMS http://www.nottingham.ac.uk/vsms, Virtual Hyperglossary http://www.venus.co.uk/vhg

From peter at ursus.demon.co.uk Fri Feb 20 01:15:12 1998
From: peter at ursus.demon.co.uk (Peter Murray-Rust)
Date: Mon Jun 7 17:00:09 2004
Subject: Namespaces, Architectural Forms, and Sub-Documents
In-Reply-To: <199802190139.AA00252@murata.apsdc.ksp.fujixerox.co.jp>
References: <3.0.1.16.19980206082831.1157496c@pop3.demon.co.uk>
Message-ID: <3.0.1.16.19980220010622.29d7d178@pop3.demon.co.uk>

At 10:39 19/02/98 +0900, MURATA Makoto wrote:
>
>In message "Re: Namespaces, Architectural Forms, and Sub-Documents", Peter Murray-Rust
>wrote...
>> I hope that the "disgusting" refers to the use of 'img' and 'src' and the
>> implied semantics rather than the mechanism :-). I am an advocate of the
>> *mechanism* (e.g
>> http://www.vsms.nottingham.ac.uk/vsms/talks/chemwebvei/020.html) where I
>> use XML-LINK explicitly to combine chemistry, maths and text. This has the
>> advantage that it avoids namespace problems. It also allows me to process
>> foreign files if certain assumptions are made.
>
>I think that your approach works. Do you think that this is the way

Thank you. I should perhaps make it clear that the diagram was slightly hypothetical (i.e. not a screenshot from JUMBO. I did at one stage manage to EMBED a molecule in an event stream but it wasn't stable). At present JUMBO will manually deal with linked resources and treat them as separate trees. NEW and REPLACE are easily catered for; EMBED is a problem since it has little meaning in a tree and for text event stream I am still deciding on the best way to arrange flow objects for non-conventional objects (e.g. maths, molecules, name-value pairs, etc.) Also the 'hypertext' support that Java gives is hardly exciting.

>to go? I.e., no namespace mechanisms but links only? Or, do you think
>that it should be possible to convert the link-based representation to
>the namespace-based representation and vice versa?

[There is a current SIG/WG discussion on namespaces which I cannot publicly comment on. My private view is that I shall wait-and-see what comes out; from my point of view it's not trivial.]

I suspect that namespaces and links will co-exist. I am certainly gently tooling up for each of them. My little experiment with JUMBO-PLAY shows both approaches. (Although only a single namespace is involved, I have prefixed the output of my play.SAXSplit with PLAY:) The advantage of a single monolithic document is it's easier to traverse (e.g. searches). Its disadvantage (for JUMBO) is that it can overflow the JVM. The namespaces are explicitly expanded (i.e.
every element name has a namespace prefix). I would find scoping quite difficult until the rules are VERY clear.

It is very difficult to build a prototype system if one is not sure what one *should* be doing. (This is distinct from not knowing what one is doing, which is permanent :-). Certainty in the goal makes programming about half an order of magnitude easier. Thus, for example, I don't know 100% whether we shall have prefixes on attributes.

Note that one advantage of links is that what is hung on the end need not originally be an XML document. I frequently parse legacy documents into trees on demand. Maybe this could be managed by notation and embedded 'binary', but I don't understand that yet :-)

>
>Cheers,

Cheers

P.

Peter Murray-Rust, Director Virtual School of Molecular Sciences, domestic net connection
VSMS http://www.nottingham.ac.uk/vsms, Virtual Hyperglossary http://www.venus.co.uk/vhg

From parikr at pointcast.com Fri Feb 20 04:22:08 1998
From: parikr at pointcast.com (Parik Rao)
Date: Mon Jun 7 17:00:09 2004
Subject: Is anyone using CDATA?
Message-ID: <93DA154E07D3D0119C7E006097743AA0F5B40E@hq-exs1.pointcast.com>

Anyone have experiences with CDATA? We're interested in inserting non-XML markup and BLOBs into XML files, and the best way seems to be CDATA. However, some of the parsers I've been playing around with (Microsoft, XMLint) don't support the CDATA element. Is CDATA handling required for a validating parser?

For non-XML markup (HTML markup), I could escape the markup and insert it under my own elements, but that requires extra processing and makes documents larger.
For BLOBs, pointers to the data could obviously be used rather than embedding it. But it can be useful to package all required data into a single file sometimes.

Interested in how others are dealing with the situation...

-- 
Parik Rao parikr@pointcast.com
PointCast, Inc. http://www.pointcast.com

From mrc at allette.com.au Fri Feb 20 05:11:59 1998
From: mrc at allette.com.au (Marcus Carr)
Date: Mon Jun 7 17:00:10 2004
Subject: Automating Search Interfaces"
References: <3.0.1.16.19980220010616.44af540a@pop3.demon.co.uk>
Message-ID: <34ED106E.C3CFA9EF@allette.com.au>

John Petit wrote:

> It is true that hand crafted search interfaces would be more polished,
> but who should be responsible for their creation. Is there some
> designated Java developer in the hotel industry that will make a search
> engine selflessly for the entire industry. No. If such work is relegated
> to the private companies then such search engines will not represent the
> entire industry in a unbiased way. This leaves nice, but proprietary
> search engines, and we are right back to where we started from; searches
> of privately selected database rather than searches of heterogeneous,
> industry representative databases.

I wonder what would happen if Alta Vista, Yahoo et al started supporting 'index DTDs' of their own making, written for particular industries and designed as an interface layer to the search engine. The data owners would be responsible for the creation/generation of these very skinny documents and the embedded links to the richer versions.
If these DTDs were regarded as being a subset of the data strictly for the purpose of searching (rather than for more general information storage), the DTD would primarily suit the search engine and need show no bias toward any particular industry group. The hits could be ranked more highly than those found by standard means and would probably be more valuable to users. Then the search engine builders could start supporting each other's DTDs in search of commercial advantage, etc...

This would leave the technical responsibility and potential financial gain to a group who have no interest other than making data findable. This sounds too good to be true, so almost certainly is.

> I do not feel that consumers will not care about search accuracy. When a
> customer is looking for variations of Ginkgo Biloba (an over-the-counter
> drug) they want to see all the sites that sell it and for what price.
> The same is true for travelers looking for room availability at their
> travel destinations. No one wants to wade though a hundred tangentially
> related sites.

I agree. I think users want to feel that buzz that you get from finding the right site on the first try, despite the use of somewhat dubious search criteria.

-- 
Regards

Marcus Carr                     email: mrc@allette.com.au
_______________________________________________________________
Allette Systems (Australia)     email: info@allette.com.au
Level 10, 91 York Street        www:   http://www.allette.com.au
Sydney 2000 NSW Australia       phone: +61 2 9262 4777
                                fax:   +61 2 9262 4774
_______________________________________________________________
From peter at ursus.demon.co.uk Fri Feb 20 07:40:10 1998
From: peter at ursus.demon.co.uk (Peter Murray-Rust)
Date: Mon Jun 7 17:00:10 2004
Subject: Is anyone using CDATA?
In-Reply-To: <93DA154E07D3D0119C7E006097743AA0F5B40E@hq-exs1.pointcast.com>
Message-ID: <3.0.1.16.19980220064032.1b2f441c@pop3.demon.co.uk>

At 20:21 19/02/98 -0800, Parik Rao wrote:
>Anyone have experiences with CDATA ? We're interested in inserting
>non-XML markup and BLOBs into XML files, and the best way seems to be
>CDATA. However, some of the parsers I've been playing around with
>(Microsoft, XMLint) don't support the CDATA element. Is CDATA handling
>required for a validating parser?

An "XML parser" must conform to the XML spec and must therefore *read correctly* a document which includes

    <![CDATA[ ... ]]>

There is no requirement for a parser to DO anything with this other than to report violations of well-formedness (or validity) as appropriate. The SAX API does not support <![CDATA[; i.e. the output has lost any knowledge of what bits were originally CDATA and which were escaped with &, etc.

P.

Peter Murray-Rust, Director Virtual School of Molecular Sciences, domestic net connection
VSMS http://www.nottingham.ac.uk/vsms, Virtual Hyperglossary http://www.venus.co.uk/vhg
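The point about SAX losing the distinction between CDATA sections and escaped text can be sketched as follows. This is a minimal illustration using Python's standard-library `xml.sax` (a later SAX binding, assumed here only for demonstration; it is not the Java SAX draft under discussion). The class and function names are hypothetical. A plain content handler receives the same character data for both forms:

```python
import xml.sax


class TextCollector(xml.sax.ContentHandler):
    """Collects every character event delivered by the parser."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def characters(self, content):
        # CDATA content and unescaped entity text both arrive here;
        # the handler is not told which form the source used.
        self.chunks.append(content)


def sax_text(document: str) -> str:
    """Parse `document` and return the concatenated character data."""
    handler = TextCollector()
    xml.sax.parseString(document.encode("utf-8"), handler)
    return "".join(handler.chunks)


with_cdata = "<doc><![CDATA[<b>bold</b> & more]]></doc>"
escaped = "<doc>&lt;b&gt;bold&lt;/b&gt; &amp; more</doc>"

cdata_text = sax_text(with_cdata)
escaped_text = sax_text(escaped)
```

Both calls yield the same string, so an application that needs to write CDATA sections back out has to decide for itself where to use them.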
From donpark at quake.net Fri Feb 20 08:33:18 1998
From: donpark at quake.net (Don Park)
Date: Mon Jun 7 17:00:10 2004
Subject: Is anyone using CDATA?
Message-ID: <006c01bd3dd9$8048bc90$2ee044c6@donpark>

Parik,

>Anyone have experiences with CDATA ? We're interested in inserting
>non-XML markup and BLOBs into XML files, and the best way seems to be
>CDATA. However, some of the parsers I've been playing around with
>(Microsoft, XMLint) don't support the CDATA element. Is CDATA handling
>required for a validating parser?

As far as I know, yes. Version 1.8 of MSXML does handle CDATA sections. I don't know about XMLint. AElfred also supports CDATA. With SAX, you can get CDATA section contents but it will appear as characters. This causes extra processing burden on some conversion applications (i.e. XML to XML) but it is not a serious problem, just a boon for Intel.

>For non-XML markup (HTML markup), I could escape the markup and insert
>it under my own elements, but that requires extra processing and makes
>documents larger. For BLOBs, obviously pointers to the data rather than
>embedded could be done. But its can be useful to package all required
>data into a single file sometimes.

You can compress the HTML markup and write it out with BASE64 encoding. I am in the process of putting together a proposal for embedding binary data in XML documents. It is tentatively named XML-Binary proposal. I will be posting a draft on this mailing list for comments before submitting it to W3C.

As far as packaging goes, there is at least one person working on that although I can not go into details due to his request for confidentiality.
Perhaps he can elaborate some more.

I hope this helps.

Don Park
http://www.quake.net/~donpark/index.html

From donpark at quake.net Fri Feb 20 09:48:32 1998
From: donpark at quake.net (Don Park)
Date: Mon Jun 7 17:00:10 2004
Subject: Automating Search Interfaces"
Message-ID: <008401bd3de3$f8c93b90$2ee044c6@donpark>

>I agree that this disheartening scenario is quite possible. But what a
>shame! It seems that one of XML's major strengths is its ability to
>search heterogeneous databases. Independent sellers large and small
>would benefit from heterogeneous searches for it would allow super
>accurate marketing. Mom and pop producers should be able to sell their
>boutique goods to the special set of consumers that would be interested.
>A real estate agent in Backwater USA with a unique property should be
>able to sell that product in an industry standard search engine.
>Without accurate, industry specific search interfaces, consumers will
>not easily find these sites. Otherwise we are no better off search-wise
>than we are today, wallowing in inaccurate searches. It would be a real
>shame if the ultimate promises of XML were hindered by lack of
>planning. Laissez-faire is not always the best way.

As far as I am concerned, the scenario is not only possible, it is absolutely the only way history will unfold, because content developers will find it hard to convert non-XML contents into XML using standard DTDs. Commercial contents are typically composite data which can not be easily described with a set of standard schemas.
Search services make it even worse because their schema requirements will be far less than those of content providers.

Search across heterogeneous databases can still be achieved without asking everyone to put on a straitjacket and wiggle ahead at manageable speed for the benefit of mankind. The key lies in dynamic schema conflict resolution technologies. If a search service wants the price in pesos and the database stores prices in US dollars, the price can be converted by an adapter at the time of demand using a currency market datafeed. Currency conversion can not be done beforehand nor cached because its shelf-life is basically counted in minutes. It is also quite unfriendly to return search results with prices in ten different currencies.

Also standard DTDs can not adapt to change. What do you do when the standard DTD for electronic devices must be changed to include performance data (i.e. WinMark for Intel machines)? The problems are simply mindboggling (well, my mind is easy to boggle).

> It is true that hand crafted search interfaces would be more polished,
> but who should be responsible for their creation. Is there some
> designated Java developer in the hotel industry that will make a search
> engine selflessly for the entire industry. No. If such work is relegated
> to the private companies then such search engines will not represent the
> entire industry in a unbiased way. This leaves nice, but proprietary
> search engines, and we are right back to where we started from; searches
> of privately selected database rather than searches of heterogeneous,
> industry representative databases.

Search companies will attack one industry at a time with the search company providing the custom user interface and dictating what the DTD should be. Each attack will be turned into a press event with announcements of support from major players in that particular industry.
These companies will announce that they will provide data using the search company's industry-specific search DTD. Small companies with fewer resources will provide data using a push model, since they do not have the resources to take part in a distributed search network. Large companies will place more value on their data and will provide information on demand, thus taking part in the search network.

>I do not feel that consumers will not care about search accuracy. When a
>customer is looking for variations of Ginkgo Biloba (an over-the-counter
>drug) they want to see all the sites that sell it and for what price.
>The same is true for travelers looking for room availability at their
>travel destinations. No one wants to wade though a hundred tangentially
>related sites. Without accurate search interfaces, consumers will not
>get this sort of accurate response. The RDF is an important part of
>describing the web, but I have not seen how it would right way to
>address automating search interfaces.

This was an exaggeration on my part. I apologize.

Having said all this, I still feel that efforts to standardize DTDs must be made and must be maintained for the sake of balance and stability. There wouldn't be much of a market if everyone used their own currency as their picture ID.

Prophet for Profit,

Don Park
http://www.quake.net/~donpark/index.html
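As a rough illustration of the adapter idea in Don's message (converting a stored price into whatever currency the search service asks for, at the time of the query), here is a minimal Python sketch. The `PriceAdapter` class, the `fake_feed` function and the exchange rate are hypothetical stand-ins invented for this example; the message itself does not describe any concrete API.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class PriceAdapter:
    """Converts a stored price to the currency a search service asks for.

    rate_feed is a hypothetical callable standing in for a live currency
    datafeed; it returns how many units of `target` one unit of `source` buys.
    """
    stored_currency: str
    rate_feed: Callable[[str, str], float]

    def convert(self, amount: float, target: str) -> float:
        if target == self.stored_currency:
            return amount
        # Look the rate up at the moment of the query rather than caching it,
        # since the message argues that rates go stale within minutes.
        rate = self.rate_feed(self.stored_currency, target)
        return round(amount * rate, 2)


def fake_feed(source: str, target: str) -> float:
    # Made-up rate, for illustration only.
    rates = {("USD", "MXN"): 8.5}
    return rates[(source, target)]


adapter = PriceAdapter(stored_currency="USD", rate_feed=fake_feed)
price_in_pesos = adapter.convert(99.0, "MXN")
```

The point of the design is that the database keeps its own schema (prices in US dollars) and the conversion happens in a thin layer between it and the search service.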
From peter at ursus.demon.co.uk Fri Feb 20 10:28:49 1998
From: peter at ursus.demon.co.uk (Peter Murray-Rust)
Date: Mon Jun 7 17:00:10 2004
Subject: rec.xml
Message-ID: <3.0.1.16.19980220095415.2247b80e@pop3.demon.co.uk>

Here is something that bit me unexpectedly, and I'd be interested in comments on it. I *think* I know the answer. I'll leave it to you to think about before you rush for your parsers to check.

Using SAX (alone) to parse the XML version of the XML recommendation (rec.xml), is it possible to create a well-formed version? The first time I tried this the result surprised me.

P.

BTW there may be problems parsing rec.xml as the official version contains a (single) character #160 ( ). This has actually been 'commented out' but parsers such as AElfred don't accept it and throw an error. DavidM assures me that this is the correct thing to do - I take this on trust. So, if you wish to use AElfred on this you'll have to find the #160 (it appears as an aacute; on my editor - yours may vary and even show a 'space'). This has nothing to do with the little amusement above...

BTW I have asked if there is a file spec.dtd, and no doubt this will be announced here if/when.

Peter Murray-Rust, Director Virtual School of Molecular Sciences, domestic net connection
VSMS http://www.nottingham.ac.uk/vsms, Virtual Hyperglossary http://www.venus.co.uk/vhg
From murata at apsdc.ksp.fujixerox.co.jp Fri Feb 20 11:49:47 1998
From: murata at apsdc.ksp.fujixerox.co.jp (MURATA Makoto)
Date: Mon Jun 7 17:00:10 2004
Subject: Announcement: New PSGML-XML Additions
In-Reply-To: <199802192025.PAA00328@unready.microstar.com>
Message-ID: <199802201150.AA00268@murata.apsdc.ksp.fujixerox.co.jp>

PSGML-XML works on Meadow (Multilingual enhancement to gnu Emacs with ADvantages Over Windows). Meadow is Emacs20 on MS Windows. It is fully internationalized (but no UTF-16 yet). It was recently released by Miyashita Hisashi <himi@bird.scphys.kyoto-u.ac.jp>. (Miyashita is his family name.)

Meadow is available from ftp://ftp.etl.go.jp/pub/mule/Windows/Meadow-1.00-i386.tar.gz

INSTALL.Meadow and README.Meadow are written in English.

Makoto

Fuji Xerox Information Systems
Tel: +81-44-812-7230 Fax: +81-44-812-7231
E-mail: murata@apsdc.ksp.fujixerox.co.jp

From ak117 at freenet.carleton.ca Fri Feb 20 12:16:39 1998
From: ak117 at freenet.carleton.ca (David Megginson)
Date: Mon Jun 7 17:00:10 2004
Subject: Is anyone using CDATA?
In-Reply-To: <93DA154E07D3D0119C7E006097743AA0F5B40E@hq-exs1.pointcast.com>
References: <93DA154E07D3D0119C7E006097743AA0F5B40E@hq-exs1.pointcast.com>
Message-ID: <199802201215.HAA00686@unready.microstar.com>

Parik Rao writes:

> Anyone have experiences with CDATA ? We're interested in inserting
> non-XML markup and BLOBs into XML files, and the best way seems to be
> CDATA. However, some of the parsers I've been playing around with
> (Microsoft, XMLint) don't support the CDATA element. Is CDATA handling
> required for a validating parser?

I, at least, cannot reproduce your bug -- with MSXML, the following document parses exactly as expected:

  <?xml version="1.0"?>
  <listing>
  <![CDATA[<a></a>]]>
  </listing>

That said, CDATA marked sections won't always work for you -- BLOBs are likely to contain non-SGML characters, and any arbitrary non-XML markup containing ']]>' will kill the marked section.

The best way to include arbitrary non-XML information in a document is to include it as an unparsed entity or an HREF link (just as you would include a GIF in an HTML page).

All the best,

David

-- 
David Megginson ak117@freenet.carleton.ca
Microstar Software Ltd. dmeggins@microstar.com
http://home.sprynet.com/sprynet/dmeggins/
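To make the ']]>' problem concrete, here is a small Python sketch using only the standard library. The helper names `to_cdata` and `listing_text` are made up for this example, and the splitting trick it shows is a common workaround rather than something proposed in the message above; it only helps with text, not with raw binary data.

```python
from xml.dom import minidom


def to_cdata(text: str) -> str:
    """Wrap text in CDATA, splitting any ']]>' across two sections."""
    # ']]>' cannot appear inside a CDATA section, so end the section after
    # ']]' and open a new one before '>'.
    return "<![CDATA[" + text.replace("]]>", "]]]]><![CDATA[>") + "]]>"


def listing_text(xml_source: str) -> str:
    """Parse a <listing> document and return its concatenated text content."""
    doc = minidom.parseString(xml_source)
    return "".join(node.data for node in doc.documentElement.childNodes)


payload = "if (a[b[0]]>c && d<e) {}"
naive = "<listing><![CDATA[" + payload + "]]></listing>"
safe = "<listing>" + to_cdata(payload) + "</listing>"

try:
    minidom.parseString(naive)
    naive_ok = True
except Exception:
    # The embedded ']]>' ends the section early, so the rest of the payload
    # is parsed as ordinary (and here malformed) content.
    naive_ok = False

recovered = listing_text(safe)
```

The naive wrapping fails to parse, while the split version round-trips the payload unchanged.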
From ak117 at freenet.carleton.ca Fri Feb 20 12:29:54 1998
From: ak117 at freenet.carleton.ca (David Megginson)
Date: Mon Jun 7 17:00:10 2004
Subject: rec.xml
In-Reply-To: <3.0.1.16.19980220095415.2247b80e@pop3.demon.co.uk>
References: <3.0.1.16.19980220095415.2247b80e@pop3.demon.co.uk>
Message-ID: <199802201228.HAA00784@unready.microstar.com>

Peter Murray-Rust writes:

> Using SAX (alone) to parse the XML version of the XML
> recommendation (rec.xml), is it possible to create a well-formed
> version? The first time I tried this the result surprised me.

James Clark has created the Java application XMLTest to do exactly this:

  http://www.jclark.com/xml/XMLTest.java

I just normalised the REC with the following command line:

  java XMLTest com.microstar.sax.AElfredDriver /tmp REC-xml-19980210.xml

It seems to have come out fine (though without XML declaration, comments, DOCTYPE, etc.). The purpose of James's application is to allow easy comparisons of different SAX drivers and parsers.

> BTW there may be problems parsing rec.xml as the official version
> contains a (single) character #160 ( ).

The problem has been fixed in the REC. Parsing the REC no longer causes problems for AElfred because the REC's XML declaration declares the encoding as "ISO-8859-1", where #160 is a legal character. The problem is that not all XML parsers allow the declared encoding ISO-8859-1 (though that's what most of them really support).

> This has actually been 'commented out' but parsers such as AElfred
> don't accept it and throw an error. DavidM assures me that this is
> the correct thing to do - I take this on trust.

This is _a_ correct thing to do.
This is an error but not a fatal error, so it is up to the parser whether or not to report it. That said, any parser with actual UTF-8 support will somehow choke on #160 if it thinks it's parsing UTF-8. Right now, most parsers claim to be parsing UTF-8 when they're really parsing ISO-8859-1, hence they don't choke on #160. All the best, David -- David Megginson ak117@freenet.carleton.ca Microstar Software Ltd. dmeggins@microstar.com http://home.sprynet.com/sprynet/dmeggins/ xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev@ic.ac.uk Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To (un)subscribe, mailto:majordomo@ic.ac.uk the following message; (un)subscribe xml-dev To subscribe to the digests, mailto:majordomo@ic.ac.uk the following message; subscribe xml-dev-digest List coordinator, Henry Rzepa (mailto:rzepa@ic.ac.uk) From matthewg at poet.de Fri Feb 20 13:12:54 1998 From: matthewg at poet.de (Matthew Gertner) Date: Mon Jun 7 17:00:10 2004 Subject: Automating Search Interfaces" Message-ID: <01bd3e01$168b28b0$a00b0ac0@pharcyde.poetsoftware.xo.com> Don, <snip> > Also standard DTDs can not adapt to change. What do you do when the > standard DTD for electronic devices must be changed to include performance > data (i.e. WinMark for Intel machines)? The problems are simply > mindboggling (well, my mind is easy to boggle). One approach that really appeals to me is based on a two-pronged effort to create standard tags *and* standard DTDs, and relies on the fact that there is really a working mechanism for extending DTDs through inheritance (which I guess is still not entirely the case). Standard tags would be a bit of a hack, but probably very useful in a pragmatic sense. For example, you might be able to say certain things about a TITLE tag, or a PRICE tag, or whatever, just on the basis of the name, regardless of the actual DTD being used. If these conventions were well-known, this could be of great use when defining a new DTD (i.e. 
"Let's call the tag PARAGRAPH and not PARA because this is what will be recognized by search engines").

Inheritance is *not* a hack and really seems like the way to go for more ambitious implementations. To take your example, the DTD for electronic devices might contain tags for VENDOR, PRODUCTNAME, PRICE, CATEGORY, etc. If I want to find all CD player devices from Sony that cost less than $99 then I can query based on this standard DTD. Vendors who want to include more information just derive a new DTD with all the standard tags, as well as vendor-specific ones (for benchmark figures, for example). The non-standard tags may not be available for querying, but the information in the standardized base DTD would be.

This becomes even more powerful with multiple inheritance. I can whip up a DTD for my new portable XML viewer/espresso brewer, imported from Kazakhstan, just by grabbing the standard DTDs for hand-held electronic devices (derived from general electronic devices but adding tags for SIZE, WEIGHT and BATTERYLIFE), for food processing equipment (also derived from electronic devices but adding a tag for FOODTYPE) and for imported goods (with tags for COUNTRYOFORIGIN, EXPORTTARIF, etc.). This would let users find my product by querying for all portable devices weighing under 200 grams which can process coffee and which are produced in Central Asia.

I really believe the world needs XML to get a grip on the information explosion. The approach suggested by the original poster is great, and with plug-and-play DTDs I don't see any real technical reason why it shouldn't work. As an initial implementation, the approach based on GI only would no doubt be a good workaround.

Matthew
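Matthew's scenario can be sketched in a few lines of Python. XML 1.0 DTDs have no inheritance mechanism of their own, so this sketch only mimics the effect: the query looks at the "base" tags (VENDOR, PRICE, CATEGORY, PRODUCTNAME) and simply ignores whatever extra tags a record carries. The sample catalogue, the product names and the `find_products` helper are all invented for illustration.

```python
import xml.etree.ElementTree as ET

# Three records that share the base tags; two add vendor-specific extras.
CATALOG = """
<CATALOG>
  <DEVICE>
    <VENDOR>Sony</VENDOR><PRODUCTNAME>CD Player A</PRODUCTNAME>
    <PRICE>89</PRICE><CATEGORY>CD player</CATEGORY>
  </DEVICE>
  <DEVICE>
    <VENDOR>Sony</VENDOR><PRODUCTNAME>CD Player B</PRODUCTNAME>
    <PRICE>129</PRICE><CATEGORY>CD player</CATEGORY>
    <BATTERYLIFE>8</BATTERYLIFE>
  </DEVICE>
  <DEVICE>
    <VENDOR>Acme</VENDOR><PRODUCTNAME>CD Player C</PRODUCTNAME>
    <PRICE>59</PRICE><CATEGORY>CD player</CATEGORY>
    <WEIGHT>180</WEIGHT>
  </DEVICE>
</CATALOG>
"""


def find_products(root, vendor, category, max_price):
    """Return product names matching a query that uses only the base tags."""
    hits = []
    for device in root.iter("DEVICE"):
        if (device.findtext("VENDOR") == vendor
                and device.findtext("CATEGORY") == category
                and float(device.findtext("PRICE")) < max_price):
            hits.append(device.findtext("PRODUCTNAME"))
    return hits


root = ET.fromstring(CATALOG)
cheap_sony_players = find_products(root, "Sony", "CD player", 99)
```

Records with extra tags such as BATTERYLIFE or WEIGHT are still searchable through the shared tags, which is the property the message is after.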
From M.H.Kay at eng.icl.co.uk Fri Feb 20 16:03:37 1998
From: M.H.Kay at eng.icl.co.uk (Michael Kay)
Date: Mon Jun 7 17:00:10 2004
Subject: Is anyone using CDATA?
Message-ID: <01bd3e19$4288ad80$1e09e391@mhklaptop.bra01.icl.co.uk>

>Anyone have experiences with CDATA ? We're interested in inserting
>non-XML markup and BLOBs into XML files, and the best way seems to be
>CDATA.

I don't think CDATA is useful for inserting binary data into XML files, because there is no way of escaping the terminating "]]>". I think the best way to do it, if you want to do it inline, is to use Base64 encoding, and then you don't need CDATA.

Mike Kay
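A minimal sketch of the inline Base64 approach, using Python's standard library. The element name `binary-data` and its `notation` attribute are arbitrary choices for this example, not part of any standard DTD.

```python
import base64
import xml.etree.ElementTree as ET

# Arbitrary bytes, deliberately including the sequence that breaks CDATA.
blob = bytes(range(256)) + b"]]>"

# Base64 output uses only A-Z, a-z, 0-9, '+', '/' and '=', so it needs
# neither CDATA nor entity escaping when placed in element content.
element = ET.Element("binary-data", {"notation": "base64"})
element.text = base64.b64encode(blob).decode("ascii")
document = ET.tostring(element, encoding="unicode")

# Reading it back: parse the XML, then decode the element's text.
decoded = base64.b64decode(ET.fromstring(document).text)
```

The cost is size: Base64 text is roughly a third larger than the bytes it encodes.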
From ak117 at freenet.carleton.ca Fri Feb 20 19:44:44 1998
From: ak117 at freenet.carleton.ca (David Megginson)
Date: Mon Jun 7 17:00:10 2004
Subject: Automating Search Interfaces"
In-Reply-To: <01bd3e01$168b28b0$a00b0ac0@pharcyde.poetsoftware.xo.com>
References: <01bd3e01$168b28b0$a00b0ac0@pharcyde.poetsoftware.xo.com>
Message-ID: <199802201535.KAA00874@unready.microstar.com>

Matthew Gertner writes:

> One approach that really appeals to me is based on a two-pronged effort to
> create standard tags *and* standard DTDs, and relies on the fact that there
> is really a working mechanism for extending DTDs through inheritance (which
> I guess is still not entirely the case).
>
> Standard tags would be a bit of a hack, but probably very useful in a
> pragmatic sense. For example, you might be able to say certain things about
> a TITLE tag, or a PRICE tag, or whatever, just on the basis of the name,
> regardless of the actual DTD being used. If these conventions were
> well-known, this could be of great use when defining a new DTD (i.e. "Let's
> call the tag PARAGRAPH and not PARA because this is what will be recognized
> by search engines").

The idea is actually quite sound, but the implementation could be a little cleaner. Instead of relying on the element type name (which may vary for different domains of information), why not have a standard attribute (such as 'standard-doc') that gives the equivalent standard name in the architecture.
That way, just as you write

  public class Cost implements Price { }

in Java, you can write

  <!ELEMENT cost (#PCDATA)>
  <!ATTLIST cost standard-doc CDATA #FIXED "price">

in XML, or even

  <cost standard-doc="price">xxx</cost>

This makes multiple inheritance easy:

  <!ELEMENT cost (#PCDATA)>
  <!ATTLIST cost
    standard-doc CDATA #FIXED "price"
    alt-doc CDATA #FIXED "value">

Now `cost' inherits from `price' in the standard-doc architecture and from `value' in the alt-doc architecture.

All the best,

David

-- 
David Megginson ak117@freenet.carleton.ca
Microstar Software Ltd. dmeggins@microstar.com
http://home.sprynet.com/sprynet/dmeggins/

From mwagner at ets.org Fri Feb 20 20:33:47 1998
From: mwagner at ets.org (Mike Wagner)
Date: Mon Jun 7 17:00:10 2004
Subject: MS XML Parser on the Server
Message-ID: <v04003a04b11397b5aa53@[144.81.30.117]>

Has anybody managed to get the Microsoft Java XML Parser running as a component accessible by ASP under IIS? I tried what seemed to me to be the obvious approach and that didn't work.

I copied the java classes to the TrustLib directory, then registered them with javareg. (An excerpt of the BAT file I used is at the end of this message). However, when I try a simple

  Server.CreateObject("com.ms.xml.om.Document")

call in an ASP page, it dies with the following error:

  Microsoft JScript runtime error '800a01ad'
  Automation server can't create object
  /xmltest.asp, line 14

Any insights? Thanks.
Mike Wagner
Educational Testing Service
mwagner@ets.org

-----------------Javareg BAT file--------------------
cd \winnt\java\trustlib\com\ms\xml\dso
javareg /register /class:SchemaNode /progid:com.ms.xml.dso.SchemaNode
cd \winnt\java\trustlib\com\ms\xml\dso
javareg /register /class:XMLDSO /progid:com.ms.xml.dso.XMLDSO
cd \winnt\java\trustlib\com\ms\xml\dso
javareg /register /class:XMLParserThread /progid:com.ms.xml.dso.XMLParserThread
cd \winnt\java\trustlib\com\ms\xml\dso
javareg /register /class:XMLRowsetProvider /progid:com.ms.xml.dso.XMLRowsetProvider
cd \winnt\java\trustlib\com\ms\xml\om

From pierlou at CAM.ORG Fri Feb 20 21:04:15 1998
From: pierlou at CAM.ORG (Pierre Morel)
Date: Mon Jun 7 17:00:10 2004
Subject: Automating Search Interfaces
Message-ID: <01bd3e41$861d1090$02dcdcdc@pierre>

Hello,

I would like to talk about the location of the person making the search versus the location of the product or service provider. If I search for a product and I want it now, I only want a list of providers within a distance suitable for my request. And if I go to Europe this summer and want to make reservations or search for activities occurring at that time, the 'where I am' specification changes. If I have a second house and make a request on the weekend, I want the restaurants in that region and not the ones near my primary house. An identity profile should be included in the query to give the search engine the chance to make a better choice with regard to my age, sex, etc.

Another part of the problem is unique number identification, and I am not sure if EAN or SIC is good for that purpose.
How a search engine can parse a site or made a request for a product or service without a unique product number. A hotel room is a 'chambre' in french. If I search for a hotel room in Italy, I don't know the word for room in italian but if a room is a number, I can search for a room every where in the world. The query interface will be in my language and the service provider will build his database in his own language. The query page should change for every product. I have work around this idea for a time and came to the conclusion that a lightweight page creation and manipulation is need. The small tutorial that show how the parts fit together is related to a very premature search engine. The left pane show the products in a store but can be a list of products at a search engine site. What is XML-Data versus DTD ? Maybe the solution is there and I don't see it. I would like to know if every product on earth can have a number the same way that every book can be codified ? Best regards to all Pierre Morel pierlou@cam.org http://www.cam.org/~pierlou/prototype -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mailman.ic.ac.uk/pipermail/xml-dev/attachments/19980220/147dbc11/attachment.htm From donpark at quake.net Fri Feb 20 22:39:36 1998 From: donpark at quake.net (Don Park) Date: Mon Jun 7 17:00:10 2004 Subject: TagNet (was Automating Search Interfaces) Message-ID: <009301bd3e4f$b4e336d0$2ee044c6@donpark> Matthew, >One approach that really appeals to me is based on a two-pronged effort to >create standard tags *and* standard DTDs, and relies on the fact that there >is really a working mechanism for extending DTDs through inheritance (which >I guess is still not entirely the case). I think the efforts will be best spent by building a sort of WordNet like service which allow automatic registration and association of tag and attribute names. 
For example, book vendor could register TITLE as a tag name and associate it with NAME as a synonym constrained by the book industry code (if there is such a thing). Search service can then see that the contents offered by the book vendor can be searched by mapping its NAME field to TITLE tag. Inheritance relationship can also be registered and taken advantage of by search services. It probably won't have to be a full semantic network but it will require a standard API. I wish it could capture whole/part relationships as well like (NAME == FIRST + MIDDLE + LAST) but I could be going overboard here. Some of the entries can be marked as the 'norm' by some standardization organizations. A DTD writer could just build what he wants and then pass it through the service to change all names to the 'norm'. For the benefit of those replying to this message, let me call the service TagNet. "What do you want to tag today?;-)" Feeling great today, Don Park http://www.quake.net/~donpark/index.html xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev@ic.ac.uk Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To (un)subscribe, mailto:majordomo@ic.ac.uk the following message; (un)subscribe xml-dev To subscribe to the digests, mailto:majordomo@ic.ac.uk the following message; subscribe xml-dev-digest List coordinator, Henry Rzepa (mailto:rzepa@ic.ac.uk) From donpark at quake.net Fri Feb 20 22:39:40 1998 From: donpark at quake.net (Don Park) Date: Mon Jun 7 17:00:10 2004 Subject: Automating Search Interfaces Message-ID: <009401bd3e4f$b5d7a8f0$2ee044c6@donpark> Pierre, >I would like to talk about the location of the person making the search versus the location of the product or service provider. If I search for a product and I want it now, I only want a list of provider in a distance applicable for my request. And if I go to Europe this summer and want to make reservation or search for activities occuring at that time, the 'where I am' specification change. 
If I have a secondary house and make request on the week-end, I want the restaurant in that region and not the one near my primary house. An identity profile should be include in the query and give the chance to the search engine to make a better choice in regard of my age, sex, etc... Interesting. Some of the issues with product location are: 1. How to indicate location? Address or map coordinates? How does one find map coordinates? What happens when he moves? 2. How to associate location with products? If a vendor has all inventory at a single location then the location can be #FIXED in his DTD. If inventory is distributed around the globe, each product or inventory group will have to be marked. The problem is that now it makes no sense to indicate physical location. It will have to be a store code which causes problem with search services since store codes will have to be converted into location format used by the search service. As far as time constraints go, each product will probably be marked with time. The problem is that some time constraints are relative in nature. *Ouch* I just thought of another painful problem with prices. What happens when a store wants to put on a sale? His database of products will have to map to different pricing schemes constrained by time, location, or association. All this hurts my head a bit but it is very interesting indeed... Regards, Don Park http://www.quake.net/~donpark/index.html -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mailman.ic.ac.uk/pipermail/xml-dev/attachments/19980220/5f733d95/attachment.htm From mike at jmaca.com Fri Feb 20 23:01:53 1998 From: mike at jmaca.com (Michael Emmel) Date: Mon Jun 7 17:00:10 2004 Subject: Binary Data Message-ID: <34EE0D80.BBA53DC9@jmaca.com> Is it possible to include binary data in a XML document and follow the spec. 
<![CDATA[ ascii data ]]> allows the inclusion of arbitrary ascii data except I do not think uuencode or other binary -> ascii/UTF8 encoders will work without modification to eliminate the ]]> encoding. Would this be possible. <![BDATA length=1024[ binary data ]]> where the parser would ignore 1024 bytes and expect to see a ]]> at the end. The spec seems to imply only character data but does not disallow binary data. I assume a character encoding that did not use the ]]> sequence is okay. I think the <![BDATA length=x[ ]]> tag is not. You need let the the parser ignore and redirect x number of bytes from the token stream. This would be equivalent to a "Java production" in Javacc. But I'm not sure if it is legal ??? So do I need to alter uuencode or some other encoding format to fit the <CDATA format or is it legal to include a binary section. And if not why not : ) Mike mike@jmaca.com xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev@ic.ac.uk Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To (un)subscribe, mailto:majordomo@ic.ac.uk the following message; (un)subscribe xml-dev To subscribe to the digests, mailto:majordomo@ic.ac.uk the following message; subscribe xml-dev-digest List coordinator, Henry Rzepa (mailto:rzepa@ic.ac.uk) From peter at ursus.demon.co.uk Fri Feb 20 23:03:16 1998 From: peter at ursus.demon.co.uk (Peter Murray-Rust) Date: Mon Jun 7 17:00:10 2004 Subject: LISTRIVIA In-Reply-To: <009401bd3e4f$b5d7a8f0$2ee044c6@donpark> Message-ID: <3.0.1.16.19980220224802.35d7dc7c@pop3.demon.co.uk> At 14:33 20/02/98 -0800, Don Park wrote: [...] > >Attachment Converted: "c:\eudora\attach\ReAutoma.htm" ^^^^^^^^^^^^ This is the sort of problem with attachments... P. > Peter Murray-Rust, Director Virtual School of Molecular Sciences, domestic net connection VSMS http://www.nottingham.ac.uk/vsms, Virtual Hyperglossary http://www.venus.co.uk/vhg xml-dev: A list for W3C XML Developers. 
To post, mailto:xml-dev@ic.ac.uk Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To (un)subscribe, mailto:majordomo@ic.ac.uk the following message; (un)subscribe xml-dev To subscribe to the digests, mailto:majordomo@ic.ac.uk the following message; subscribe xml-dev-digest List coordinator, Henry Rzepa (mailto:rzepa@ic.ac.uk) From peter at ursus.demon.co.uk Fri Feb 20 23:27:09 1998 From: peter at ursus.demon.co.uk (Peter Murray-Rust) Date: Mon Jun 7 17:00:10 2004 Subject: LISTRIVIA In-Reply-To: <01bd3e41$861d1090$02dcdcdc@pierre> Message-ID: <3.0.1.16.19980220224826.35d7f436@pop3.demon.co.uk> Hi Pierre, thanks for the posting... At 15:52 20/02/98 -0500, Pierre Morel wrote: > >Attachment Converted: "c:\eudora\attach\Automati.htm" ^^^^^^^^^^ We ask people not to post attachments to xml-dev, because they don't get hypermailed and they take up space on readers' machines :-) P. Peter Murray-Rust, Director Virtual School of Molecular Sciences, domestic net connection VSMS http://www.nottingham.ac.uk/vsms, Virtual Hyperglossary http://www.venus.co.uk/vhg xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev@ic.ac.uk Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To (un)subscribe, mailto:majordomo@ic.ac.uk the following message; (un)subscribe xml-dev To subscribe to the digests, mailto:majordomo@ic.ac.uk the following message; subscribe xml-dev-digest List coordinator, Henry Rzepa (mailto:rzepa@ic.ac.uk) From peter at ursus.demon.co.uk Fri Feb 20 23:39:55 1998 From: peter at ursus.demon.co.uk (Peter Murray-Rust) Date: Mon Jun 7 17:00:10 2004 Subject: xml:space Message-ID: <3.0.1.16.19980220223432.3ea70dee@pop3.demon.co.uk> I am considering how to treat xml:space in JUMBO and ask for help and comments. <NOTE>I am NOT re-opening the whitespace debate; I am asking those who understand xml:space if what I do/intend to do is reasonable. xml:space is a formal part of the language and I feel I have to address it.</NOTE> 1. 
Are there any documents which actually use xml:space? rec.xml does not 2. Is there anyone on this list intending to use it? If so, what do they expect "applications' default white-space processing modes" to be? [Quotations are from rec.xml] >An XML processor must always pass all characters in a document that are not >markup through to the application. A validating XML >processor must also inform the application which of these characters >constitute white space appearing in element content. My philosophy in JUMBO (which is a generic application) is to accept all whitespace from the parser/SAX, whether labelled 'ignorable' or not. All PCDATA is stored in child nodes of elements. Those with ignorable whitespace can be specially labelled. IOW I do not discard any character data on input. > >A special attribute named xml:space may be attached to an element to signal >an intention that in that element, white space should be >preserved by applications. In valid documents, this attribute, like any >other, must be declared if it is used. When declared, it must be >given as an enumerated type whose only possible values are "default" and >"preserve". For example: > > <!ATTLIST poem xml:space (default|preserve) 'preserve'> > OK. If xml:space="preserve" I have no problems. If xml:space="default" I am asking for help. Note that xml:space="default" could apply either to ignorable whitespace or non-ignorable w/s If xml:space is absent, I suggest options below... > >The value "default" signals that applications' default white-space >processing modes are acceptable for this element; the value >"preserve" indicates the intent that applications preserve all the white >space. This declared intent is considered to apply to all >elements within the content of the element where it is specified, unless This causes me slight concern. It means I have to write code that automatically tracks what elements have an xml:space attribute. This is possible, but yet another thing that has to be done. 
I might be motivated to do it if I am shown some use for it... >overriden with another instance of the xml:space attribute. This means effectively that every node in a document has to have an xml:space flag. [Unless this is dynamically worked out every time the document is to be rendered.] -------- Without xml:space, and without a DTD, I can see the following *generic* possibilities: - element is empty. [BTW the spec (and SAX) discards all knowledge of whether this was created by <FOO></FOO> or <FOO/>. I approve of this.]. Children are not displayed because there aren't any - element contains non-w/s characters. This is displayed as either as a string or as a title-value pair (at user option). The title is determined by simple heuristics. - element contains element content. This is displayed as a tree. I am considering also allowing the user to display this as a tagged/untagged event stream, but the tree is the default. - element contains element content and (some) non-w/s PCDATA children . This is displayed as an untagged (or selectable) tagged event stream. Unless the semantics of the tags are known or a stylesheet is provided, no other rendering is possible. Now the two w/s options... - element contains element content and (only) w/s children. This is displayed by default as ignoring the w/s. Note that this is *display*, not processing. Since the default is a tree, the w/s nodes aren't much use. - element contains a single w/s child. This does not display anything by default. The user can switch to display/hide PCDATA children in the tree display. For *outputting* it is possible to delete the w/s nodes if required. Once deleted they are gone ... I would be interested in comments as to whether this is reasonable default behaviour or whether there are other things that should be considered. P. 
Peter Murray-Rust, Director Virtual School of Molecular Sciences, domestic
net connection
VSMS http://www.nottingham.ac.uk/vsms, Virtual Hyperglossary
http://www.venus.co.uk/vhg

xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev@ic.ac.uk
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/
To (un)subscribe, mailto:majordomo@ic.ac.uk the following message;
(un)subscribe xml-dev
To subscribe to the digests, mailto:majordomo@ic.ac.uk the following message;
subscribe xml-dev-digest
List coordinator, Henry Rzepa (mailto:rzepa@ic.ac.uk)

From tbray at textuality.com Sat Feb 21 00:24:39 1998
From: tbray at textuality.com (Tim Bray)
Date: Mon Jun 7 17:00:10 2004
Subject: xml:space
Message-ID: <3.0.32.19980220162318.00acf700@pop.intergate.bc.ca>

At 10:34 PM 2/20/98, Peter Murray-Rust wrote:

A short answer: yes, if you want to respect xml:space, you have really no
choice but to keep a stack or suchlike to see if it's been overridden in a
child element.

JUMBO, since it's an application, has no obligation to respect xml:space;
it's just a request, after all. If you are respecting xml:space, whenever
you are in an element for which xml:space='preserve' does not apply, you
should do whatever best suits the needs of your application and its users.
I very much doubt there is a universal answer for all classes of
application. I think HTML gets it pretty much right for display type
applications.

As for your question "will it be used?": yes, of course.

-Tim
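
The stack approach mentioned in this thread can be sketched in a few lines
of modern Java. This is only an illustration under stated assumptions: the
class name XmlSpaceTracker and its methods are hypothetical, not part of SAX
or JUMBO, and the caller is assumed to pass the value of the xml:space
attribute (or null when it is absent) on every start tag.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/**
 * Tracks whether xml:space="preserve" is in effect for the current element.
 * Hypothetical helper for illustration; not part of SAX or JUMBO.
 */
class XmlSpaceTracker {
    // One entry per open element: true means "preserve" applies there.
    private final Deque<Boolean> stack = new ArrayDeque<>();

    /** Call on each start tag; xmlSpace is the attribute value or null. */
    void startElement(String xmlSpace) {
        boolean inherited = preserve();
        if ("preserve".equals(xmlSpace)) {
            stack.push(Boolean.TRUE);
        } else if ("default".equals(xmlSpace)) {
            stack.push(Boolean.FALSE);
        } else {
            // No attribute on this element: inherit from the parent.
            stack.push(inherited);
        }
    }

    /** Call on each end tag; restores the parent's setting. */
    void endElement() {
        stack.pop();
    }

    /** True if white space should be preserved at the current position. */
    boolean preserve() {
        return !stack.isEmpty() && stack.peek();
    }
}
```

With this shape no per-node flag needs to be stored in the tree; the value
is recomputed during each traversal, which corresponds to the "dynamically
worked out" option raised earlier in the thread.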

From ricko at allette.com.au Sat Feb 21 02:44:14 1998
From: ricko at allette.com.au (Rick Jelliffe)
Date: Mon Jun 7 17:00:11 2004
Subject: Binary Data
Message-ID: <002201bd3e72$c6880a00$9d0b4ccb@NT.JELLIFFE.COM.AU>

From: Michael Emmel <mike@jmaca.com>

>Is it possible to include binary data in a XML document and follow the
>spec.

It is possible to have binary data in an XML *document* but it is not
possible to have (unencoded) binary data in an XML text *entity*. A document
is constructed from entities. An entity is usually a file. An entity is
either text or binary (NDATA) but not both.

You can use Base64 encoding to stick non-text data inside elements:

<!DOCTYPE foo [
<!NOTATION base64 SYSTEM "put URL of base 64 code here, or omit this string">
...
]>
<foo>
...
<binary-data notation="base64">...</binary-data>
...
</foo>

CDATA marked sections are only a shorthand mechanism for data which has a
lot of "&" or "<" characters which you might find tedious to delimit into
entity references. It is not a mechanism for embedding raw binary, per se.

Rick Jelliffe
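
As a rough illustration of the Base64 approach described above, here is a
small sketch in modern Java. It relies on java.util.Base64 from the standard
library; the class name Base64Demo and the helper toElement are made up for
this example, and the binary-data element is simply the one from the DTD
fragment above.

```java
import java.util.Base64;

/** Hypothetical helper showing Base64 text inside an XML element. */
class Base64Demo {
    /** Encode arbitrary bytes as Base64 text. */
    static String encode(byte[] raw) {
        return Base64.getEncoder().encodeToString(raw);
    }

    /** Recover the original bytes from Base64 text. */
    static byte[] decode(String text) {
        return Base64.getDecoder().decode(text);
    }

    /** Wrap the encoded bytes in the binary-data element from the example. */
    static String toElement(byte[] raw) {
        return "<binary-data notation=\"base64\">" + encode(raw)
                + "</binary-data>";
    }
}
```

The Base64 alphabet is limited to letters, digits, "+", "/" and "=" padding,
so the encoded text never contains "<" or "&" and needs no escaping or CDATA
section.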

From JimL at Alphag.net Sat Feb 21 23:39:03 1998
From: JimL at Alphag.net (Jim Lears)
Date: Mon Jun 7 17:00:11 2004
Subject: MS XML Parser on the Server
Message-ID: <D00028C46B33D111BABF00A0249C714B09DD82@ROMULUS>

Server.CreateObject in VBScript is used for creating instances of COM
objects. The Java XML Parser doesn't expose any COM interfaces...notably
IClassFactory which is used to instantiate COM objects. The C++ version
is what you need...it's an ActiveX control. The source code for both
parsers is available. If you insist on using the Java version, you could
mod it up to sport a COM interface..

Helping To Destroy The English Language

	-----Original Message-----
	From: Mike Wagner [SMTP:mwagner@ets.org]
	Sent: Friday, February 20, 1998 3:33 PM
	To: xml-dev@ic.ac.uk
	Subject: MS XML Parser on the Server

	Has anybody managed to get the Microsoft Java XML Parser running as a
	component accessible by ASP under IIS? I tried what seemed to me to be
	the obvious approach and that didn't work. I copied the java classes to
	the TrustLib directory, then registered them with javareg. (An excerpt
	of the BAT file I used is at the end of this message). However, when I
	try a simple Server.CreateObject("com.ms.xml.om.Document") call in an
	ASP page, it dies with the following error:

	Microsoft JScript runtime error '800a01ad'

	Automation server can't create object

	/xmltest.asp, line 14

	Any insights? Thanks.

	Mike Wagner
	Educational Testing Service
	mwagner@ets.org

	-----------------Javareg BAT file--------------------
	cd \winnt\java\trustlib\com\ms\xml\dso
	javareg /register /class:SchemaNode /progid:com.ms.xml.dso.SchemaNode
	cd \winnt\java\trustlib\com\ms\xml\dso
	javareg /register /class:XMLDSO /progid:com.ms.xml.dso.XMLDSO
	cd \winnt\java\trustlib\com\ms\xml\dso
	javareg /register /class:XMLParserThread /progid:com.ms.xml.dso.XMLParserThread
	cd \winnt\java\trustlib\com\ms\xml\dso
	javareg /register /class:XMLRowsetProvider /progid:com.ms.xml.dso.XMLRowsetProvider
	cd \winnt\java\trustlib\com\ms\xml\om

From mike at datachannel.com Sun Feb 22 00:05:46 1998
From: mike at datachannel.com (Mike Dierken)
Date: Mon Jun 7 17:00:11 2004
Subject: MS XML Parser on the Server
Message-ID: <01BD3EE2.0FB98770@NEMO>

On the MS platform, you can expose all your Java classes and interfaces as
COM interfaces if you use the ActiveX Wizard for Java (JAVAIDL.EXE). It'll
create an .IDL file (and .C and .H files if you want to call the interfaces
from C/C++).
All Java classes are exposed as dual interfaces, derived from IDispatch,
which allows them to be called from all COM aware scripting languages
(JavaScript, VB for Automation, etc).

If the Java classes are registered with Javareg (using the CLSIDs from the
generated .IDL file) on the server, you can use the package name rather than
a CLSID.
To create a Java object, you might try prepending 'java:' on the package
name.

	Server.CreateObject("java:com.ms.xml.om.Document")

Hope this helps...

Mike D
DataChannel

From k_coffin at conknet.com Sun Feb 22 02:58:22 1998
From: k_coffin at conknet.com (Kerry Coffin)
Date: Mon Jun 7 17:00:11 2004
Subject: Binary Data
Message-ID: <01bd3f3d$977564d0$ed0620ce@lbynum.esri.com>

What is Base64?

Regards,
Kerry Coffin
Environmental Systems Research Institute (ESRI)

From peter at ursus.demon.co.uk Sun Feb 22 10:57:37 1998
From: peter at ursus.demon.co.uk (Peter Murray-Rust)
Date: Mon Jun 7 17:00:11 2004
Subject: LISTRIVIA
In-Reply-To: <01BD3EE2.0FB98770@NEMO>
Message-ID: <3.0.1.16.19980222103123.1c3f3e98@pop3.demon.co.uk>

At 16:02 21/02/98 -0800, [a number of posters in combination] wrote:
[A message]
>
>-----Original Message-----
[which quoted another message in full]
> > -----Original Message-----
[which itself quoted another message in full]
[and finished with cascading xml-dev backmatter].

and in another message a simple question was asked followed by cascading
quoted messages which added no value.

----------------------------------------------------------------------

Since new members are continually joining the list - and we welcome them
:-) - , I'll reiterate our policy for minimising the amount of material
posted. Remember that:
	- many people pay personal money for mail (including me)
	- duplicated material is excessively tedious on the hypermail list
	  and takes up valuable space
	- duplication takes up space on readers' local storage.
	- automatic quoting is not a good approach towards managing
	  information. XML encourages people to normalise material as much
	  as possible.

Please therefore excise all material that you don't directly refer to in
your message. Most people prefer to see the quoted material followed by the
annotation rather than the annotation followed by the original message.

Remember that the material is all hypermailed and publicly visible and
(optionally) available as a digest. Both of these should be attractive to
read.
:-)

For more details and suggestions of other styles to adopt/avoid, you may
wish to follow the various LISTRIVIA threads. These also comment on multiple
copies of postings :-)

	P.

Peter Murray-Rust, Director Virtual School of Molecular Sciences, domestic
net connection
VSMS http://www.nottingham.ac.uk/vsms, Virtual Hyperglossary
http://www.venus.co.uk/vhg

From donpark at quake.net Sun Feb 22 21:05:58 1998
From: donpark at quake.net (Don Park)
Date: Mon Jun 7 17:00:11 2004
Subject: Binary Data
Message-ID: <003101bd3fd4$f5c83fc0$2ee044c6@donpark>

BASE64 is a MIME content transfer encoding algorithm defined in RFC 2045. It
is used to map binary data into a range of characters.

Don Park
http://www.quake.net/~donpark/index.html

From tbray at textuality.com Mon Feb 23 00:50:34 1998
From: tbray at textuality.com (Tim Bray)
Date: Mon Jun 7 17:00:11 2004
Subject: Binary Data
Message-ID: <3.0.32.19980222164918.00b68370@pop.intergate.bc.ca>

At 12:59 PM 2/22/98 -0800, Don Park wrote:
>BASE64 is MIME content transfer encoding algorithm defined in RFC 2045. It
>is used to map binary data into a range of characters.

What's real important from the XML point of view is that (unless my memory
fails me) base64 has the nice property that it uses a very restricted range
of characters, which happens not to include < or &, and thus can be tossed
into an XML doc just about anywhere without breaking anything.

I think a predefined base64 notation attribute is a no-brainer good idea, so
obvious that it can't be new. -Tim

From b.laforge at opengroup.org Mon Feb 23 00:59:49 1998
From: b.laforge at opengroup.org (Bill la Forge)
Date: Mon Jun 7 17:00:11 2004
Subject: xml-based protocol
Message-ID: <3.0.32.19980222200447.00a05330@postman.osf.org>

Finally, AXTP is using xml for the wire protocol.
(I've also created some documentation.)

AXTP: Application eXtensible Transactional Protocol (UDP based)
http://www.camb.opengroup.org/~laforge/axtp/

From ak117 at freenet.carleton.ca Mon Feb 23 03:14:26 1998
From: ak117 at freenet.carleton.ca (David Megginson)
Date: Mon Jun 7 17:00:11 2004
Subject: SAX: finalising org.sax.xml.Parser
Message-ID: <199802230313.WAA00386@unready.microstar.com>

It's time to finalise SAX before there is such a big code base that we
can no longer make changes.
(Thanks, by the way, to James Clark, DataChannel, and IBM for including
native SAX support in their XML parsers). During this phase, I'd like to
make the _minimum_ changes necessary for SAX to define a consistent and
simple common functionality for XML parsers.

Let's start with the Parser interface. I'll use Java syntax because,
while I can read IDL, I don't trust myself to write it:


[current interface]
------------------------------------------------------------------------
  package org.xml.sax;

  public interface Parser {

    public void setEntityHandler (EntityHandler handler);
    public void setDocumentHandler (DocumentHandler handler);
    public void setErrorHandler (ErrorHandler handler);

    public void parse (String publicID, String systemID)
      throws java.lang.Exception;

  }
------------------------------------------------------------------------


After considering the various discussions over the past few weeks, I
propose that we make the following changes:

1) Add a parse() method that accepts a stream.

2) Add a parse() method that accepts a character buffer.

3) Remove public ID from the current parse() method (I don't think
   public IDs are going anywhere fast in XML).

With these changes, the interface would look like this in Java:


[proposed changes]
------------------------------------------------------------------------
  package org.xml.sax;
  import java.io.InputStream;

  public interface Parser {

    public void setEntityHandler (EntityHandler handler);
    public void setDocumentHandler (DocumentHandler handler);
    public void setErrorHandler (ErrorHandler handler);

    public void parse (String uri)
      throws java.lang.Exception;
    public void parse (InputStream is, String baseURI)
      throws java.lang.Exception;
    public void parse (char ch[], int start, int length, String baseURI)
      throws java.lang.Exception;

  }
------------------------------------------------------------------------


NOTES:

a. The baseURI argument is necessary for streams and character buffers
   in case either contains a relative URI. You can supply a null
   value if the document entity will not contain relative URIs.

b. All programming languages initially targeted by SAX (Java, C++, C,
   Perl) have some concept of input streams; if we come up against one
   that doesn't, it can simply omit the relevant method.

c. The start and length arguments are necessary with the character
   buffer in case the XML document is part of a larger array.


Does this give reasonable functionality without limiting the
architectural approaches of parser writers? Remember that individual
implementations can extend this interface, but the interface
represents the minimum common functionality that every SAX-conformant
parser (eventually) provides.


Thanks, and all the best,


David

--
David Megginson                 ak117@freenet.carleton.ca
Microstar Software Ltd.         dmeggins@microstar.com
      http://home.sprynet.com/sprynet/dmeggins/

From zmin at iti.gov.sg Mon Feb 23 05:08:40 1998
From: zmin at iti.gov.sg (Dr. Zheng Min)
Date: Mon Jun 7 17:00:11 2004
Subject: Making COM componts from java MSXML (Was: MS XML Parser on the Server)
Message-ID: <01bd4019$733f24c0$96897ac0@zhengmin.iti.gov.sg>

A few questions about making java MSXML COM aware:

1. Mike suggested using ActiveX Wizard for Java to create .IDL file. Has
anyone done it successfully? I tried it just now but a lot of methods were
skipped because of non-translatable types (why is that? Does it mean those
methods can't be used in a COM interface?).

2. Even worse, I can't re-compile MSXML in J++. I got stuck on the first
file -- com.ms.xml.dso.XMLDSO.java.
The error messages are all of the same type:

	Value for argument 'parent' cannot be converted from 'int' in call
	to 'Element ElementFactory.createElement(Element parent, int type,
	Name tag, String text)'

The statement in XMLDSO.java is:

	e = factory.createElement(Element.ELEMENT, XMLRowsetProvider.nameROWSET);

It doesn't look right but I don't know how MS can make *.class from it (or
have I missed something?). Has anyone tried to recompile it and succeeded?

Thanks,
Min

From M.H.Kay at eng.icl.co.uk Mon Feb 23 11:25:19 1998
From: M.H.Kay at eng.icl.co.uk (Michael Kay)
Date: Mon Jun 7 17:00:11 2004
Subject: MS XML Parser on the Server
Message-ID: <01bd404d$ad63cbe0$1e09e391@mhklaptop.bra01.icl.co.uk>

>Has anybody managed to get the Microsoft Java XML Parser running as a
>component accessible by ASP under IIS?

I tried and failed, probably because I was doing it wrong; then I rewrote my
app using SAX (over AElfred) and have this working under ASP fine.
I tried first using Javasoft's ActiveX Bridge which I couldn't get to work
except for the most trivial single-class javabeans; then I tried using
javareg and got it working - at least once I had worked out how to ensure
that the class path setting for the Microsoft Java VM was right. I found it
useful to test the thing with a little VB app as the environment is more
controllable.

I found it necessary to pay some attention to exception handling: if you
don't catch the things, they have a habit of crashing the ActiveX container,
i.e. the web server.

To keep things simple, I wrote a simple wrapper class for my application
which exposed all the interfaces I needed in the ASP script and nothing
else, and it was this wrapper class that I registered using javareg. The
underlying Java classes, so long as they are on the classpath, do not need
to be registered.

My javareg call was

	javareg /register /class:com.icl.saxon.showXML /progid:ShowXML.Java

and the CreateObject call (in VBScript) was:

	Set app = CreateObject("ShowXML.Java")

I haven't tried calling back from the Java code to ActiveX objects (e.g.
calling Response.Write) but it should work in theory. Instead I put the
output in String variables which the ASP page retrieves explicitly using
methods on ShowXML. Not elegant, but I was deliberately minimising the
number of things that might go wrong. I also haven't tried anything
complicated with collections or enumerations.

Hope that helps,

Mike Kay, ICL
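
The wrapper-class approach described above might look roughly like the
following sketch. Everything here is an assumption made for illustration:
the class name ShowXmlWrapper, its method names, and the stand-in render()
method are hypothetical and are not the actual com.icl.saxon.showXML code.
The point is only the shape: a narrow facade, no exception allowed to reach
the hosting container, and output handed back through String getters.

```java
/**
 * Hypothetical facade of the kind that could be registered with javareg.
 * It exposes only what a script needs and never lets an exception escape.
 */
class ShowXmlWrapper {
    private String output = "";
    private String error = "";

    /** Process a document; returns false instead of throwing on failure. */
    boolean run(String uri) {
        output = "";
        error = "";
        try {
            output = render(uri);
            return true;
        } catch (Exception e) {
            // Swallow the exception here so it cannot crash the container;
            // the script can ask for the message afterwards.
            error = e.toString();
            return false;
        }
    }

    /** Result of the last successful run, or an empty string. */
    String getOutput() {
        return output;
    }

    /** Description of the last failure, or an empty string. */
    String getError() {
        return error;
    }

    // Stand-in for the real SAX-based processing (assumption).
    private String render(String uri) throws Exception {
        if (uri == null || uri.isEmpty()) {
            throw new IllegalArgumentException("no document URI given");
        }
        return "<p>Rendered " + uri + "</p>";
    }
}
```

A script would then call run(), check the boolean result, and fetch either
getOutput() or getError(), which avoids any callback from Java into the
scripting host.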

From M.H.Kay at eng.icl.co.uk Mon Feb 23 11:48:07 1998
From: M.H.Kay at eng.icl.co.uk (Michael Kay)
Date: Mon Jun 7 17:00:11 2004
Subject: Automating Search Interfaces
Message-ID: <01bd4050$b9604f60$1e09e391@mhklaptop.bra01.icl.co.uk>

>I would like to talk about the location of the person making the search
>versus the location of the product or service provider

Geographic/Spatial queries are a well-researched topic in the database
literature. Free text retrieval is definitely a weak approach, though people
attempt it by using thesaurus facilities to represent the structure of a
gazetteer. In most of the practical systems I have seen, spatial query is
done using postal codes: the system needs knowledge of which postal
districts are near each other. (We also use such techniques for scheduling
the itinerary of service engineers).

>A hotel room is a 'chambre' in french. If I search for a hotel room in
>Italy, I don't know the word for room in italian...

Multilingual search is well researched and seems to work reasonably well.
The more difficult problem is to distinguish agencies that can book you a
hotel room from newsletter articles by people enthusing what a wonderful
hotel room they were staying in: I think this is why there will always be
added value in manual categorization and indexing services.

Mike Kay

From hb at ix.heise.de Mon Feb 23 13:01:46 1998
From: hb at ix.heise.de (Henning Behme)
Date: Mon Jun 7 17:00:11 2004
Subject: Ad: small app + article (XML/DSSSL).
References: <3.0.1.16.19980214135027.63dfb77c@pop3.demon.co.uk>
Message-ID: <34F172D9.15712E4A@ix.heise.de>

Hi,

we (iX Magazine in Germany) have put an article online (in German, though -
I'll try to provide an English version asap) that introduces a small XML
application and shows how its data is being converted into HTML using James
Clark's Jade.

The app is a tiny attempt to display literary history in terms of authors
(when born &c.) and explains two DSSSL style sheets which a) show the toc
and b) list details of a (chosen) author.

Those of you who read German may try (if interested :-)

	http://www.heise.de/ix/artikel/1998/03/156/

The app itself is online, too (toc and single author by now; I am working on
century-oriented lists and the like)

	http://www.heise.de/ix/raven/Web/xml/lit

toc is static, author is done on the fly using Jade. I thought it would be
better this way than to generate all the files for the authors, although
this, of course, means waiting for a short while :-)

Best regards,
hb

--
Henning Behme
iX - Magazin fuer professionelle Informationstechnik
Helstorfer Str. 7 * 30625 Hannover * Germany
http://www.heise.de/ix/ * +49 511 5352-374 * -361 (Fax)
------ White, adj. and n. Black (Ambrose Bierce) ------

From gmckenzi at JetForm.com Mon Feb 23 14:48:18 1998
From: gmckenzi at JetForm.com (Gavin McKenzie)
Date: Mon Jun 7 17:00:11 2004
Subject: finalising org.sax.xml.Parser
Message-ID: <c=CA%a=_%p=JetForm%l=ROSSINI-980223144227Z-20917@rossini.jetform.com>

David,

While PUBLIC may not be going anywhere fast, I'd prefer that the parse()
call-level support for it be left in SAX.
I intend to make ad-hoc use of it internally (rolling my own catalogs and such). I support your other proposed additions to the interface. Gavin. >-----Original Message----- >From: David Megginson [SMTP:ak117@freenet.carleton.ca] >Sent: Sunday, February 22, 1998 10:13 PM >To: xml-dev Mailing List >Subject: SAX: finalising org.sax.xml.Parser > >It's time to finalise SAX before there is such a big code base that we >can no longer make changes. (Thanks, by the way, to James Clark, >DataChannel, and IBM for including native SAX support in their XML >parsers). During this phase, I'd like to make the _minimum_ changes >necessary SAX to define a consistent and simple common functionality >for XML parsers. > >Let's start with the Parser interface. I'll use Java syntax because, >while I can read IDL, I don't trust myself to write it: > > >[current interface] >------------------------------------------------------------------------ > package org.xml.sax; > > public interface Parser { > > public void setEntityHandler (EntityHandler handler); > public void setDocumentHandler (DocumentHandler handler); > public void setErrorHandler (ErrorHandler handler); > > public void parse (String publicID, String systemID) > throws java.lang.Exception; > > } >------------------------------------------------------------------------ > > >After considering the various discussions over the past few weeks, I >propose that we make the following changes: > >1) Add a parse() method that accepts a stream. > >2) Add a parse() method that accepts a character buffer. > >3) Remove public ID from the current parse() method (I don't think > public IDs are going anywhere fast in XML). 
> >With these changes, the interface would look like this in Java: > > >[proposed changes] >------------------------------------------------------------------------ > package org.xml.sax; > import java.io.InputStream; > > public interface Parser { > > public void setEntityHandler (EntityHandler handler); > public void setDocumentHandler (DocumentHandler handler); > public void setErrorHandler (ErrorHandler handler); > > public void parse (String uri) > throws java.lang.Exception; > public void parse (InputStream is, String baseURI) > throws java.lang.Exception; > public void parse (char ch[], int start, int length, String baseURI) > throws java.lang.Exception; > > } >------------------------------------------------------------------------ > > >NOTES: > >a. The baseURI argument is necessary for streams and character buffers > in case either contains a relative URI. You can supply a null > value if the document entity will not contain relative URIs. > >b. All programming languages initially targeted by SAX (Java, C++, C, > Perl) have some concept of input streams; if we come up against one > that doesn't, it can simply omit the relevant method. > >c. The start and length arguments are necessary with the character > buffer in case the XML document is part of a larger array. > > >Does this give reasonable functionality without limiting the >architectural approaches of parser writers? Remember that individual >implementations can extend this interface, but the interface >represents the minimum common functionality that every SAX-conformant >parser (eventually) provides. > > >Thanks, and all the best, > > >David > >-- >David Megginson ak117@freenet.carleton.ca >Microstar Software Ltd. dmeggins@microstar.com > http://home.sprynet.com/sprynet/dmeggins/ > >xml-dev: A list for W3C XML Developers. 
To post, mailto:xml-dev@ic.ac.uk >Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ >To (un)subscribe, mailto:majordomo@ic.ac.uk the following message; >(un)subscribe xml-dev >To subscribe to the digests, mailto:majordomo@ic.ac.uk the following message; >subscribe xml-dev-digest >List coordinator, Henry Rzepa (mailto:rzepa@ic.ac.uk) > xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev@ic.ac.uk Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To (un)subscribe, mailto:majordomo@ic.ac.uk the following message; (un)subscribe xml-dev To subscribe to the digests, mailto:majordomo@ic.ac.uk the following message; subscribe xml-dev-digest List coordinator, Henry Rzepa (mailto:rzepa@ic.ac.uk) From mecom-gmbh at mixx.de Mon Feb 23 14:57:29 1998 From: mecom-gmbh at mixx.de (james anderson) Date: Mon Jun 7 17:00:11 2004 Subject: xml-based protocol (axtp) References: <3.0.32.19980222200447.00a05330@postman.osf.org> Message-ID: <34F18E5E.AB73276E@mixx.de> this (and the object stream <-> xml conversion) looks interesting. is there a tar/zipped/...'d version anywhere. Bill la Forge wrote: > AXTP: Application eXtensible Transactional Protocol (UDP based) > http://www.camb.opengroup.org/~laforge/axtp/ xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev@ic.ac.uk Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To (un)subscribe, mailto:majordomo@ic.ac.uk the following message; (un)subscribe xml-dev To subscribe to the digests, mailto:majordomo@ic.ac.uk the following message; subscribe xml-dev-digest List coordinator, Henry Rzepa (mailto:rzepa@ic.ac.uk) From tyler at infinet.com Mon Feb 23 15:28:08 1998 From: tyler at infinet.com (Tyler Baker) Date: Mon Jun 7 17:00:11 2004 Subject: xml-based protocol References: <3.0.32.19980222200447.00a05330@postman.osf.org> Message-ID: <34F19695.17905F99@infinet.com> Bill la Forge wrote: > Finally, AXTP is using xml for the wire protocol. > (I've also created some documentation.) 
> > AXTP: Application eXtensible Transactional Protocol (UDP based) > http://www.camb.opengroup.org/~laforge/axtp/ This looks interesting except that the TransactionFactory interface has some ridiculous names for the methods like createA(), createN(), etc. etc. For one simple interface, I think that worrying about class file size is a waste of time when compared to having methods and constants which are readable and understandable. Tyler xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev@ic.ac.uk Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To (un)subscribe, mailto:majordomo@ic.ac.uk the following message; (un)subscribe xml-dev To subscribe to the digests, mailto:majordomo@ic.ac.uk the following message; subscribe xml-dev-digest List coordinator, Henry Rzepa (mailto:rzepa@ic.ac.uk) From donpark at quake.net Mon Feb 23 15:29:47 1998 From: donpark at quake.net (Don Park) Date: Mon Jun 7 17:00:11 2004 Subject: finalising org.sax.xml.Parser Message-ID: <001801bd406f$18cbd140$2ee044c6@donpark> David, I agree with most of the changes especially the KISS solution to multiple input type problem. I have just two recommendations: 1. Keep Public ID. 2. Use System ID instead of Public ID. End result is that we just have two new methods in Parser and no change to existing methods. My reasons are: 1. Who knows where that rubber chicken will come in handy? 2. It is trivial for a SAX parser implementor to extract baseURI from URI. 3. It is not trivial and rather confusing for a SAX user to figure out what the base URI is. So the method signatures would be: public void parse (String pubID, String sysID) throws java.lang.Exception; public void parse (String pubID, String sysID, InputStream is) throws java.lang.Exception; public void parse (String pubID, String sysID, char ch[], int offset, int length) throws java.lang.Exception; PS: Parameter orders were changed because I prefer to append new arguments rather prepending. 
For the new methods, pubID and sysID are used to tell the parser that "data from the given stream or character array should be treated as if it came from given pubID and sysID". Regards, Don xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev@ic.ac.uk Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To (un)subscribe, mailto:majordomo@ic.ac.uk the following message; (un)subscribe xml-dev To subscribe to the digests, mailto:majordomo@ic.ac.uk the following message; subscribe xml-dev-digest List coordinator, Henry Rzepa (mailto:rzepa@ic.ac.uk) From jmj at thomtech.com Mon Feb 23 16:02:42 1998 From: jmj at thomtech.com (jmj@thomtech.com) Date: Mon Jun 7 17:00:11 2004 Subject: MS XML Parser on the Server Message-ID: <9802238882.AA888249744@ccgate.thomtech.com> Greetings! So where would I find the source code for the C++ version? I haven't been able to find it at the microsoft site. Thanks! --Jim Jordan jmj@thomtech.com -- Thomson Technologies Lab Group From the sublime to the ridiculous is but a step. Napoleon Bonaparte - on the retreat from Moscow ______________________________ Reply Separator _________________________________ Subject: RE: MS XML Parser on the Server Author: Jim Lears <JimL@Alphag.net> at internet Date: 2/21/98 6:35 PM Server.CreateObject in VBScript is used for creating instances of COM objects. The Java XML Parser doesn't expose any COM interfaces...notably IClassFactory which is used to instantiate COM objects. The C++ version is what you need...its an ActiveX control. The source code for both parsers is available. If you insist on using the Java version, you could mod it up to sport a COM interface.. Helping To Destroy The English Language -----Original Message----- From: Mike Wagner [SMTP:mwagner@ets.org] Sent: Friday, February 20, 1998 3:33 PM To: xml-dev@ic.ac.uk Subject: MS XML Parser on the Server Has anybody managed to get the Microsoft Java XML Parser running as a component accessible by ASP under IIS? 
I tried what seemed to me to be the obvious approach and that didn't work. I copied the java classes to the TrustLib directory, then registered them with javareg. (An excerpt of the BAT I used file is at the end of this message). However, when I try a simple Server.CreateObject("com.ms.xml.om.Document") call in an ASP page, it dies with the following error: Microsoft JScript runtime error '800a01ad' Automation server can't create object /xmltest.asp, line 14 Any insights? Thanks. Mike Wagner Educational Testing Service mwagner@ets.org -----------------Javareg BAT file-------------------- cd \winnt\java\trustlib\com\ms\xml\dso javareg /register /class:SchemaNode /progid:com.ms.xml.dso.SchemaNode cd \winnt\java\trustlib\com\ms\xml\dso javareg /register /class:XMLDSO /progid:com.ms.xml.dso.XMLDSO cd \winnt\java\trustlib\com\ms\xml\dso javareg /register /class:XMLParserThread /progid:com.ms.xml.dso.XMLParserThread cd \winnt\java\trustlib\com\ms\xml\dso javareg /register /class:XMLRowsetProvider /progid:com.ms.xml.dso.XMLRowsetProvider cd \winnt\java\trustlib\com\ms\xml\om xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev@ic.ac.uk Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To (un)subscribe, mailto:majordomo@ic.ac.uk the following message; (un)subscribe xml-dev To subscribe to the digests, mailto:majordomo@ic.ac.uk the following message; subscribe xml-dev-digest List coordinator, Henry Rzepa (mailto:rzepa@ic.ac.uk) xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev@ic.ac.uk Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To (un)subscribe, mailto:majordomo@ic.ac.uk the following message; (un)subscribe xml-dev To subscribe to the digests, mailto:majordomo@ic.ac.uk the following message; subscribe xml-dev-digest List coordinator, Henry Rzepa (mailto:rzepa@ic.ac.uk) xml-dev: A list for W3C XML Developers. 
To post, mailto:xml-dev@ic.ac.uk Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To (un)subscribe, mailto:majordomo@ic.ac.uk the following message; (un)subscribe xml-dev To subscribe to the digests, mailto:majordomo@ic.ac.uk the following message; subscribe xml-dev-digest List coordinator, Henry Rzepa (mailto:rzepa@ic.ac.uk) From ak117 at freenet.carleton.ca Mon Feb 23 16:10:40 1998 From: ak117 at freenet.carleton.ca (David Megginson) Date: Mon Jun 7 17:00:11 2004 Subject: finalising org.sax.xml.Parser In-Reply-To: <001801bd406f$18cbd140$2ee044c6@donpark> References: <001801bd406f$18cbd140$2ee044c6@donpark> Message-ID: <199802231609.LAA01939@unready.microstar.com> Don Park writes: > I agree with most of the changes especially the KISS solution to multiple > input type problem. > > I have just two recommendations: > > 1. Keep Public ID. > 2. Use System ID instead of Public ID. That's two votes for keeping Public ID (and one for sticking with the standard terminology for system IDs, instead of using the Web-hacker-friendly "URI"). I would have no problem going with Don's proposal, especially since it is identical to my discarded first draft -- would anyone prefer _not_ to see public IDs, then? All the best, David -- David Megginson ak117@freenet.carleton.ca Microstar Software Ltd. dmeggins@microstar.com http://home.sprynet.com/sprynet/dmeggins/ xml-dev: A list for W3C XML Developers. 
To post, mailto:xml-dev@ic.ac.uk Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To (un)subscribe, mailto:majordomo@ic.ac.uk the following message; (un)subscribe xml-dev To subscribe to the digests, mailto:majordomo@ic.ac.uk the following message; subscribe xml-dev-digest List coordinator, Henry Rzepa (mailto:rzepa@ic.ac.uk)

From mike at jmaca.com Mon Feb 23 16:16:21 1998
From: mike at jmaca.com (Michael Emmel)
Date: Mon Jun 7 17:00:11 2004
Subject: Binary Data
References: <01bd3f3d$977564d0$ed0620ce@lbynum.esri.com>
Message-ID: <34F1A319.49A499DE@jmaca.com>

Okay, I read the spec more carefully now that someone mentioned NDATA, and I understand how the unparsed entity works. What I still do not understand, and what seems to be undefined, is how the parser is restarted once an application consumes an unparsed entity. At least for me.

ExternalID ::= 'SYSTEM' S SystemLiteral | 'PUBLIC' S PubidLiteral S SystemLiteral
NDataDecl ::= S 'NDATA' S Name [ VC: Notation Declared ]

Here's the description of a VC:

Validity Constraint: Notation Declared
The Name must match the declared name of a notation.

The SystemLiteral is called the entity's system identifier. It is a URI, which may be used to retrieve the entity. Note that the hash mark (#) and fragment identifier frequently used with URIs are not, formally, part of the URI itself; an XML processor may signal an error if a fragment identifier is given as part of a system identifier. Unless otherwise provided by information outside the scope of this specification (e.g. a special XML element type defined by a particular DTD, or a processing instruction defined by a particular application specification), relative URIs are relative to the location of the resource within which the entity declaration occurs. A URI might thus be relative to the document entity, to the entity containing the external DTD subset, or to some other external parameter entity.
An XML processor should handle a non-ASCII character in a URI by representing the character in UTF-8 as one or more bytes, and then escaping these bytes with the URI escaping mechanism (i.e., by converting each byte to %HH, where HH is the hexadecimal notation of the byte value).

In addition to a system identifier, an external identifier may include a public identifier. An XML processor attempting to retrieve the entity's content may use the public identifier to try to generate an alternative URI. If the processor is unable to do so, it must use the URI specified in the system literal. Before a match is attempted, all strings of white space in the public identifier must be normalized to single space characters (#x20), and leading and trailing white space must be removed.

Examples of external entity declarations:

And here are some examples:

<!ENTITY open-hatch SYSTEM "http://www.textuality.com/boilerplate/OpenHatch.xml">
<!ENTITY open-hatch PUBLIC "-//Textuality//TEXT Standard open-hatch boilerplate//EN" "http://www.textuality.com/boilerplate/OpenHatch.xml">
<!ENTITY hatch-pic SYSTEM "../grafix/OpenHatch.gif" NDATA gif >

This says to me that binary data must either be encoded to ASCII before it is included in the document, be wrapped in MIME-type boundaries inside the XML tags (with the binary data itself not containing the boundary), or be obtained through an ASCII-normalized external URI link. There is no way to tell an XML parser to skip x arbitrary bytes of embedded unparsed entity data, which is consumed by the "application", and then restart the parser at the next valid section.

Am I wrong ???

Mike

xml-dev: A list for W3C XML Developers.
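The escaping rule quoted from the spec in the message above (a non-ASCII character becomes its UTF-8 bytes, each written as %HH) can be sketched in present-day Java roughly as follows. This is only an illustration under stated assumptions: the class name, the method name and the sample file name are made up here, and the sketch leaves every ASCII character untouched rather than deciding which of those would also need escaping.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration; not part of SAX or of any parser.
final class SystemIdEscaper {
    private SystemIdEscaper() {}

    // Replaces each non-ASCII character with the %HH escapes of its UTF-8 bytes.
    static String escapeNonAscii(String systemId) {
        StringBuilder out = new StringBuilder(systemId.length());
        int i = 0;
        while (i < systemId.length()) {
            int codePoint = systemId.codePointAt(i);
            i += Character.charCount(codePoint);
            if (codePoint < 0x80) {
                // ASCII is copied through unchanged in this sketch.
                out.append((char) codePoint);
                continue;
            }
            byte[] utf8 = new String(Character.toChars(codePoint))
                    .getBytes(StandardCharsets.UTF_8);
            for (byte b : utf8) {
                out.append('%');
                out.append(Character.toUpperCase(Character.forDigit((b >> 4) & 0xF, 16)));
                out.append(Character.toUpperCase(Character.forDigit(b & 0xF, 16)));
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // "caf\u00E9.gif" is an invented file name containing one non-ASCII letter.
        System.out.println(escapeNonAscii("../grafix/caf\u00E9.gif"));
        // prints ../grafix/caf%C3%A9.gif
    }
}
```

Working on code points rather than on single chars keeps characters outside the Basic Multilingual Plane together, so they come out as one four-byte UTF-8 sequence.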
To post, mailto:xml-dev@ic.ac.uk Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To (un)subscribe, mailto:majordomo@ic.ac.uk the following message; (un)subscribe xml-dev To subscribe to the digests, mailto:majordomo@ic.ac.uk the following message; subscribe xml-dev-digest List coordinator, Henry Rzepa (mailto:rzepa@ic.ac.uk)

From M.H.Kay at eng.icl.co.uk Mon Feb 23 16:48:47 1998
From: M.H.Kay at eng.icl.co.uk (Michael Kay)
Date: Mon Jun 7 17:00:11 2004
Subject: The XML spec in XML: missing tags
Message-ID: <01bd407b$115e7500$1e09e391@mhklaptop.bra01.icl.co.uk>

I have been playing with the BNF rules in the XML spec as an exercise in XML tagging. I noticed that in the XML version of the XML spec, the non-terminal symbol "S" is incorrectly tagged in rules 60, 62, and 63, and in consequence it is not hyperlinked in the HTML version.

Some comments on the XML tagging in the BNF rules:

- it is useful to have the non-terminals tagged, though the way in which it is done is a little clumsy, since the internal identifier and the visible name of the non-terminal are necessarily in a one-to-one correspondence. The way it is done seems designed primarily to enable a particular translation to HTML.

- it is a shame that there is no tagging to distinguish terminal symbols from metasymbols, since this would enable nicer renditions of the rules, e.g. exploiting colour, without having to parse the BNF

- it would seem more logical for each rule to have a single <rhs>, with any <vc> and <wfc> constraints being embedded within the <rhs>, rather than these being separate elements interspersed among multiple <rhs> elements.

Two comments on the definition of notation in section 6:

- the distinction between non-terminals with an initial upper case and those with an initial lower case is not at all clear (to me).

- the precedence of the metalanguage operators (e.g. that "A B | C" means "(A B) | C") is not stated.
Thanks to Peter M-R for prompting me to look at this XML exemplar, it has been very stimulating! Mike Kay xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev@ic.ac.uk Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To (un)subscribe, mailto:majordomo@ic.ac.uk the following message; (un)subscribe xml-dev To subscribe to the digests, mailto:majordomo@ic.ac.uk the following message; subscribe xml-dev-digest List coordinator, Henry Rzepa (mailto:rzepa@ic.ac.uk) From msuzio at ford.com Mon Feb 23 16:50:27 1998 From: msuzio at ford.com (Michael J. Suzio) Date: Mon Jun 7 17:00:11 2004 Subject: xml:space References: <3.0.1.16.19980220223432.3ea70dee@pop3.demon.co.uk> Message-ID: <199802231650.AA06071@mailfw1.ford.com> What I wonder is, how does SAX decide what is ignorable whitespace and what is significant? I'm not clear on how that works, and the role xml:space plays in defining that. Ignoring whitespace is one of the most tedious things I keep doing in my XML parsing apps, I'd prefer to have to explicitly *work* to keep whitespace. What I don't understand is, given something like this in a DTD: <!ELEMENT QUOTE (SOURCE?|LINE+|KEY+)> Why wouldn't *any* character data located within <QUOTE></QUOTE> (and not inside one of it's child elements) be ignorable? I'd expect a parser seeing this: <QUOTE> <SOURCE href="http://www.quotesrus.com/"> <LINE>This is line 1 of the quote</LINE> </QUOTE> To ignore those carriage returns and extraneous spaces within the QUOTE element, and just give me the SOURCE and LINE elements and their content. Sorry if this is a stupid question, but it has been bugging me the last couple weeks. -- Michael J. Suzio Web Technical Standards, WWW & Internet Applications (313) 24-88120 msuzio@eccms1.dearborn.ford.com / msuzio@ford.com xml-dev: A list for W3C XML Developers. 
To post, mailto:xml-dev@ic.ac.uk Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To (un)subscribe, mailto:majordomo@ic.ac.uk the following message; (un)subscribe xml-dev To subscribe to the digests, mailto:majordomo@ic.ac.uk the following message; subscribe xml-dev-digest List coordinator, Henry Rzepa (mailto:rzepa@ic.ac.uk) From msuzio at ford.com Mon Feb 23 16:59:20 1998 From: msuzio at ford.com (Michael J. Suzio) Date: Mon Jun 7 17:00:11 2004 Subject: finalising org.sax.xml.Parser References: <001801bd406f$18cbd140$2ee044c6@donpark> <199802231609.LAA01939@unready.microstar.com> Message-ID: <199802231658.AA08077@mailfw1.ford.com> I think keeping the method with Public ID is fine, but if in many cases we're just passing NULL as the first arg, why don't we have a method which just accepts the system ID/URI? I myself have no use for Public ID, so I essentially always just pass in NULL, which to me makes the code look confusing... (I hate NULL/ignored parameters, especially as the first arg, I usually rank args in order of "importance" to the method/procedure). -- Michael J. Suzio Web Technical Standards, WWW & Internet Applications (313) 24-88120 msuzio@eccms1.dearborn.ford.com / msuzio@ford.com xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev@ic.ac.uk Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To (un)subscribe, mailto:majordomo@ic.ac.uk the following message; (un)subscribe xml-dev To subscribe to the digests, mailto:majordomo@ic.ac.uk the following message; subscribe xml-dev-digest List coordinator, Henry Rzepa (mailto:rzepa@ic.ac.uk) From donpark at quake.net Mon Feb 23 17:05:27 1998 From: donpark at quake.net (Don Park) Date: Mon Jun 7 17:00:11 2004 Subject: Binary Data Message-ID: <000f01bd407c$3d9e9f90$2ee044c6@donpark> Michael, Check out the XML-Binary demo at http://www.quake.net/~donpark/SaxDomDemo/SaxDomDemo.html Binary.xml file contains an element with embedded binary data. 
I do not like the notation-based solution to binary data because it requires DTD processing. IMHO, high-performance XML applications will opt to ignore the DTD because it requires additional resources as well as causing processing hiccups.

XML-Binary is being designed around a set of reserved attributes which tell you how the data was encoded (base64) and what the data is (image/gif). All this can be done easily by checking for the attributes in a single-pass processing system. It also allows specification of multi-layer encoding of binary data, so that your application can easily tell that an XML element contains a PostScript image which was compressed using ZIP and then encoded using BASE64.

Don Park
http://www.quake.net/~donpark/index.html

From tyler at infinet.com Mon Feb 23 17:15:43 1998
From: tyler at infinet.com (Tyler Baker)
Date: Mon Jun 7 17:00:11 2004
Subject: xml:space
References: <3.0.1.16.19980220223432.3ea70dee@pop3.demon.co.uk> <199802231650.AA06071@mailfw1.ford.com>
Message-ID: <34F1AF9F.77CD5499@infinet.com>

Michael J. Suzio wrote:

> What I wonder is, how does SAX decide what is ignorable
> whitespace and what is significant? I'm not clear on how that
> works, and the role xml:space plays in defining that.
> Ignoring whitespace is one of the most tedious things I keep doing
> in my XML parsing apps, I'd prefer to have to explicitly *work* to
> keep whitespace.
> What I don't understand is, given something like this in a DTD:

I think for problems like this, the application should just filter it all out itself, which is very simple.
Here is an inefficient implementation that will do just that for you in Java, for instance:

    String data = "Fee Fi Fo\n\n\n Fum\t\t\t ";
    // StringTokenizer splits on spaces, tabs, newlines and returns by default
    java.util.StringTokenizer st = new java.util.StringTokenizer(data);
    StringBuffer buffer = new StringBuffer();
    while (st.hasMoreTokens()) {
        buffer.append(st.nextToken());
        buffer.append(' ');
    }
    // drop the trailing space, unless the input held no tokens at all
    if (buffer.length() > 0) {
        buffer.setLength(buffer.length() - 1);
    }
    String result = buffer.toString();

Result should be "Fee Fi Fo Fum"

From ak117 at freenet.carleton.ca Mon Feb 23 17:15:51 1998
From: ak117 at freenet.carleton.ca (David Megginson)
Date: Mon Jun 7 17:00:11 2004
Subject: xml:space
In-Reply-To: <199802231650.AA06071@mailfw1.ford.com>
References: <3.0.1.16.19980220223432.3ea70dee@pop3.demon.co.uk> <199802231650.AA06071@mailfw1.ford.com>
Message-ID: <199802231713.MAA02467@unready.microstar.com>

Michael J. Suzio writes:

> What I wonder is, how does SAX decide what is ignorable
> whitespace and what is significant? I'm not clear on how that
> works, and the role xml:space plays in defining that.
> Ignoring whitespace is one of the most tedious things I keep doing
> in my XML parsing apps, I'd prefer to have to explicitly *work* to
> keep whitespace.

SAX itself is not a program, but its interface allows DTD-driven parsers to make the distinction described in clause 2.10 (AElfred takes advantage of the distinction):

2.10 White Space Handling

In editing XML documents, it is often convenient to use "white space" (spaces, tabs, and blank lines, denoted by the nonterminal S in this specification) to set apart the markup for greater readability.
Such white space is typically not intended for inclusion in the delivered version of the document. On the other hand, "significant" white space that should be preserved in the delivered version is common, for example in poetry and source code. An XML processor must always pass all characters in a document that are not markup through to the application. A validating XML processor must also inform the application which of these characters constitute white space appearing in element content. Note that this has nothing to do with the `xml:space' attribute -- it is your application, rather than the XML parser, that is allowed to act on that one. > What I don't understand is, given something like this in a DTD: > > <!ELEMENT QUOTE (SOURCE?|LINE+|KEY+)> > > Why wouldn't *any* character data located within > <QUOTE></QUOTE> (and not inside one of it's child > elements) be ignorable? I'd expect a parser seeing this: > <QUOTE> > <SOURCE href="http://www.quotesrus.com/"> > <LINE>This is line 1 of the quote</LINE> > </QUOTE> > > To ignore those carriage returns and extraneous spaces within the > QUOTE element, and just give me the SOURCE and LINE elements and > their content. Absolutely correct. If your XML parser is DTD-driven (as AElfred is), it should somehow flag the carriage returns and leading spaces in your example as ignorable. It is a major pain having to deal with this kind of thing yourself, if your parser is not DTD-aware. All the best, David -- David Megginson ak117@freenet.carleton.ca Microstar Software Ltd. dmeggins@microstar.com http://home.sprynet.com/sprynet/dmeggins/ xml-dev: A list for W3C XML Developers. 
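The distinction described in the message above (white space in element content is flagged as ignorable, everything else arrives as ordinary character data) can be seen with a small sketch. It makes several assumptions that are not from the thread: it uses the JAXP parser bundled with a current JDK and the DefaultHandler class from the later SAX 2 API, it switches validation on so that the parser has to report white space in element content separately, and it adapts the QUOTE example into a valid document with an invented sequence content model and a closed SOURCE element. All class names are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Illustrative sketch only; not the SAX draft under discussion in this thread.
final class IgnorableWhitespaceDemo {
    // Adapted from the QUOTE example; the DTD here is an assumption.
    static final String DOCUMENT = String.join("\n",
        "<?xml version=\"1.0\"?>",
        "<!DOCTYPE QUOTE [",
        "<!ELEMENT QUOTE (SOURCE?, LINE+)>",
        "<!ELEMENT SOURCE EMPTY>",
        "<!ATTLIST SOURCE href CDATA #REQUIRED>",
        "<!ELEMENT LINE (#PCDATA)>",
        "]>",
        "<QUOTE>",
        "  <SOURCE href=\"http://www.quotesrus.com/\"/>",
        "  <LINE>This is line 1 of the quote</LINE>",
        "</QUOTE>");

    // Collects real character data and counts ignorable white space separately.
    static final class Collector extends DefaultHandler {
        final StringBuilder text = new StringBuilder();
        int ignorableChars = 0;

        @Override
        public void characters(char[] ch, int start, int length) {
            text.append(ch, start, length);
        }

        @Override
        public void ignorableWhitespace(char[] ch, int start, int length) {
            // A parser may split white space into several calls, so count characters.
            ignorableChars += length;
        }
    }

    static Collector parse(String xml) {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setValidating(true); // the DTD drives the ignorable/significant split
            Collector collector = new Collector();
            factory.newSAXParser().parse(new InputSource(new StringReader(xml)), collector);
            return collector;
        } catch (Exception e) {
            throw new IllegalStateException("parse failed", e);
        }
    }

    public static void main(String[] args) {
        Collector result = parse(DOCUMENT);
        System.out.println("text: " + result.text);
        System.out.println("ignorable white space characters: " + result.ignorableChars);
    }
}
```

With a handler like this the application never has to strip the line breaks and indentation inside QUOTE itself; it simply leaves ignorableWhitespace() doing nothing beyond whatever bookkeeping it wants.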
To post, mailto:xml-dev@ic.ac.uk Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To (un)subscribe, mailto:majordomo@ic.ac.uk the following message; (un)subscribe xml-dev To subscribe to the digests, mailto:majordomo@ic.ac.uk the following message; subscribe xml-dev-digest List coordinator, Henry Rzepa (mailto:rzepa@ic.ac.uk)

From mike at jmaca.com Mon Feb 23 17:22:33 1998
From: mike at jmaca.com (Michael Emmel)
Date: Mon Jun 7 17:00:11 2004
Subject: Binary Data Resolved
References: <000f01bd407c$3d9e9f90$2ee044c6@donpark>
Message-ID: <34F1B1ED.FA5F78B4@jmaca.com>

Don Park wrote:

> Michael,
>
> Check out the XML-Binary demo at
> http://www.quake.net/~donpark/SaxDomDemo/SaxDomDemo.html
>
> Binary.xml file contains an element with embedded binary data.

Thanks!! Another poster also suggested that the packaging of the various entities that make up an XML document is outside of the XML spec. I agree, so I think I'll work on my idea of a jar-like file with an XML header (very cool IMHO) and save the Base64 encoding for special circumstances. There does need to be a standard way to transmit all the "static" data that makes up a complete XML document and other complex data sources.

And thanks to all who helped me resolve this; it was very important to me.

Mike
mike@jmaca.com

Private post:

Subject: Re: Binary Data
Date: Mon, 23 Feb 1998 12:04:04 -0500
From: David Megginson <dmeggins@microstar.com>
To: mike@jmaca.com
References: 1 , 2 , 3 , 4

Michael Emmel writes:

> Failing that you're left with coming up with a standard way to
> "package" all internal links.

I think that that is by far a better solution -- kludges (like embedding all objects in a single XML file) are sometimes necessary to get something working, but we don't want to codify them in a spec if we can avoid doing so. A good, general Internet packaging protocol would solve many problems both inside and outside XML. In the meantime, you can use base64 if you really need to.
All the best, xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev@ic.ac.uk Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To (un)subscribe, mailto:majordomo@ic.ac.uk the following message; (un)subscribe xml-dev To subscribe to the digests, mailto:majordomo@ic.ac.uk the following message; subscribe xml-dev-digest List coordinator, Henry Rzepa (mailto:rzepa@ic.ac.uk) From ricko at allette.com.au Mon Feb 23 17:22:52 1998 From: ricko at allette.com.au (Rick Jelliffe) Date: Mon Jun 7 17:00:12 2004 Subject: Binary Data Message-ID: <003701bd407f$afc15830$7b0b4ccb@NT.JELLIFFE.COM.AU> From: Michael Emmel <mike@jmaca.com> >This says to me that binary data is required to either be encoded to ascii to >be included, or have Mime type boundries for XML tags with binary data > not containing the mime boundries included. >In the document or be obtained from a ascii normalized external URI link. Binary data can only be included in a parseable entity if it is first encoded in some way which 1) does not contain delimiters which may cause false triggering 2) does not contain any characters which the XML "SGML declaration" says are unused (or shunned). Base64 is one such encoding. Other encodings may be more efficient if you have a 16-bit data stream. The way to signal you are using an encoding is to use an element with a notation attribute. If you embed binary data with MIME type boundaries, you no longer have a parseable XML entity, you have a MIME multipart file which can be processed to generate an XML entity. >There is no way to tell a XML arser to skip x number of arbitrary bytes of >embedded unparsed entity data which is consumed by the "application" and then >restart the parser >at the next valid section. An XML parser is not interested in the contents of a non-XML-parseable entity. Indexing into binary data is either done before the parser (i.e. 
by embedding the appropriate instructions in the system identifier of the
entity) or by the application after the parser.

>Am I wrong ???

What do you mean "restart the parser"? Parsing continues after an entity
reference.

Rick Jelliffe

From msuzio at ford.com Mon Feb 23 17:30:52 1998
From: msuzio at ford.com (Michael J. Suzio)
Date: Mon Jun 7 17:00:12 2004
Subject: xml:space
References: <3.0.1.16.19980220223432.3ea70dee@pop3.demon.co.uk> <199802231650.AA06071@mailfw1.ford.com> <199802231713.MAA02467@unready.microstar.com>
Message-ID: <199802231730.AA15010@mailfw1.ford.com>

OK, to be more precise, the problem I think I'm seeing is that, using an
XML example, like this:

<QUOTE>
 <SOURCE href="http://www.quotesrus.com/"/>
 <LINE>This is line 1 of the quote</LINE>
</QUOTE>

I would expect (using SAX) to receive an ignorable() event when the end of
the opening QUOTE tag is reached, and the "\n " string found. I'm not
seeing that, using the DXP implementation. Should I? I'm not sure I see
what circumstances actually alert a parser that, yes, this whitespace is
*not* significant. I know it is supposed to pass the data to the
application, but the data is also supposed to be flagged, correct?

--
Michael J. Suzio
Web Technical Standards, WWW & Internet Applications
(313) 24-88120
msuzio@eccms1.dearborn.ford.com / msuzio@ford.com

From jjc at jclark.com Tue Feb 24 04:02:06 1998
From: jjc at jclark.com (James Clark)
Date: Mon Jun 7 17:00:12 2004
Subject: SAX: finalising org.sax.xml.Parser
References: <199802230313.WAA00386@unready.microstar.com>
Message-ID: <34F23D1B.E6172400@jclark.com>

> public void parse (InputStream is, String baseURI)
>   throws java.lang.Exception;
> public void parse (char ch[], int start, int length, String baseURI)
>   throws java.lang.Exception;

I don't think this last one is a good idea. If you want something that
operates on a stream of characters as opposed to bytes, it should be

  void parse(Reader r, String baseURI)

Using an array of chars is as bad an idea as it would be to replace the
InputStream method with a method that operates on an array of bytes.

I am not convinced this really buys you anything. It's easy enough to
write an InputStream that takes an array of chars and presents them as a
sequence of UTF-16 encoded bytes. It also raises some problems since the
XML spec doesn't define the operation of a processor on a sequence of
chars. For example, what if anything should the processor do with an
encoding declaration in this case?

If you don't want to put Reader in to avoid dependency on JDK 1.1, I would
suggest simply leaving this out for now.

James

From Jon.Bosak at Eng.Sun.COM Tue Feb 24 05:13:13 1998
From: Jon.Bosak at Eng.Sun.COM (Jon Bosak)
Date: Mon Jun 7 17:00:12 2004
Subject: Last call for submissions: XML Developers' Day
Message-ID: <199802240511.VAA29721@boethius.eng.sun.com>

Reminder: the deadline for submissions is this Friday, February 27.
See the original notice below for details.

Jon

========================================================================

CALL FOR PRESENTATIONS: XML DEVELOPERS' DAY
1998.03.27

A one-day technical conference for XML developers will be held Friday,
March 27, in Seattle, Washington. The event constitutes the last day of
the GCA XML Conference (http://www.gca.org/conf/xmlcon98/).

XML Developers' Day is a single-track event devoted entirely to technical
reports on the latest developments in XML implementation. If you are
engaged in the construction of any software that works with XML --
converters, parsers, servers, browsers, editors, or XML-based vertical
applications -- here is your chance to share your work with an audience
that can understand and appreciate it. Since stylesheet-based rendering is
part of XML publishing, developers of tools that support XSL or DSSSL are
invited to show their latest offerings as well. We're also open to
presentations on XML-based languages (CML, OFX, etc.) and related efforts
that might have a significant impact on the future of XML (RDF, XML-Data,
etc.) if they are of particular interest to XML developers.

Vendors of commercial tools can participate, but they must confine their
presentations to the technical aspects of current XML products in
development.
Table space will be made available for the distribution of product
announcements and commercial literature.

REGISTRATION

The registration fee for XML Developers' Day is $275 for GCA members and
$390 for non-GCA members (see the registration page below for conference
and tutorial rates). This is mighty inexpensive for an inside update on
the very latest activity in this field. You can register at

  http://www.gca.org/conf/xmlcon98/registra.htm

N.B.: Presenters get in free.

CALL FOR PRESENTATIONS

If you would like to give a report at this event, send a paragraph or two
describing your presentation, based on a conservative estimate of the
status of your project as it will stand on March 27, to Jon Bosak
(bosak@eng.sun.com). Also include a description of the audio-visual
equipment you will need for your presentation and an estimate of its
duration. Please include the phrase "XML Dev Day" somewhere in the subject
line of your message.

Since we want up-to-the-minute reports on activities in progress, there
will be no published proceedings, and therefore you need not submit your
entire presentation in advance. But please try to make your forecasted
description as accurate as possible so that we can choose the most
interesting and relevant submissions.

The deadline for submissions is Friday, February 27.

Jon

----------------------------------------------------------------------
Jon Bosak, Online Information Technology Architect, Sun Microsystems
901 San Antonio Road, MPK17-101, Palo Alto, California 94043
----------------------------------------------------------------------
If a man look sharply and attentively, he shall see Fortune; for though
she be blind, yet she is not invisible. -- Francis Bacon
----------------------------------------------------------------------

From donpark at quake.net Tue Feb 24 06:09:02 1998
From: donpark at quake.net (Don Park)
Date: Mon Jun 7 17:00:12 2004
Subject: SAX: finalising org.sax.xml.Parser
Message-ID: <002401bd40e9$fde8c510$2ee044c6@donpark>

>I don't think this last one is a good idea. If you want something that
>operates on a stream of characters as opposed to bytes, it should be
>
> void parse(Reader r, String baseURI)
>
>Using an array of chars is as bad an idea as it would be to replace the
>InputStream method with a method that operates on an array of bytes.
>
>I am not convinced this really buys you anything. It's easy enough to
>write an InputStream that takes an array of chars and presents them as a
>sequence of UTF-16 encoded bytes. It also raises some problems since the
>XML spec doesn't define the operation of a processor on a sequence of
>chars. For example, what if anything should the processor do with an
>encoding declaration in this case?

If I remember correctly, what David is trying to do is provide us with a
means to parse XML data from a byte stream as well as a character stream.
Since Reader will actually hide the byte-based aspect of the data stream,
it is inappropriate for our purpose.

An XML character stream is also very useful when XML data is generated and
processed within a framework. In such a system, converting a character
stream to a byte stream and then converting it back to a character stream
is unnecessary.

As far as what to do with encoding information when dealing with character
streams, will there be any problem if SAX just ignored it?

Regards,

Don Park
http://www.quake.net/~donpark/index.html

From M.H.Kay at eng.icl.co.uk Tue Feb 24 10:59:34 1998
From: M.H.Kay at eng.icl.co.uk (Michael Kay)
Date: Mon Jun 7 17:00:12 2004
Subject: finalising org.sax.xml.Parser
Message-ID: <01bd4113$77fa0520$1e09e391@mhklaptop.bra01.icl.co.uk>

>>From: David Megginson [SMTP:ak117@freenet.carleton.ca]

[heavily cut]

>>After considering the various discussions over the past few weeks, I
>>propose that we make the following changes:
>>
>>1) Add a parse() method that accepts a stream.
>>2) Add a parse() method that accepts a character buffer.

>>With these changes, the interface would look like this in Java:
>>
>> public void parse (InputStream is, String baseURI)
>>   throws java.lang.Exception;
>> public void parse (char ch[], int start, int length, String baseURI)
>>   throws java.lang.Exception;

>>NOTES:
>>
>>a. The baseURI argument is necessary for streams and character buffers
>>   in case either contains a relative URI. You can supply a null
>>   value if the document entity will not contain relative URIs.
>>

Comments:

1. Is the (ch, start, length) method really necessary, given that one can
supply a StringReader or whatever to the parse(InputStream) method?

2. If my "main" XML document is in a record in a database, then it is very
likely that any other entities referred to will be in the database as well.
Therefore, I think the logical approach in this situation is for the
application to resolve all URIs encountered: the parser should call the
application supplying a URI and the application should return an InputStream
to allow the parser to read it. This should presumably be done via the
EntityHandler interface.
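
A minimal sketch of the arrangement in comment 2, in which the parser hands the application a URI and gets back an InputStream. Everything named here is an assumption made for illustration: EntityStreamSource is a hypothetical callback and not the SAX EntityHandler interface (whose shape was still under discussion), the "db:" URIs are invented, and a HashMap stands in for the database.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

// Hypothetical callback: the parser supplies a URI, the application returns a stream.
interface EntityStreamSource {
    InputStream open(String uri) throws IOException;
}

// Stand-in for a database keyed by URI; a real application would run a query here.
class DatabaseEntitySource implements EntityStreamSource {
    private final Map<String, String> records = new HashMap<>();

    void store(String uri, String xmlText) {
        records.put(uri, xmlText);
    }

    @Override
    public InputStream open(String uri) throws IOException {
        String text = records.get(uri);
        if (text == null) {
            throw new IOException("No record for entity " + uri);
        }
        return new ByteArrayInputStream(text.getBytes(StandardCharsets.UTF_8));
    }
}

class DatabaseEntityDemo {
    // Drains a stream into a string, much as a parser would consume the entity.
    static String readAll(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[256];
        int n;
        while ((n = in.read(buf)) != -1) {
            out.write(buf, 0, n);
        }
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        DatabaseEntitySource db = new DatabaseEntitySource();
        db.store("db:chapter1", "<chapter>One</chapter>");
        System.out.println(readAll(db.open("db:chapter1")));
    }
}
```

The parser never needs to know where the bytes come from; it only needs a place to call open() whenever it meets an external entity reference.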

And a question: is there a recommended way to abort a parse once the
application has got the information it needs (e.g. extracting the contents of
the TITLE element)? Would an interface like parser.abort() be cleaner than
playing around with exceptions? I ask because in handling the results of a
free text search, I am parsing all the retrieved documents when I only need
a bit of text from the beginning of each, and this is obviously wasteful. I
thought perhaps of supplying a stream and generating a premature
end-of-file, and then trapping the exception that comes back.

Regards,

Mike Kay

From M.H.Kay at eng.icl.co.uk Tue Feb 24 11:55:28 1998
From: M.H.Kay at eng.icl.co.uk (Michael Kay)
Date: Mon Jun 7 17:00:12 2004
Subject: finalising org.sax.xml.Parser
Message-ID: <01bd411b$2f327400$1e09e391@mhklaptop.bra01.icl.co.uk>

>Would anyone prefer _not_ to see public IDs, then?

I'm not fundamentally opposed to them, but I can't see much point in them
either. The XML spec defines no semantics for a public identifier and we are
left to guess that it might have a similar meaning to a similar construct in
SGML. They are one of the bits of SGML legacy which should have been taken
out. As they're in XML it might make sense to support them in SAX: the
problem is that if you do so, you have to say what they mean.

(Actually system identifiers aren't very well explained either: we are told
they are URIs and there's no definitive statement of what a URI is. The
difference is that most readers can guess).

Mike Kay

From ak117 at freenet.carleton.ca Tue Feb 24 13:48:16 1998
From: ak117 at freenet.carleton.ca (David Megginson)
Date: Mon Jun 7 17:00:12 2004
Subject: SAX: finalising org.sax.xml.Parser
In-Reply-To: <002401bd40e9$fde8c510$2ee044c6@donpark>
References: <002401bd40e9$fde8c510$2ee044c6@donpark>
Message-ID: <199802241346.IAA00395@unready.microstar.com>

Don Park writes:

> If I remember correctly, what David is trying to do is provide us with means
> to parse XML data from a byte stream as well as character stream. Since
> Reader will actually hide the byte-based aspect of the data stream, it is
> inappropriate for our purpose.
>
> XML character stream is also very useful when XML data is generated and
> processed within a framework. In such a system, converting character
> streams to byte stream and then converting it back to character stream is
> unnecessary.

This is true, but I think that James's point is well taken. The character
_buffer_ doesn't really buy us anything. I am reluctant to use a character
reader for two reasons:

1) It is a concept that doesn't translate well to languages other than
   Java (or even to Java 1.0.2 for that matter).

2) It imposes another architectural requirement on SAX-conformant parsers
   (the ability to receive characters directly, bypassing the normal input
   mechanisms), and I'm trying to keep interference to a minimum.

It is slightly inefficient to go from characters to a byte stream to
characters, but it's not that bad (especially if we use ISO-8859-1 or UCS-2
for the encoding), and it keeps SAX simple and general.
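
A minimal sketch of the round trip described above, in which a character buffer is presented to a byte-oriented parse method as a stream. CharBufferStreams and its method names are hypothetical. The sketch uses Java's built-in UTF-16 charset, which when encoding writes a big-endian byte order mark followed by two bytes per char; other encodings mentioned in this thread would work the same way with a different Charset.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;

class CharBufferStreams {
    // Encodes ch[start .. start+length) as UTF-16 bytes, byte order mark first.
    static InputStream asUtf16Stream(char[] ch, int start, int length) {
        byte[] bytes = new String(ch, start, length).getBytes(StandardCharsets.UTF_16);
        return new ByteArrayInputStream(bytes);
    }

    // Decodes the stream again, as the receiving side would; the mark is consumed.
    static String decodeUtf16(InputStream in) throws IOException {
        Reader reader = new InputStreamReader(in, StandardCharsets.UTF_16);
        StringBuilder out = new StringBuilder();
        int c;
        while ((c = reader.read()) != -1) {
            out.append((char) c);
        }
        return out.toString();
    }

    public static void main(String[] args) throws IOException {
        char[] doc = "<doc>text</doc>".toCharArray();
        System.out.println(decodeUtf16(asUtf16Stream(doc, 0, doc.length)));
    }
}
```

The cost is one extra copy of the buffer plus the encode and decode passes, which is the "slightly inefficient" step the message refers to.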

Given the discussion so far, then, we are ending up with something like
this:

  package org.xml.sax;
  import java.io.InputStream;

  public interface Parser {

    public abstract void setEntityHandler (EntityHandler handler);

    public abstract void setDocumentHandler (DocumentHandler handler);

    public abstract void setErrorHandler (ErrorHandler handler);

    public abstract void parse (String publicId, String systemId)
      throws java.lang.Exception;

    public abstract void parse (String publicId, String systemId,
                                InputStream inputStream)
      throws java.lang.Exception;

  }

If you need more, you can always extend the interface:

  package com.acme.xml;
  import java.io.Reader;

  public interface SuperParser extends org.xml.sax.Parser {

    public abstract void parse (String publicId, String systemId,
                                Reader reader)
      throws java.lang.Exception;

  }

In an ideal world, we'd also have some kind of ability to ask the parser
to turn validation on or off, but I'm not certain that that's practical:
any thoughts?

All the best,

David

--
David Megginson                 ak117@freenet.carleton.ca
Microstar Software Ltd.         dmeggins@microstar.com
      http://home.sprynet.com/sprynet/dmeggins/

From ak117 at freenet.carleton.ca Tue Feb 24 13:59:52 1998
From: ak117 at freenet.carleton.ca (David Megginson)
Date: Mon Jun 7 17:00:12 2004
Subject: finalising org.sax.xml.Parser
In-Reply-To: <01bd4113$77fa0520$1e09e391@mhklaptop.bra01.icl.co.uk>
References: <01bd4113$77fa0520$1e09e391@mhklaptop.bra01.icl.co.uk>
Message-ID: <199802241358.IAA00435@unready.microstar.com>

Michael Kay writes:

> Comments:
> 1. Is the (ch, start, length) method really necessary, given that one can
> supply a StringReader or whatever to the parse(InputStream) method?

James has convinced me that it's not -- I'm actually happy to drop it,
since I want to keep the interfaces as simple as possible both to learn and
to implement.

> 2. If my "main" XML document is in a record in a database, then it is very
> likely that any other entities referred to will be in the database as well.
> Therefore, I think the logical approach in this situation is for the
> application to resolve all URIs encountered: the parser should call the
> application supplying a URI and the application should return an InputStream
> to allow the parser to read it. This should presumably be done via the
> EntityHandler interface.

I have considered this approach, but I can anticipate two problems:

1) It puts the burden of resolving URIs on the application rather than the
   parser.

2) It is possible that some programming languages or libraries do not
   represent network connections as input streams.

If (2) isn't a problem, we might find a way to work around (1). I'll be
coming back to the EntityHandler interface in a future posting, and we can
take up the issue again then.

> And a question: is there a recommended way to abort a parse once the
> application has got the information it needs (e.g extracting the contents of
> the TITLE element)? Would an interface like parser.abort() be cleaner than
> playing around with exceptions? I ask because in handling the results of a
> free text search, I am parsing all the retrieved documents when I only need
> a bit of text from the beginning of each, and this is obviously wasteful. I
> thought perhaps of supplying a stream and generating a premature
> end-of-file, and then trapping the exception that comes back.

In languages that support exceptions (Java, C++, Perl, and sort-of C), an
exception is probably the cleanest way to handle this.
It also lets you pass application-specific information back to the top
level within your exception.

All the best,

David

--
David Megginson                 ak117@freenet.carleton.ca
Microstar Software Ltd.         dmeggins@microstar.com
      http://home.sprynet.com/sprynet/dmeggins/

From ak117 at freenet.carleton.ca Tue Feb 24 14:24:39 1998
From: ak117 at freenet.carleton.ca (David Megginson)
Date: Mon Jun 7 17:00:12 2004
Subject: SAX: multiple handlers
Message-ID: <199802241423.JAA00516@unready.microstar.com>

In a private message, one SAX user raised the issue again of multiple
handlers. The user suggested the situation where someone wants to
extract information from a document _and_ copy the document to an
OutputStream at the same time: for a clean implementation, each of
these should be in a different handler.

During the last round, most people vetoed this idea. Here it is
again, though, for your consideration:

  package org.xml.sax;
  import java.io.InputStream;

  public interface Parser {

    public void addEntityHandler (EntityHandler handler);
    public void removeEntityHandler (EntityHandler handler);

    public void addDocumentHandler (DocumentHandler handler);
    public void removeDocumentHandler (DocumentHandler handler);

    public void addErrorHandler (ErrorHandler handler);
    public void removeErrorHandler (ErrorHandler handler);

    public void parse (String publicId, String systemId)
      throws java.lang.Exception;

    public void parse (String publicId, String systemId,
                       InputStream inputStream)
      throws java.lang.Exception;

  }

Any further thoughts on this issue?

All the best,

David

--
David Megginson                 ak117@freenet.carleton.ca
Microstar Software Ltd.         dmeggins@microstar.com
      http://home.sprynet.com/sprynet/dmeggins/

From jmodre at edu.uni-klu.ac.at Tue Feb 24 14:31:59 1998
From: jmodre at edu.uni-klu.ac.at (Juergen Modre)
Date: Mon Jun 7 17:00:12 2004
Subject: SAX: finalising org.sax.xml.Parser
References: <199802230313.WAA00386@unready.microstar.com>
Message-ID: <34F2E818.1FC4A30B@edu.uni-klu.ac.at>

David Megginson wrote:

> After considering the various discussions over the past few weeks, I
> propose that we make the following changes:
>
> 1) Add a parse() method that accepts a stream.

Fully agree.

> 2) Add a parse() method that accepts a character buffer.

My thoughts are similar to James's, so I don't really see the need for it.
To parse part of a larger document, the char[] can always be converted to
an InputStream and used with 1). But maybe your intention goes in another
direction.

> 3) Remove public ID from the current parse() method (I don't think
> public IDs are going anywhere fast in XML).

I propose to have a publicID. E.g. the XML parser DXP supports public
identifiers.

> With these changes, the interface would look like this in Java:
> public void parse (String uri)
>   throws java.lang.Exception;

SGML/XML-friendly "systemId" vs. Web-hacker-friendly "URI" as parameter
name: I personally don't care too much about the name, both are
appropriate. Maybe in a method with publicId the name "systemId" is more
readable. Both names are fine as long as they are well described/documented
(e.g.
in the javadoc header in Java) to explain the meaning to everybody.

> NOTES:
>
> a. The baseURI argument is necessary for streams and character buffers
>    in case either contains a relative URI. You can supply a null
>    value if the document entity will not contain relative URIs.

The baseURI gives you all the information needed to resolve every relative
entity reference correctly. What's still missing is the name of the
document where the parsing started, so this name will be missing from any
error message raised in the starting entity.

So I propose to have:

  public abstract void parse (String publicId, String systemId,
                              InputStream inputStream)

instead of

  public void parse (InputStream is, String baseURI)

-----------------------------------------------
JUERGEN MODRE
Reisdorf 6
A-9371 Brueckl
Austria (Europe)
Phone: ++43 4214 2320
Mobile: ++43 664 233 22 22
E-mail: jmodre@edu.uni-klu.ac.at
WWW: http://www.edu.uni-klu.ac.at/~jmodre
-----------------------------------------------

From gmckenzi at JetForm.com Tue Feb 24 14:57:01 1998
From: gmckenzi at JetForm.com (Gavin McKenzie)
Date: Mon Jun 7 17:00:12 2004
Subject: multiple handlers
Message-ID: <c=CA%a=_%p=JetForm%l=ROSSINI-980224144959Z-25614@rossini.jetform.com>

I like the idea of add/remove versus set. In the Java case it meshes
nicely with other Java event mechanisms. From a non-Java biased
perspective it does offer considerable extra flexibility in a simple
manner.

Though I don't have a strict requirement for it today, I'd vote for it.

Gavin.
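
The add/remove proposal is one way to deliver each event to several listeners. Another, which leaves a set-style Parser interface unchanged, is a composite handler that forwards every event to two delegates. The following is only a sketch under stated assumptions: ElementEvents is a hypothetical three-method stand-in for a document handler (the real interface has more callbacks), and MultiHandler, NameCollector and TextCollector are illustrative names.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, trimmed-down stand-in for a SAX document handler.
interface ElementEvents {
    void startElement(String name);
    void characters(char[] ch, int start, int length);
    void endElement(String name);
}

// Forwards each event to two handlers in turn; either may itself be a MultiHandler.
class MultiHandler implements ElementEvents {
    private final ElementEvents first;
    private final ElementEvents second;

    MultiHandler(ElementEvents first, ElementEvents second) {
        this.first = first;
        this.second = second;
    }

    @Override
    public void startElement(String name) {
        first.startElement(name);
        second.startElement(name);
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        first.characters(ch, start, length);
        second.characters(ch, start, length);
    }

    @Override
    public void endElement(String name) {
        first.endElement(name);
        second.endElement(name);
    }
}

// Example listener: records element names in document order.
class NameCollector implements ElementEvents {
    final List<String> names = new ArrayList<>();
    @Override public void startElement(String name) { names.add(name); }
    @Override public void characters(char[] ch, int start, int length) { }
    @Override public void endElement(String name) { }
}

// Example listener: accumulates character data.
class TextCollector implements ElementEvents {
    final StringBuilder text = new StringBuilder();
    @Override public void startElement(String name) { }
    @Override public void characters(char[] ch, int start, int length) {
        text.append(ch, start, length);
    }
    @Override public void endElement(String name) { }
}
```

A parser that only knows about one handler is given the MultiHandler; nesting MultiHandlers extends the fan-out to any number of listeners, at the price of one extra call per event per level.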

>-----Original Message-----
>From: David Megginson [SMTP:ak117@freenet.carleton.ca]
>Sent: Tuesday, February 24, 1998 9:24 AM
>To: xml-dev Mailing List
>Subject: SAX: multiple handlers
>
>In a private message, one SAX user raised the issue again of multiple
>handlers. The user suggested the situation where someone wants to
>extract information from a document _and_ copy the document to an
>OutputStream at the same time: for a clean implementation, each of
>these should be in a different handler.
>
>During the last round, most people vetoed this idea. Here it is
>again, though, for your consideration:
>
> package org.xml.sax;
> import java.io.InputStream;
>
> public interface Parser {
>
>   public void addEntityHandler (EntityHandler handler);
>   public void removeEntityHandler (EntityHandler handler);
>
>   public void addDocumentHandler (DocumentHandler handler);
>   public void removeDocumentHandler (DocumentHandler handler);
>
>   public void addErrorHandler (ErrorHandler handler);
>   public void removeErrorHandler (ErrorHandler handler);
>
>   public void parse (String publicId, String systemId)
>     throws java.lang.Exception;
>
>   public void parse (String publicId, String systemId,
>                      InputStream inputStream)
>     throws java.lang.Exception;
>
> }
>
>Any further thoughts on this issue?
>
>
>All the best,
>
>
>David
>
>--
>David Megginson                 ak117@freenet.carleton.ca
>Microstar Software Ltd.         dmeggins@microstar.com
>      http://home.sprynet.com/sprynet/dmeggins/

From jmodre at edu.uni-klu.ac.at Tue Feb 24 15:01:50 1998
From: jmodre at edu.uni-klu.ac.at (Juergen Modre)
Date: Mon Jun 7 17:00:12 2004
Subject: SAX: finalising org.sax.xml.Parser
References: <002401bd40e9$fde8c510$2ee044c6@donpark> <199802241346.IAA00395@unready.microstar.com>
Message-ID: <34F2EF37.8979C8DE@edu.uni-klu.ac.at>

> In an ideal world, we'd also have some kind of ability to ask the
> parser to turn validation on or off, but I'm not certain that that's
> practical: any thoughts?

I think that is practical and necessary.

One solution would be to have methods like:

  void setValidation(boolean validation)
  boolean getValidation()

These methods can be called before starting to parse with
the parse() method.

I also think a parse method with only a systemId as parameter would be
convenient, particularly for users who are new to XML and not used to
publicIds:

  public abstract void parse (String systemId)

This would also avoid the need to call entityHandler.resolveEntity()
every time to resolve the entity.

-----------------------------------------------
JUERGEN MODRE
Reisdorf 6
A-9371 Brueckl
Austria (Europe)
Phone: ++43 4214 2320
Mobile: ++43 664 233 22 22
E-mail: jmodre@edu.uni-klu.ac.at
WWW: http://www.edu.uni-klu.ac.at/~jmodre
-----------------------------------------------

From M.H.Kay at eng.icl.co.uk Tue Feb 24 15:03:06 1998
From: M.H.Kay at eng.icl.co.uk (Michael Kay)
Date: Mon Jun 7 17:00:12 2004
Subject: multiple handlers
Message-ID: <01bd4135$5b2893e0$1e09e391@mhklaptop.bra01.icl.co.uk>

>In a private message, one SAX user raised the issue again of multiple
>handlers
>Any further thoughts on this issue?
>

I've implemented a layer on top of SAX that provides not only multiple
handlers, but also per-element-type handlers. Since it is trivial to
implement this on top of SAX, I suggest it shouldn't go into SAX itself.

(The way you do multiple handlers is to write a class MultiHandler that
implements the DocumentHandler interface and accepts in its constructor two
DocumentHandlers; the methods then call these two in turn. Of course either
of them can itself be a MultiHandler).

Mike Kay

From tyler at infinet.com Tue Feb 24 15:05:49 1998
From: tyler at infinet.com (Tyler Baker)
Date: Mon Jun 7 17:00:12 2004
Subject: SAX: finalising org.sax.xml.Parser
References: <199802230313.WAA00386@unready.microstar.com> <34F2E818.1FC4A30B@edu.uni-klu.ac.at>
Message-ID: <34F2E22C.D06BCA45@infinet.com>

Juergen Modre wrote:

> David Megginson wrote:
> > After considering the various discussions over the past few weeks, I
> > propose that we make the following changes:
> >
> > 1) Add a parse() method that accepts a stream.
> Fully agree.
>
> > 2) Add a parse() method that accepts a character buffer.
> I have similar thoughts like James and therefore don't really see the need for it.
> For the case to parse parts from an larger document the char[] can always be
> converted to an InputStream to be used with 1).
> But maybe your intention goes into another direction.

One way to get around the char[] array problem is to have a feeder
mechanism in which you continually feed the parser chunks of data, as you
would with an input stream, except that you explicitly turn the parser on
before feeding it the data and explicitly turn it off when you are done
feeding it. For example you could have methods that looked like this:

  Parser.start();
  Parser.parseBuffer(char[] c);
  Parser.end();

Then you could just go through a loop and feed in a character array you
populate with the document data until you are finished. This of course
would be much more straightforward with an input stream, however this would
get around the problem of languages which have no concept of input streams.
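
A toy sketch of the feeder idea, showing what carrying state from one parseBuffer() call to the next involves. FeedCounter is hypothetical and is not an XML parser: it only counts start tags, and it ignores comments, CDATA sections and attribute values containing '>'. The three method names follow the suggestion above; everything else is an assumption.

```java
class FeedCounter {
    // Where the scanner stood when the previous buffer ran out.
    private enum State { TEXT, AFTER_LT, IN_TAG }

    private State state = State.TEXT;
    private int startTags;
    private boolean running;

    void start() {
        state = State.TEXT;
        startTags = 0;
        running = true;
    }

    // May be called any number of times; a tag may be split across calls.
    void parseBuffer(char[] c) {
        if (!running) {
            throw new IllegalStateException("call start() first");
        }
        for (char ch : c) {
            switch (state) {
                case TEXT:
                    if (ch == '<') state = State.AFTER_LT;
                    break;
                case AFTER_LT:
                    // '/', '!' and '?' introduce end tags, declarations and PIs.
                    if (ch != '/' && ch != '!' && ch != '?' && ch != '>') startTags++;
                    state = (ch == '>') ? State.TEXT : State.IN_TAG;
                    break;
                case IN_TAG:
                    if (ch == '>') state = State.TEXT;
                    break;
            }
        }
    }

    // Returns the number of start tags seen since start().
    int end() {
        if (state != State.TEXT) {
            throw new IllegalStateException("input ended inside a tag");
        }
        running = false;
        return startTags;
    }
}
```

Even in this reduced form the scanner has to remember, between calls, that it has just seen a '<' and not yet the character after it; a full parser would have to preserve far more.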

The biggest problem I see with this suggestion is that it will make parsers
a bit more difficult to implement, since you essentially have to freeze
your parser's state after each call to parseBuffer() finishes.

Just a suggestion,

Tyler

From drewn at icomm.co.uk Tue Feb 24 15:08:01 1998
From: drewn at icomm.co.uk (Nick Drew)
Date: Mon Jun 7 17:00:12 2004
Subject: multiple handlers
Message-ID: <01BD4136.48D7E5F0@krusty.icomm.co.uk>

<..stuff deleted...>

  During the last round, most people vetoed this idea. Here it is
  again, though, for your consideration:

    package org.xml.sax;
    import java.io.InputStream;

    public interface Parser {

      public void addEntityHandler (EntityHandler handler);
      public void removeEntityHandler (EntityHandler handler);

      public void addDocumentHandler (DocumentHandler handler);
      public void removeDocumentHandler (DocumentHandler handler);

      public void addErrorHandler (ErrorHandler handler);
      public void removeErrorHandler (ErrorHandler handler);

      public void parse (String publicId, String systemId)
        throws java.lang.Exception;

      public void parse (String publicId, String systemId,
                         InputStream inputStream)
        throws java.lang.Exception;

    }

  Any further thoughts on this issue?

Apologies in advance: I'm quite new to the list, so missed this discussion
first time around.

It seems that the above suggestion isn't essential. Perhaps there should be
a standardised MulticastEntityHandler, MulticastDocumentHandler, and
MulticastErrorHandler, which can be used instead, e.g.

  {
    ...
    MulticastDocumentHandler mdocHandler = new MyMulticastDocumentHandler();
    mdocHandler.addHandler( new ExistingDocumentHandler() );
    mdocHandler.addHandler( new AnotherExistingDocumentHandler() );
    ...
    // the multicast document handler is registered as the document handler
    iParser.setDocumentHandler( mdocHandler );
    ...
  }

and the MulticastDocumentHandler just delegates to its members as needed.

Nick Drew
icomm technologies ltd.

From ak117 at freenet.carleton.ca Tue Feb 24 18:52:50 1998
From: ak117 at freenet.carleton.ca (David Megginson)
Date: Mon Jun 7 17:00:12 2004
Subject: multiple handlers
In-Reply-To: <01bd4135$5b2893e0$1e09e391@mhklaptop.bra01.icl.co.uk>
References: <01bd4135$5b2893e0$1e09e391@mhklaptop.bra01.icl.co.uk>
Message-ID: <199802241851.NAA00358@unready.microstar.com>

Michael Kay writes:

> >In a private message, one SAX user raised the issue again of multiple
> >handlers
> >Any further thoughts on this issue?
> >
> I've implemented a layer on top of SAX that provides not only multiple
> handlers, but also per-element-type handlers. Since it is trivial to
> implement this on top of SAX, I suggest it shouldn't go into SAX itself.

I had this same thought when I was walking my girls to school after lunch.
Unlike a GUI, which spends most of its time waiting for the user to do
something interesting, an XML parser has to deal with hundreds or thousands
of events each second, and perhaps millions of events in a hefty XML
document. Upon reflection, I am becoming more inclined to agree with the
arguments that people made in the first round, that the overhead of walking
through a vector of handlers and delivering each event to each one can be
excessive.
Besides, as Michael rightly points out, implementing a multi-listener interface on top of SAX is trivial if you really need it.

All the best,

David

--
David Megginson                 ak117@freenet.carleton.ca
Microstar Software Ltd.         dmeggins@microstar.com
      http://home.sprynet.com/sprynet/dmeggins/

From ak117 at freenet.carleton.ca Tue Feb 24 19:11:29 1998
From: ak117 at freenet.carleton.ca (David Megginson)
Date: Mon Jun 7 17:00:12 2004
Subject: SAX: finalising org.sax.xml.Parser
In-Reply-To: <34F2EF37.8979C8DE@edu.uni-klu.ac.at>
References: <002401bd40e9$fde8c510$2ee044c6@donpark> <199802241346.IAA00395@unready.microstar.com> <34F2EF37.8979C8DE@edu.uni-klu.ac.at>
Message-ID: <199802241910.OAA00445@unready.microstar.com>

Juergen Modre writes:

 > > In an ideal world, we'd also have some kind of ability to ask to
 > > parser to turn validation on or off, but I'm not certain that that's
 > > practical: any thoughts?
 > I thinks that is practical and necessary.
 >
 > One solution would be to have methods like:
 >   void setValidation(boolean validation)
 >   boolean getValidation()
 >
 > These methods can be called before starting to parse with
 > the parse() method.

It's trickier than this -- for example, we'd probably have to create an exception that is thrown if the underlying parser does not support validation; furthermore, none of the parsers that I've looked at supports a toggle like this, and we will be forcing another design decision on them if we require this toggle.

 > I also think a parse method with an systemId only as parameter would be
 > convenient. (With targeting to users rather new to XML
 > and not very used to the publicId's).
 >
 >   public abstract void parse (String systemId)
 >
 > This would also avoid the need to call every time
 > entityHandler.resolveEntity() to resolve the Entity.

It might be simpler, though I'm trying to keep the number of methods to a minimum. It wouldn't affect EntityHandler.resolveEntity(), though, since that does not exist solely for the sake of handling public identifiers.

Thanks, and all the best,

David

--
David Megginson                 ak117@freenet.carleton.ca
Microstar Software Ltd.         dmeggins@microstar.com
      http://home.sprynet.com/sprynet/dmeggins/

From gmckenzi at JetForm.com Tue Feb 24 19:28:09 1998
From: gmckenzi at JetForm.com (Gavin McKenzie)
Date: Mon Jun 7 17:00:12 2004
Subject: SAX: finalising org.sax.xml.Parser
Message-ID: <c=CA%a=_%p=JetForm%l=ROSSINI-980224192251Z-27616@rossini.jetform.com>

David,

Something just occurred to me...and maybe it's too late, but I thought I'd mention it...

With SAX there is an assumption that the whole file will be parsed. I'm stuck if I'm parsing a 1 gigabyte file that contains 50,000 <TRANSACTION> elements (representing transactions of data), and I only want the first transaction.

Would it be possible to have a mechanism that could pause/resume/terminate a parse? Maybe a callback that returns either a 'continue', 'pause' or 'terminate' status value, and a resumeParse() method? Or a method that I can call from within the callback to pause the parsing.
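
The status-value idea can be sketched in a few lines. This is only an illustration under stated assumptions, not anything in the SAX draft: ParseStatus, StatusHandler and ResumableEventSource are hypothetical names, and the "parser" is a toy that walks a list of element names. The point it shows is that pausing requires the event source to keep its position between calls.

```java
import java.util.List;

// Hypothetical status values, named after the wording in the message above.
enum ParseStatus { CONTINUE, PAUSE, TERMINATE }

// Hypothetical callback: the handler answers each event with a status.
interface StatusHandler {
    ParseStatus startElement(String name);
}

// Toy event source standing in for a parser. It walks a list of element
// names and remembers its position, which is the state a real parser
// would have to freeze in order to support pausing.
class ResumableEventSource {
    private final List<String> elements;
    private final StatusHandler handler;
    private int position = 0;
    private boolean terminated = false;

    ResumableEventSource(List<String> elements, StatusHandler handler) {
        this.elements = elements;
        this.handler = handler;
    }

    // Starts delivering events; returns true if the handler paused the run.
    boolean parse() {
        return run();
    }

    // Continues after a pause; returns true if the handler paused again.
    boolean resumeParse() {
        return run();
    }

    // Number of events delivered so far.
    int delivered() {
        return position;
    }

    private boolean run() {
        while (!terminated && position < elements.size()) {
            ParseStatus status = handler.startElement(elements.get(position));
            position++;
            if (status == ParseStatus.PAUSE) {
                return true;
            }
            if (status == ParseStatus.TERMINATE) {
                terminated = true;
            }
        }
        return false;
    }
}
```

A handler that wants only the first <TRANSACTION> would return TERMINATE from its first callback; one that wants to hand control back to the application between transactions would return PAUSE and later call resumeParse().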
I know that I could throw an exception from within one of my callbacks, which will halt the parse...but it would be valuable to be able to 'pause' and 'resume'.

Gavin.

From tyler at infinet.com Tue Feb 24 20:09:40 1998
From: tyler at infinet.com (Tyler Baker)
Date: Mon Jun 7 17:00:12 2004
Subject: multiple handlers
References: <01bd4135$5b2893e0$1e09e391@mhklaptop.bra01.icl.co.uk> <199802241851.NAA00358@unready.microstar.com>
Message-ID: <34F3295B.26C6F728@infinet.com>

David Megginson wrote:

> Michael Kay writes:
>
>  > >In a private message, one SAX user raised the issue again of multiple
>  > >handlers
>  > >Any further thoughts on this issue?
>  >
>  > I've implemented a layer on top of SAX that provides not only multiple
>  > handlers, but also per-element-type handlers. Since it is trivial to
>  > implement this on top of SAX, I suggest it shouldn't go into SAX itself.
>
> I had this same thought when I was walking my girls to school after
> lunch. Unlike a GUI, which spends most of its time waiting for the
> user to do something interesting, an XML parser has to deal with
> hundreds or thousands of events each second, and perhaps millions of
> events in a hefty XML document.
>
> Upon reflection, I am becoming more inclined to agree with the
> arguments that people made in the first round, that the overhead of
> walking through a vector of handlers and delivering each event to each
> one can be excessive. Besides, as Michael rightly points out,
> implementing a multi-listener interface on top of SAX is trivial if
> you really need it.
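
For readers wondering what such a layer on top of SAX might look like, here is a minimal sketch. It does not use the draft SAX interfaces, which are not reproduced in this thread: SimpleDocumentHandler is a hypothetical three-method stand-in for a document handler, and MulticastDocumentHandler and RecordingHandler are illustrative names. Handlers are copied into an array whenever one is added, so that delivering an event is a plain loop.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, cut-down stand-in for a SAX document handler.
interface SimpleDocumentHandler {
    void startElement(String name);
    void endElement(String name);
    void characters(String text);
}

// Fans each event out to every registered handler, in registration order.
class MulticastDocumentHandler implements SimpleDocumentHandler {
    private final List<SimpleDocumentHandler> registered = new ArrayList<>();
    private SimpleDocumentHandler[] targets = new SimpleDocumentHandler[0];

    void addHandler(SimpleDocumentHandler handler) {
        registered.add(handler);
        // Snapshot into an array so per-event dispatch avoids list overhead.
        targets = registered.toArray(new SimpleDocumentHandler[0]);
    }

    public void startElement(String name) {
        for (SimpleDocumentHandler h : targets) {
            h.startElement(name);
        }
    }

    public void endElement(String name) {
        for (SimpleDocumentHandler h : targets) {
            h.endElement(name);
        }
    }

    public void characters(String text) {
        for (SimpleDocumentHandler h : targets) {
            h.characters(text);
        }
    }
}

// Example member handler: records the events it receives as markup-like text.
class RecordingHandler implements SimpleDocumentHandler {
    final StringBuilder log = new StringBuilder();

    public void startElement(String name) {
        log.append('<').append(name).append('>');
    }

    public void endElement(String name) {
        log.append("</").append(name).append('>');
    }

    public void characters(String text) {
        log.append(text);
    }
}
```

The cost per event is one call per registered handler, so with a single handler registered the overhead is one extra level of indirection.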
You don't actually need to use a Vector; you could instead use an array, or just a single object if the Vector was of length one. You may initially use a Vector to store the handlers, but when you are about to parse you could just turn this into an array of handlers or else just a single handler. There are a lot of ways to go about this, so any performance loss would be a function of how many handlers you are using.

Nevertheless, SAX could just have a standard MulticastHandler implementation that dispatches events to multiple handlers. I think it would be useful to include in the Java SAX distribution a generic class to do this sort of thing.

Tyler

From wilfr at mail.bc.rogers.wave.ca Tue Feb 24 21:55:32 1998
From: wilfr at mail.bc.rogers.wave.ca (Wilf Reedijk)
Date: Mon Jun 7 17:00:12 2004
Subject: Modifying DTD using msxml
Message-ID: <34F34236.7FD472ED@rogers.wave.ca>

I would like to update the (internal) DTD for a document using msxml.

I am converting the DTD to a schema using the dtd.getSchema() method.

I then modify the elements within the schema using addChild etc.

My question is: How do I convert this schema back to the DOM so that it is saved when the document is saved?

Thanks
Wilf Reedijk
wilfr@rogers.wave.ca

From clovett at microsoft.com Tue Feb 24 21:58:55 1998
From: clovett at microsoft.com (Chris Lovett)
Date: Mon Jun 7 17:00:12 2004
Subject: Modifying DTD using msxml
Message-ID: <2F2DC5CE035DD1118C8E00805FFE354C01906CAA@red-msg-56.dns.microsoft.com>

I assume you want to convert it back to the DTD syntax - you will have to do this yourself. MSXML doesn't have this feature yet.

> -----Original Message-----
> From: Wilf Reedijk [SMTP:wilfr@mail.bc.rogers.wave.ca]
> Sent: Tuesday, February 24, 1998 1:57 PM
> To: xmldev
> Subject: Modifying DTD using msxml
>
> I would like to update the (internal) DTD for a document using msxml.
>
> I am converting the DTD to a schema using the dtd.getSchema() method
>
> I then modify the elements within the schema using addChild etc.
>
> My question is: How do convert this schema back to the DOM so that it is
> saved when the document is saved.
>
>
> Thanks
> Wilf Reedijk
> wilfr@rogers.wave.ca

From jmodre at edu.uni-klu.ac.at Tue Feb 24 22:36:01 1998
From: jmodre at edu.uni-klu.ac.at (Juergen Modre)
Date: Mon Jun 7 17:00:12 2004
Subject: SAX: finalising org.sax.xml.Parser
References: <002401bd40e9$fde8c510$2ee044c6@donpark> <199802241346.IAA00395@unready.microstar.com> <34F2EF37.8979C8DE@edu.uni-klu.ac.at> <199802241910.OAA00445@unready.microstar.com>
Message-ID: <34F359AB.99DB806D@edu.uni-klu.ac.at>

David Megginson wrote:
>
> Juergen Modre writes:
>
> > > In an ideal world, we'd also have some kind of ability to ask to
> > > parser to turn validation on or off, but I'm not certain that that's
> > > practical: any thoughts?
> > I thinks that is practical and necessary.
> >
> > One solution would be to have methods like:
> >   void setValidation(boolean validation)
> >   boolean getValidation()
> >
> > These methods can be called before starting to parse with
> > the parse() method.
>
> It's trickier than this -- for example, we'd probably have to create
> an exception that is thrown if the underlying parser does not support
> validation;

Correct. My example was just a first naive try.

> furthermore, none of the parsers that I've looked at
> supports a toggle like this, and we will be forcing another design
> decision on them if we require this toggle.

There are already XML parsers allowing this toggle. For instance, DXP has this capability. I think it would be good to have methods that allow a parser to be set into well-formedness or validation mode.

> > I also think a parse method with an systemId only as parameter would be
> > convenient. (With targeting to users rather new to XML
> > and not very used to the publicId's).
> >
> >   public abstract void parse (String systemId)
> >
> > This would also avoid the need to call every time
> > entityHandler.resolveEntity() to resolve the Entity.
>
> It might be simpler, though I'm trying to keep the number of methods
> to a minimum.

Okay.

All the best
Juergen

-----------------------------------------------
JUERGEN MODRE
Reisdorf 6
A-9371 Brueckl
Austria (Europe)
Phone: ++43 4214 2320
Mobile: ++43 664 233 22 22
E-mail: jmodre@edu.uni-klu.ac.at
WWW: http://www.edu.uni-klu.ac.at/~jmodre
-----------------------------------------------

From b.laforge at opengroup.org Tue Feb 24 23:01:43 1998
From: b.laforge at opengroup.org (Bill la Forge)
Date: Mon Jun 7 17:00:12 2004
Subject: axtp zip available
Message-ID: <3.0.32.19980224180619.00922bf0@postman.osf.org>

I've had several requests to create a zip file for axtp. I've done so. See http://www.camb.opengroup.org/~laforge/axtp/#related_links

(I've also cleaned up the relationship between the parsed xml object tree and the application peer objects.)

And yes, I'm only using a subset of xml. But I think packet size is a big issue here.

This has become strictly a spare-time project, and I still need to develop the client and server api's before it can live up to the "easy to use" claim. Perhaps this weekend...

Meanwhile, please keep those comments coming.

b)

From peter at ursus.demon.co.uk Tue Feb 24 23:30:41 1998
From: peter at ursus.demon.co.uk (Peter Murray-Rust)
Date: Mon Jun 7 17:00:12 2004
Subject: multiple handlers
In-Reply-To: <199802241851.NAA00358@unready.microstar.com>
References: <01bd4135$5b2893e0$1e09e391@mhklaptop.bra01.icl.co.uk> <01bd4135$5b2893e0$1e09e391@mhklaptop.bra01.icl.co.uk>
Message-ID: <3.0.1.16.19980224215546.35877758@pop3.demon.co.uk>

At 13:51 24/02/98 -0500, David Megginson wrote:
> Besides, as Michael rightly points out,
>implementing a multi-listener interface on top of SAX is trivial if
>you really need it.
>

As it's trivial, it would be a great help if a specimen were included in SAX that those of us who are per-element people could use. Seriously, I'm not quite sure what it would look like but I am sure I would recognise it when I saw it :-)

P.

Peter Murray-Rust, Director Virtual School of Molecular Sciences, domestic net connection
VSMS http://www.nottingham.ac.uk/vsms, Virtual Hyperglossary http://www.venus.co.uk/vhg

From donpark at quake.net Tue Feb 24 23:51:01 1998
From: donpark at quake.net (Don Park)
Date: Mon Jun 7 17:00:12 2004
Subject: multiple handlers
Message-ID: <000d01bd417e$521bcbc0$2ee044c6@donpark>

>As it's trivial, it would be a great help if a specimen were included in
>SAX that those of us who are per-element people could use. Seriously, I'm
>not quite sure what it would look like but I am sure I would recognise it
>when I saw it :-)

This brings up an issue I have wanted to raise for a while:

"Should we add helper classes to SAX?"

HandlerBase sort of qualifies as a helper class, but I think SAX should have a lot more helper classes to help out SAX programmers. For example, a 'pass-through' DocumentHandler that filters out whitespace would be a great help. An abstract implementation of DocumentHandler that maintains a stack of ancestor elements would also be nice. Another would be a special trigger-like DocumentHandler that returns specified patterns (i.e. XSL-rule-like patterns).

I think we have four choices at this point:

1. Leave SAX alone!
2. Add some but as little as possible.
3. Go nuts and let SAX bloat as the months go by.
4. Start EZ-SAX (sorry, I couldn't help it. David picked a name ready-made for puns) package to complement SAX.

Personally, I am all for EZ-SAX ;-p.

Regards,

Don Park
http://www.quake.net/~donpark/index.html

From elm at arbortext.com Wed Feb 25 00:30:26 1998
From: elm at arbortext.com (Eve L. Maler)
Date: Mon Jun 7 17:00:12 2004
Subject: The XML spec in XML: missing tags
In-Reply-To: <98Feb23.120356est.18826@thicket.arbortext.com>
Message-ID: <3.0.5.32.19980224192743.00a0d220@village.doctools.com>

As the maintainer of the specification DTD, let me say thanks for your comments.

At 11:49 AM 2/23/98 -0500, Michael Kay wrote:
...
>Some comments on the XML tagging in the BNF rules:
>- it is useful to have the non-terminals tagged, though the way in which it
>done is a little clumsy, since the internal identifier and the visible name
>of the non-terminal are necessarily in a one-to-one correspondence. The way
>it is done seems designed primarily to enable a particular translation to
>HTML.

Are you saying that it's clumsy because the element content is duplicated in the attribute value? Since the XML is transformed into HTML, it would actually have been easier to let the content serve as the address (and be stuffed into both the final <a> element content and its href attribute, with "#" and "-nt" tacked on). Alternatively, the element could have been empty, and its attribute value both used as an address and rendered (with some transformation that probably isn't worth doing...). Either way, nothing would be duplicated in the source. However, treating the same string as having two functions would make me a little uncomfortable.

>- it is a shame that there is no tagging to distinguish terminal symbols
>from metasymbols, since this would enable nicer renditions of the rules,
>e.g.
exploiting colour, without having to parse the BNF

I'll take this up with the other editors using the DTD.

>- it would seem more logical for each rule to have a single <rhs>, with any
><vc> and <wfc> constraints being embedded within the <rhs>, rather than
>these being separate elements interspersed among multiple <rhs> elements.

We had a lengthy discussion of whether our production markup should be more semantic and less presentational. It's so much work to make the markup simulate the EBNF and to make the filters handle this, that we decided not to go further in that direction. I do agree that the production markup is less than "pure" in this area.

Eve

From ak117 at freenet.carleton.ca Wed Feb 25 00:56:20 1998
From: ak117 at freenet.carleton.ca (David Megginson)
Date: Mon Jun 7 17:00:12 2004
Subject: multiple handlers
In-Reply-To: <000d01bd417e$521bcbc0$2ee044c6@donpark>
References: <000d01bd417e$521bcbc0$2ee044c6@donpark>
Message-ID: <199802250054.TAA00347@unready.microstar.com>

Don Park writes:

 > I think we have four choices at this point:
 >
 > 1. Leave SAX alone!
 > 2. Add some but as little as possible.
 > 3. Go nuts and let SAX bloat as the months go by.
 > 4. Start EZ-SAX (sorry, I couln't help it. David picked a name ready-made
 > for puns) package to complement SAX.
 >
 > Personally, I am all for EZ-SAX ;-p.

I think that it will be a wonderful idea for people to implement higher-level, programmer-friendly stuff on top of SAX.
Exactly what _is_ programmer friendly will depend on the programming language, so I agree that the helper classes should stay out of the SAX core, but I encourage any efforts to make SAX programmers' lives easier (as Don has done with SAXDOM).

All the best,

David

--
David Megginson                 ak117@freenet.carleton.ca
Microstar Software Ltd.         dmeggins@microstar.com
      http://home.sprynet.com/sprynet/dmeggins/

From ak117 at freenet.carleton.ca Wed Feb 25 01:28:02 1998
From: ak117 at freenet.carleton.ca (David Megginson)
Date: Mon Jun 7 17:00:12 2004
Subject: SAX: org.xml.sax.AttributeMap
Message-ID: <199802250126.UAA00473@unready.microstar.com>

We may as well take up the most difficult interface next, to get it over with. Here's what we have right now for attributes, which are by far the most vexed problem in SAX:

  package org.xml.sax;
  import java.util.Enumeration;

  public interface AttributeMap {

    public Enumeration getAttributeNames ();
    public String getValue (String attributeName);

    public boolean isEntity (String attributeName);
    public boolean isNotation (String attributeName);
    public boolean isId (String attributeName);
    public boolean isIdref (String attributeName);

    public String getEntityPublicID (String attributeName);
    public String getEntitySystemID (String attributeName);
    public String getNotationName (String attributeName);
    public String getNotationPublicID (String attributeName);
    public String getNotationSystemID (String attributeName);

  }

BOY, DO I WANT TO CHANGE THIS ONE.
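
For reference, application code written against a name-keyed interface like the one above would look roughly as follows. This is a minimal sketch under stated assumptions: NameKeyedAttributes is a hypothetical two-method subset of AttributeMap, and the Hashtable-backed implementation exists only to make the example self-contained. Note that every traversal allocates an Enumeration and then performs one lookup per name.

```java
import java.util.Enumeration;
import java.util.Hashtable;

// Hypothetical two-method subset of the AttributeMap interface above.
interface NameKeyedAttributes {
    Enumeration<String> getAttributeNames();
    String getValue(String attributeName);
}

// Illustrative implementation backed by a Hashtable (an assumption made
// for this example, not part of SAX).
class HashtableAttributes implements NameKeyedAttributes {
    private final Hashtable<String, String> values = new Hashtable<>();

    void put(String name, String value) {
        values.put(name, value);
    }

    public Enumeration<String> getAttributeNames() {
        return values.keys();
    }

    public String getValue(String attributeName) {
        return values.get(attributeName);
    }
}

class AttributeWalk {
    // Visits every attribute: one Enumeration per call, one lookup per name.
    static int totalValueLength(NameKeyedAttributes atts) {
        int total = 0;
        Enumeration<String> names = atts.getAttributeNames();
        while (names.hasMoreElements()) {
            total += atts.getValue(names.nextElement()).length();
        }
        return total;
    }
}
```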
James has made some good suggestions about how to make this simpler and more efficient by working from list indexes (it also avoids the need to allocate an Enumeration). Here's what I want to change:

1. Rename the interface to org.xml.sax.AttributeList to reflect the new approach.

2. Add a method to return the length of the list.

3. Look up attribute information based on integer indices rather than string values.

4. Eliminate the is*() methods, and add a single method to return the attribute's type as a string instead.

5. Rename getNotationName() to getEntityNotationName() to make its role clearer.

With these changes, we end up with the following, somewhat simpler interface:

  package org.xml.sax;

  public interface AttributeList {

    public abstract int getLength ();
    public abstract int getName (int index);
    public abstract int getValue (int index);
    public abstract String getType (int index);

    public abstract String getEntityNotationName (int index);
    public abstract String getEntityPublicId (int index);
    public abstract String getEntitySystemId (int index);
    public abstract String getNotationPublicId (int index);
    public abstract String getNotationSystemId (int index);

  }

The first four methods are actually very nice now (thanks, James, for the suggestion). As specified in the XML REC, getType() will return "CDATA" if there is no explicit declaration, and it will return the declared attribute type otherwise. There's also no further dependency on the Java-specific Enumeration class, so C++ programmers can sigh a sigh of relief.

The last five methods are much more of a problem, and I'm still agonizing over what to do. Why do we have binary entities in XML at all? Is anyone going to use them, or will everything be done with href's?

Attributes are the _only_ way to get at binary entities in XML, so if I don't provide some way to get access to them here, then SAX parsers and applications make it impossible to use binary (NDATA) entities at all.
I am very reluctant to create a new class or interface just for entities (and yet another for notations), when other types of objects do not have their own classes, and I certainly don't want to re-invent (or pre-invent) the DOM.

HELP!!!

David

--
David Megginson                 ak117@freenet.carleton.ca
Microstar Software Ltd.         dmeggins@microstar.com
      http://home.sprynet.com/sprynet/dmeggins/

From ak117 at freenet.carleton.ca Wed Feb 25 01:39:56 1998
From: ak117 at freenet.carleton.ca (David Megginson)
Date: Mon Jun 7 17:00:12 2004
Subject: SAX: org.xml.sax.AttributeMap
In-Reply-To: <199802250126.UAA00473@unready.microstar.com>
References: <199802250126.UAA00473@unready.microstar.com>
Message-ID: <199802250138.UAA00524@unready.microstar.com>

David Megginson writes:

 > package org.xml.sax;
 >
 > public interface AttributeList {
 >
 >   public abstract int getLength ();
 >   public abstract int getName (int index);
 >   public abstract int getValue (int index);
 >   public abstract String getType (int index);
 >
 >   public abstract String getEntityNotationName (int index);
 >   public abstract String getEntityPublicId (int index);
 >   public abstract String getEntitySystemId (int index);
 >   public abstract String getNotationPublicId (int index);
 >   public abstract String getNotationSystemId (int index);
 >
 > }

For any of you who are wondering when attribute names and values became integers, the above should have been

  package org.xml.sax;

  public interface AttributeList {

    public abstract int getLength ();
    public abstract String getName (int index);
    public abstract String getValue (int index);
    public abstract String getType (int index);

    public abstract String getEntityNotationName (int index);
    public abstract String getEntityPublicId (int index);
    public abstract String getEntitySystemId (int index);
    public abstract String getNotationPublicId (int index);
    public abstract String getNotationSystemId (int index);

  }

All the best,

David

--
David Megginson                 ak117@freenet.carleton.ca
Microstar Software Ltd.         dmeggins@microstar.com
      http://home.sprynet.com/sprynet/dmeggins/

From donpark at quake.net Wed Feb 25 02:00:39 1998
From: donpark at quake.net (Don Park)
Date: Mon Jun 7 17:00:13 2004
Subject: org.xml.sax.AttributeMap
Message-ID: <001301bd4190$726aa650$2ee044c6@donpark>

David,

>The last five methods are much more of a problem, and I'm still
>agonizing over what to do. Why do we have binary entities in XML at
>all? Is anyone going to use them, or will everything be done with
>href's?
>
>Attributes are the _only_ way to get at binary entities in XML, so if
>I don't provide some way to get access to them here, then SAX parsers
>and applications make it impossible to use binary (NDATA) entities at
>all. I am very reluctant to create a new class or interface just for
>entities (and yet another for notations), when other types of objects
>do not have their own classes, and I certainly don't want to re-invent
>(or pre-invent) the DOM.

How about replacing the five with the following method and three constants?
  public static final int NAME = 0;
  public static final int PUBLIC_ID = 1;
  public static final int SYSTEM_ID = 2;

  public abstract String[] getDataInfo (int index);

Since AttributeList is valid only within the startElement method, you can reuse a single string array rather than allocate a new one per getDataInfo call. If the method returns null, then the attribute has no info.

If you haven't guessed by now, the constants above are used to index into the returned array. Implementations should take steps to make sure the size of the returned array is 3 and stuff null for NAME if it is not a notation.

Does this help?

Don Park
http://www.quake.net/~donpark/index.html

>
>
>HELP!!!
>
>
>David
>
>--
>David Megginson                 ak117@freenet.carleton.ca
>Microstar Software Ltd.         dmeggins@microstar.com
>      http://home.sprynet.com/sprynet/dmeggins/

From antony at n-space.com.au Wed Feb 25 02:24:42 1998
From: antony at n-space.com.au (Antony Blakey)
Date: Mon Jun 7 17:00:13 2004
Subject: org.xml.sax.AttributeMap
References: <001301bd4190$726aa650$2ee044c6@donpark>
Message-ID: <34F37FDC.D945B484@n-space.com.au>

Don Park wrote:

> How about replacing the five with following method and three constants?
>
>   public static final int NAME = 0;
>   public static final int PUBLIC_ID = 1;
>   public static final int SYSTEM_ID = 2;
>
>   public abstract String[] getDataInfo (int index);
>
> Since AttributeList is valid only within startElement method, you can reuse
> a single string array rather allocate a new one per getEntityInfo method.
> If the method returns null, then it is attribute has no info.
>
> If you haven't guessed by now, the constants above are used to index into
> the returned array. Implementations should take steps to make sure the size
> of the returned array is 3 and stuff null for NAME if it is not a notation.

Why would you not simply return a strongly typed data item (ignoring the names)?

  public abstract DataInfo getDataInfo(int index);

  public interface DataInfo {
    public String getName();
    public String getPublicID();
    public String getSystemID();
  }

As far as reuse of values is concerned, however, I think this is a very bad idea: startElement defines a new context, so reusing the parameters to that call is workable, but reusing the result from the getDataInfo call is a different kettle of fish. It would be better (if you are so concerned) to keep a pool of objects to return, so that they are not reused within the context of a startElement call. This may seem like more work on the part of the parser implementor, but you shouldn't push this complexity onto the users of the parser when you can safely hide it within the parser. The parser writer can make the effort for efficiency's sake.

+----------------------------------+
| Antony Blakey                    |
| N-Space Pty Ltd                  |
| Java - CORBA - SGML - XML        |
| mailto:antony@n-space.com.au     |
| http://www.n-space.com.au        |
+----------------------------------+

From jjc at jclark.com Wed Feb 25 02:46:47 1998
From: jjc at jclark.com (James Clark)
Date: Mon Jun 7 17:00:13 2004
Subject: SAX: org.xml.sax.AttributeMap
References: <199802250126.UAA00473@unready.microstar.com> <199802250138.UAA00524@unready.microstar.com>
Message-ID: <34F38506.3D19B68A@jclark.com>

> package org.xml.sax;
>
> public interface AttributeList {
>
>   public abstract int getLength ();
>   public abstract String getName (int index);
>   public abstract String getValue (int index);

I think it's also desirable to provide a method to access attribute values by name. Some applications only want to access attribute values this way, and it's inconvenient and inefficient for the application to have to iterate over all the names itself.

>   public abstract String getType (int index);

I like this.

>   public abstract String getEntityNotationName (int index);
>   public abstract String getEntityPublicId (int index);
>   public abstract String getEntitySystemId (int index);
>   public abstract String getNotationPublicId (int index);
>   public abstract String getNotationSystemId (int index);

I agree that SAX ought to provide access to unparsed entities but I don't think this is the right way to achieve it.

For a start, I can have an ENTITIES attribute, so all these methods would need two arguments (the index of the attribute in the attribute list, and the index of the token in the value).
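
To make the two-index point concrete, here is a minimal sketch. EntitiesValue is a hypothetical helper, not part of SAX or XP: it splits the value of an ENTITIES attribute into its whitespace-separated entity names. An attribute index alone identifies the whole value, so a second (token) index is needed to pick out one entity.

```java
import java.util.StringTokenizer;

// Hypothetical helper, for illustration only.
class EntitiesValue {
    // Splits an ENTITIES attribute value on XML whitespace
    // (space, tab, carriage return, line feed).
    static String[] tokens(String value) {
        StringTokenizer st = new StringTokenizer(value, " \t\r\n");
        String[] names = new String[st.countTokens()];
        for (int i = 0; i < names.length; i++) {
            names[i] = st.nextToken();
        }
        return names;
    }

    // Selects one entity name within a single attribute's value.
    static String entityNameAt(String value, int tokenIndex) {
        return tokens(value)[tokenIndex];
    }
}
```

With a value such as "fig1 fig2", a method taking only the attribute index has no way to say whether the public or system identifier of fig1 or of fig2 is wanted.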

Another problem is that it is common to declare unparsed entities in the internal subset, but to declare attribute types in an external DTD, e.g.

<!DOCTYPE doc SYSTEM "doc.dtd" [
<!ENTITY foo SYSTEM "foo.pic" NDATA gif>
]>
<doc><picture ref="foo"/></doc>

where doc.dtd contains

<!ATTLIST picture ref ENTITY #IMPLIED>

Now if I parse this without processing the external DTD, the SAX interface as I understand it won't allow me to get at the system and public id for foo, although an application might well intrinsically know that ref is an ENTITY attribute.

I think a better approach is for the processor at the end of the prolog to pass an object to the application that provides information about all the declared notations and unparsed entities. XP has a DTD object that does this, but it might be better to call it something else (like UnparsedEntitySet) since SAX might someday be extended to provide full DTD access.

Note that if you provide access to the system ID, you have to deal with the issue of relative URLs. Either the processor has to resolve a relative URL into an absolute URL before passing it to the application, or it has to make a base URL available to the application.

Here's what XP's DTD interface looks like (it's a little fancier than what I think is needed for SAX, in that it provides access to all general entities, not just unparsed ones):

package com.jclark.xml.parse;

import java.util.Enumeration;
import java.net.URL;

/**
 * Information about a DTD.
 * @version $Revision: 1.4 $ $Date: 1998/02/17 04:20:20 $
 */
public interface DTD {
  /**
   * Returns an enumeration over the names of general entities declared in
   * the DTD.
   */
  Enumeration entityNames();

  /**
   * Returns an enumeration over the names of notations declared in
   * the DTD.
   */
  Enumeration notationNames();

  /**
   * Returns the system identifier for a notation.
   * Returns null if the notation was not declared or no system identifier
   * was specified.
   * A relative URL is not automatically resolved into an absolute URL;
   * <code>getNotationBase</code> can be used to do this.
   *
   * @see #getNotationBase
   */
  String getNotationSystemId(String notationName);

  /**
   * Returns the public identifier for a notation.
   * Returns null if the notation was not declared or no public identifier
   * was specified.
   */
  String getNotationPublicId(String notationName);

  /**
   * Returns the URL of the entity in which the notation was declared.
   * Returns null if the notation was not declared or the URL of the
   * declaring entity is not available.
   */
  URL getNotationBase(String notationName);

  /**
   * Returns the replacement text of the specified general entity.
   * Returns null if the entity was not declared or was declared
   * as an external entity.
   */
  String getEntityReplacementText(String entityName);

  /**
   * Returns the system identifier for a general entity.
   * Returns null if the entity was not declared or is an internal entity.
   * A relative URL is not automatically resolved into an absolute URL;
   * <code>getEntityBase</code> can be used to do this.
   *
   * @see #getEntityBase
   */
  String getEntitySystemId(String entityName);

  /**
   * Returns the public identifier for a general entity.
   * Returns null if the entity was not declared or no public identifier
   * was specified.
   */
  String getEntityPublicId(String entityName);

  /**
   * Returns the name of the notation of an unparsed general entity.
   * Returns null if the entity was not declared or was a parsed entity.
   */
  String getEntityNotationName(String entityName);

  /**
   * Returns the URL of the entity in which the general entity was declared.
   * Returns null if the entity was not declared or the URL of the
   * declaring entity is not available.
   */
  URL getEntityBase(String entityName);

  /**
   * Returns true if an element type was declared to have element content.
   */
  boolean getElementTypeElementContent(String elementTypeName);

  /**
   * Returns true if the complete DTD was processed.
   */
  boolean isComplete();

  /**
   * Returns true if <code>standalone="yes"</code> was specified in the
   * XML declaration.
   */
  boolean isStandalone();
}

James

From jjc at jclark.com Wed Feb 25 03:04:23 1998
From: jjc at jclark.com (James Clark)
Date: Mon Jun 7 17:00:13 2004
Subject: SAX: finalising org.sax.xml.Parser
References: <199802230313.WAA00386@unready.microstar.com>
Message-ID: <34F38943.551F05AB@jclark.com>

>   public void parse (InputStream is, String baseURI)
>     throws java.lang.Exception;

XML allows the encoding of an entity to be specified by an external transport protocol (see 4.3.3): for example, when an XML document arrives over HTTP with a content type of text/xml, then the encoding specified in the charset parameter is supposed to take precedence over that specified in the document entity by the encoding declaration or by XML's default rules. So I think we need an additional argument here: a String specifying the name of the encoding to be used for the InputStream, or null if the encoding specified in the document entity should be used.

James

From donpark at quake.net Wed Feb 25 07:21:38 1998
From: donpark at quake.net (Don Park)
Date: Mon Jun 7 17:00:13 2004
Subject: org.xml.sax.AttributeMap
Message-ID: <002301bd41bd$49737650$2ee044c6@donpark>

>Why would you not simply return a strongly typed data item (ignoring the
>names)

Because we are trying to keep the number of classes to the bare minimum. I don't feel too strongly about the goal but I felt I should make a suggestion.

>As far as reuse of values is concerned however, I think this is a very
>bad idea: startElement defines a new context, so reusing the parameters
>to that call is workable, however reusing the result from the
>getDataInfo call is a different kettle of fish. It would be better (if
>you are so concerned) to keep a pool that you return so that they are
>not reused within the context of a startElement call. This may seem like
>more work on the part of the parser implementor, but you shouldn't push
>this complexity onto the users of the parser when you can safely hide it
>within the parser. The parser writer can make the effort for
>efficiencies sake.

What I suggested is no worse than AttributeMap being reused by some of the parsers, since the returned value's lifetime is entirely bounded by the lifetime of the AttributeMap. Note that AttributeMap's Enumeration is also invalid once startElement returns. But then I am not at all saying that what I suggest is good.

One of the problems facing SAX is its speed. There are far too many objects (mainly Strings) being instantiated unnecessarily because of the multiple layers involved.
One of the users of SAXDOM measured performance at three levels (SAX, SAXDOM, and his own application on top of SAXDOM) and found that performance decreased by about 50% at each level. Processing of a 1.5 meg XML file took 8 seconds at SAX level, 14 seconds at SAXDOM, and 35 seconds at the application level. I don't know which SAX parser was used.

Since I have a particular interest in server-side XML processing, I have a real concern about performance. I am currently feeling out the issues in building a 'pedal-to-the-metal' XML parser with native SAX support. Actually, I am finding that my performance goals cannot be met with the current SAX API, because I must cut object instantiation down to the bare minimum, remove most synchronization, and cluster each stage to allow the JIT to make more effective use of the CPU code cache.

Don Park
http://www.quake.net/~donpark/index.html

From peter at ursus.demon.co.uk Wed Feb 25 09:01:06 1998
From: peter at ursus.demon.co.uk (Peter Murray-Rust)
Date: Mon Jun 7 17:00:13 2004
Subject: multiple handlers
In-Reply-To: <199802250054.TAA00347@unready.microstar.com>
References: <000d01bd417e$521bcbc0$2ee044c6@donpark> <000d01bd417e$521bcbc0$2ee044c6@donpark>
Message-ID: <3.0.1.16.19980225085026.357795b8@pop3.demon.co.uk>

At 19:54 24/02/98 -0500, David Megginson wrote:
[...]
>
>I think that it will be a wonderful idea for people to implement
>higher-level, programmer-friendly stuff on top of SAX.
>Exactly what _is_ programmer friendly will depend on the programming
>language, so I agree that the helper classes should stay out of the SAX
>core, but I encourage any efforts to make SAX programmers' lives easier
>(as Don has done with SAXDOM).
>

Although it may not formally be part of SAX, I think it will be extremely valuable to have reference library implementations of parts of the spec. For example, what is a valid Name in XML? You have to treat a large number of special cases for characters, and are extremely vulnerable to revisions of the spec (this is an area where I am sure minor revisions will happen). So a set of library classes of the type:

  public static boolean isValidName(String name);
  public static String getCaseSpaceNormalizedAttval(String value);

would be extremely valuable. We can then delegate part of the prose to these implementations.

P.

Peter Murray-Rust, Director Virtual School of Molecular Sciences, domestic
net connection
VSMS http://www.nottingham.ac.uk/vsms, Virtual Hyperglossary
http://www.venus.co.uk/vhg

From peter at ursus.demon.co.uk Wed Feb 25 09:19:40 1998
From: peter at ursus.demon.co.uk (Peter Murray-Rust)
Date: Mon Jun 7 17:00:13 2004
Subject: The XML spec in XML: missing tags
In-Reply-To: <3.0.5.32.19980224192743.00a0d220@village.doctools.com>
References: <98Feb23.120356est.18826@thicket.arbortext.com>
Message-ID: <3.0.1.16.19980225083228.2a27adb6@pop3.demon.co.uk>

[... I may have missed the postings quoted in this...]

At 19:27 24/02/98 -0500, Eve L. Maler wrote:
>As the maintainer of the specification DTD, let me say thanks for your
>comments.

We are very grateful to Eve for having produced the markup specification. Unfortunately she is a victim of her success in that rec.xml [my shorthand for the spec] is the first 'really crunchy official piece of XML' that we can get to grips with for learning and developing our tools. This is why a DTD and its associated semantics/documentation are so important :-). [I would also expect that 'spec.dtd' might be re-usable in other contexts.]

>
[...]
>
>We had a lengthy discussion of whether our production markup should be more
>semantic and less presentational. It's so much work to make the markup
>simulate the EBNF and to make the filters handle this, that we decided not
>to go further in that direction. I do agree that the production markup is
>less than "pure" in this area.
>

My interest is similar - but complementary - to Michael's; I am interested in the terminology. Thus I want to be able to abstract the terms [there are 62 termdefs] in the document and produce a model for their structure (e.g. entailment by containment, by linking and so on.) In this way I can create a graphical interactive map of the concepts in the XML spec and have already created a prototype. I would like to know, for example, whether all terms are defined by <termdef> or whether there are some which are simply defined by <term>foo bar</term>. There appears to be some duplication here as well; thus a termdef has an attribute naming the term, but the term is also often contained within a <term> later in the 'description'. [And there is at least one case where </termdef> occurs in mid-sentence - I suspect this isn't intended.]

P.

Peter Murray-Rust, Director Virtual School of Molecular Sciences, domestic
net connection
VSMS http://www.nottingham.ac.uk/vsms, Virtual Hyperglossary
http://www.venus.co.uk/vhg

From peter at ursus.demon.co.uk Wed Feb 25 09:20:45 1998
From: peter at ursus.demon.co.uk (Peter Murray-Rust)
Date: Mon Jun 7 17:00:13 2004
Subject: SAX: org.xml.sax.AttributeMap
In-Reply-To: <34F38506.3D19B68A@jclark.com>
References: <199802250126.UAA00473@unready.microstar.com> <199802250138.UAA00524@unready.microstar.com>
Message-ID: <3.0.1.16.19980225084239.3577ae48@pop3.demon.co.uk>

At 09:42 25/02/98 +0700, James Clark wrote:
>
>>   public abstract String getType (int index);
>
>I like this.
>

So do I. As XML grows larger and acquires more extensions (XLL, XSL, etc.) there will be an increasing number of 'hardcoded' attribute types and values. For example, the type of HREF/href is effectively determined as CDATA (it would be perverse to make it ID, for example, even if not in xml-link context) and xml:lang is required (I think) to be NMTOKEN or NMTOKENS. Hardcoding all these 'special cases' is a pain and SAX (or DOM) can help with implementing the prose in the specs.

P.

Peter Murray-Rust, Director Virtual School of Molecular Sciences, domestic
net connection
VSMS http://www.nottingham.ac.uk/vsms, Virtual Hyperglossary
http://www.venus.co.uk/vhg

From tms at ansa.co.uk Wed Feb 25 10:43:46 1998
From: tms at ansa.co.uk (Toby Speight)
Date: Mon Jun 7 17:00:13 2004
Subject: SAX: org.xml.sax.AttributeMap
In-Reply-To: David Megginson's message of "Tue, 24 Feb 1998 20:38:34 -0500"
References: <199802250126.UAA00473@unready.microstar.com> <199802250138.UAA00524@unready.microstar.com>
Message-ID: <s8yaz0artw.fsf@plato.ansa.co.uk>

David> David Megginson <URL:mailto:ak117@freenet.carleton.ca>

=> In article <199802250138.UAA00524@unready.microstar.com>, David
=> wrote:

David> David Megginson writes:
David>   package org.xml.sax;
David>
David>   public interface AttributeList {
David>
David>   //...
David>     public abstract String getType (int index);
David>   //...
David>
David>   }

We're returning one of a bounded, known set of values. I'd prefer to use an int for this type of thing, along with a set of constants. I.e.

  public abstract int getType (int index);
  public static final int CDATA = 0;
  public static final int NMTOKEN = 1;
  // etc.

The only advantage a String has over this is that you can meaningfully present it to the user as it is. A disadvantage of String is that it is computationally expensive to compare for equality (or equivalently, and worse, to switch() on it).

Comparison becomes easier if one provides a set of String constants and guarantees that returned values will test equal with "==". That is not too different from my suggestion of using numeric constants.

Converting integers to human-readable Strings is easy:

  public static String[] typeNames = new String[/* some size */];
  static {
    typeNames[CDATA] = "CDATA";
    typeNames[NMTOKEN] = "NMTOKEN";
    // etc.
  }

but I don't think this needs to be part of the interface.

One might wish to use short or char instead of int if storage space is at a premium; I'm making no judgement on which arithmetic type is best. This proposal is not Java-specific.

-- 

From M.H.Kay at eng.icl.co.uk Wed Feb 25 11:36:23 1998
From: M.H.Kay at eng.icl.co.uk (Michael Kay)
Date: Mon Jun 7 17:00:13 2004
Subject: helper classes for SAX
Message-ID: <01bd41e1$be824f60$1e09e391@mhklaptop.bra01.icl.co.uk>

>"Should we add helper classes to SAX?"
>

I have written a package on top of SAX which I hope to publish soon - I need to get it past some corporate processes.

I wrote it because I found I was doing the same thing repeatedly in a number of SAX applications. I call the package SAXON (sorry), and it provides the following services:

- allows you to register a handler for a particular element type (or a particular element type in the context of a parent element type). The handler can supply methods to process the element start or end, the character data or ignorable white space in the element, or the start or end of a consecutive group of one or more elements (cf. XSL)
- provides you with context information about the element; in particular, its parent and ancestors, their attributes, and also their elder sibling elements.
- allows you to associate user data with an element, so for example your start-element method can pass data to the corresponding end-element method
- allows you to associate an output "bucket" with an element type, so that all output for that element and its children (unless otherwise specified) goes into that bucket.
  Useful for splitting documents and for limited re-ordering of elements
- allows multiple handlers per element type
- includes some standard element handlers for doing HTML rendition, for generating automatic numbering, etc

Although I'm not in a position to go public with it yet, I'll be happy to share the current state of development with any individual who wants to collaborate.

I do realise of course that some of these facilities can be achieved by using the DOM instead of an event-based parser, and there is a world of stuff in JUMBO that I haven't explored yet. I was trying to add value to SAX without going heavyweight, which of course is a delicate line to tread.

Regards,
Mike Kay
ICL

From digitome at iol.ie Wed Feb 25 13:42:56 1998
From: digitome at iol.ie (Sean Mc Grath)
Date: Mon Jun 7 17:00:13 2004
Subject: SAX: org.xml.sax.AttributeMap
Message-ID: <199802251342.NAA04905@mail.iol.ie>

[Toby Speight]
>
>We're returning one of a bounded, known set of values. I'd prefer to
>use an int for this type of thing, along with a set of constants.
>I.e.
>
>  public abstract int getType (int index);
>  public static final int CDATA = 0;
>  public static final int NMTOKEN = 1;
>  // etc.
>
>The only advantage a String has over this is that you can meaningfully
>present it to the user as it is. A disadvantage of String is that it is
>computationally expensive to compare for equality (or equivalently, and
>worse, to switch() on it).

If ints are going to be used, let's use values that can be bit-twiddled.

From ak117 at freenet.carleton.ca Wed Feb 25 14:12:10 1998
From: ak117 at freenet.carleton.ca (David Megginson)
Date: Mon Jun 7 17:00:13 2004
Subject: SAX: finalising org.sax.xml.Parser
In-Reply-To: <34F38943.551F05AB@jclark.com>
References: <199802230313.WAA00386@unready.microstar.com> <34F38943.551F05AB@jclark.com>
Message-ID: <199802251410.JAA00633@unready.microstar.com>

James Clark writes:

> XML allows the encoding of an entity to be specified by an external
> transport protocol (see 4.3.3): for example, when an XML document
> arrives over HTTP with a content type of text/xml, then the
> encoding specified in the charset parameter is supposed to take
> precedence over that specified in the document entity by the
> encoding declaration or by XML's default rules. So I think we need
> an additional argument here: a String specifying the name of the
> encoding to be used for the InputStream, or null if the encoding
> specified in the document entity should be used.

This is a very good point, as was the suggestion earlier (I don't remember whose it was) that we rearrange arguments in order of decreasing importance to the programmer.
With those suggestions in mind, here's my current take on org.xml.sax.Parser:

  package org.xml.sax;

  import java.io.InputStream;

  public interface Parser {

    public abstract void setEntityHandler (EntityHandler handler);
    public abstract void setDocumentHandler (DocumentHandler handler);
    public abstract void setErrorHandler (ErrorHandler handler);

    public abstract void parse (String systemId, String publicId)
      throws java.lang.Exception;

    public abstract void parse (InputStream input, String encoding,
                                String systemId, String publicId)
      throws java.lang.Exception;

  }

I haven't included a setValidate() method yet, partly because I'm not certain what it would really mean. If I did

  setValidate(false);

would that simply prevent the reporting of validation errors, or would it also prohibit the parser from resolving external text entities and the external DTD subset?

All the best,

David

-- 
David Megginson                 ak117@freenet.carleton.ca
Microstar Software Ltd.         dmeggins@microstar.com
      http://home.sprynet.com/sprynet/dmeggins/

From jmodre at edu.uni-klu.ac.at Wed Feb 25 15:08:56 1998
From: jmodre at edu.uni-klu.ac.at (Juergen Modre)
Date: Mon Jun 7 17:00:13 2004
Subject: SAX: finalising org.sax.xml.Parser
References: <199802230313.WAA00386@unready.microstar.com> <34F38943.551F05AB@jclark.com> <199802251410.JAA00633@unready.microstar.com>
Message-ID: <34F44259.5887FF1F@edu.uni-klu.ac.at>

David Megginson wrote:

> This is a very good point, as was the suggestion earlier (I don't
> remember whose it was) that we rearrange arguments in order of
> decreasing importance to the programmer.

I think it was Don Park and I also like it.

> With those suggestions in
> mind, here's my current take on org.xml.sax.Parser:
>
>   package org.xml.sax;
>
>   public interface Parser {
>
>     public abstract void setEntityHandler (EntityHandler handler);
>     public abstract void setDocumentHandler (DocumentHandler handler);
>     public abstract void setErrorHandler (ErrorHandler handler);
>
>     public abstract void parse (String systemId, String publicId)
>       throws java.lang.Exception;
>
>     public abstract void parse (InputStream input, String encoding,
>                                 String systemId, String publicId)
>       throws java.lang.Exception;
>
>   }

I think Don's suggestion was also to have it like

  public abstract void parse (String systemId, String publicId,
                              String encoding, InputStream input)

so that the leading parameters are always the same. Then if another overload is added, only the last parameter will differ.

> I haven't included a setValidate() method yet, partly because I'm not
> certain what it would really mean. If I did
>
>   setValidate(false);
>
> would that simply prevent the reporting of validation errors, or would
> it also prohibit the parser from resolving external text entities and
> the external DTD subset?

It should have the following meaning:

- setValidate(false);
  That the document/stream should be parsed for well-formedness.
  This should also be the default if nothing was set with the setValidate() method.

- setValidate(true);
  That the document/stream should also be validated during parsing.

The question of exactly where the border lies between well-formedness parsing and validating parsing should be left to the parser. This border can be found in the XML spec. The SAX interface should be usable for both classes of XML parser and should also make it possible to enable or disable validation.

But I agree that it is sometimes not easy to see the clear border between well-formedness parsing and validating parsing in the XML spec.
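
The reading of setValidate() given above can be put as a small sketch in present-day Java. Nothing here is part of the SAX draft: ValidateFlagSketch and reportedErrorKinds are hypothetical names, and the sketch only models which kinds of error would be reported, not any actual parsing.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; no real parser is involved.
public class ValidateFlagSketch {

    // Default when setValidate() is never called: well-formedness only.
    private boolean validate = false;

    public void setValidate(boolean validate) {
        this.validate = validate;
    }

    /** Lists the kinds of error this imagined parser would report. */
    public List<String> reportedErrorKinds() {
        List<String> kinds = new ArrayList<>();
        kinds.add("well-formedness");   // always checked
        if (validate) {
            kinds.add("validity");      // only when validation is switched on
        }
        return kinds;
    }

    public static void main(String[] args) {
        ValidateFlagSketch parser = new ValidateFlagSketch();
        System.out.println(parser.reportedErrorKinds());
        parser.setValidate(true);
        System.out.println(parser.reportedErrorKinds());
    }
}
```

The sketch deliberately says nothing about whether the external DTD subset is read when validation is off, which is the part of the question this reading leaves to the parser.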

All the best

Juergen

-----------------------------------------------
JUERGEN MODRE
Reisdorf 6
A-9371 Brueckl
Austria (Europe)
Phone: ++43 4214 2320
Mobile: ++43 664 233 22 22
E-mail: jmodre@edu.uni-klu.ac.at
WWW: http://www.edu.uni-klu.ac.at/~jmodre
-----------------------------------------------

From elm at arbortext.com Wed Feb 25 15:51:03 1998
From: elm at arbortext.com (Eve L. Maler)
Date: Mon Jun 7 17:00:13 2004
Subject: The XML spec in XML: missing tags
In-Reply-To: <98Feb25.042025est.18818@thicket.arbortext.com>
References: <3.0.5.32.19980224192743.00a0d220@village.doctools.com> <98Feb23.120356est.18826@thicket.arbortext.com>
Message-ID: <3.0.5.32.19980225104757.00a13120@village.doctools.com>

Oh, you want *documentation*, do you?? Well, the DTD was hard to write; it should be hard to understand. :-)

Seriously, I keep saying that I'll release the reference documentation Real Soon Now, and in fact I'm hoping to be able to spend a few hours tidying it up and releasing it later this week. (There's also a minor DTD update in the pipe.)

At 03:32 AM 2/25/98 -0500, Peter Murray-Rust wrote:
>My interest is similar - but complementary - to Michael's; I am interested
>in the terminology. Thus I want to be able to abstract the terms [there are
>62 termdefs] in the document and produce a model for their structure (e.g.
>entailment by containment, by linking and so on.) In this way I can create
>a graphical interactive map of the concepts in the XML spec and have
>already created a prototype.
>I would like to know, for example, whether all
>terms are defined by <termdef> or whether there are some which are simply
>defined by <term>foo bar</term>. There appears to be some duplication here
>as well; thus a termdef has an attribute naming the term, but it is also
>often contained within a <term> later in the 'description'. [And there is
>at least one case where </termdef> occurs in mid-sentence - I suspect this
>isn't intended.]

<termdef> is a really odd way to do term definitions, for my money, but that's what the users wanted. :-) It captures an "inline" definition of a term, and because of the mixed content model, it can't even ensure that a <term> is present to identify the actual term being defined. Likewise, it can't ensure that the definition captured functions as a "standalone" sentence or set of sentences. I suspect that the cut-off sentence was more in the spirit of poetic license.

<term> is occasionally used legitimately without a <termdef> wrapper; it's marking a term being used in a special way, without an accompanying definition.

Gee, maybe I should just collect all the questions and do the documentation as a Q&A...

Eve

From matthewg at poet.de Wed Feb 25 16:01:23 1998
From: matthewg at poet.de (Matthew Gertner)
Date: Mon Jun 7 17:00:13 2004
Subject: SAX: finalising org.sax.xml.Parser
Message-ID: <01bd4206$6ccb6d30$a00b0ac0@pharcyde.poetsoftware.xo.com>

>It should have the following meaning:
>- setValidate(false);
>  That the document/stream should be parsed for well-formedness.
>  This should also be the default if nothing was set with the setValidate() method.
>
>- setValidate(true);
>  That the document/stream should also be validated during parsing.

How about a 2x2 matrix?

With DTD
  setValidate(false) - checks for well-formedness, external subset is used
                       for entity and notation declarations, etc.
  setValidate(true)  - full validation

Without DTD
  setValidate(false) - just checks for well-formedness
  setValidate(true)  - throws an exception

Matthew

From peter at ursus.demon.co.uk Wed Feb 25 17:15:05 1998
From: peter at ursus.demon.co.uk (Peter Murray-Rust)
Date: Mon Jun 7 17:00:13 2004
Subject: helper classes for SAX
In-Reply-To: <01bd41e1$be824f60$1e09e391@mhklaptop.bra01.icl.co.uk>
Message-ID: <3.0.1.16.19980225161441.20a7fe70@pop3.demon.co.uk>

At 11:37 25/02/98 -0000, Michael Kay wrote:
>>"Should we add helper classes to SAX?"
>>
>I have written a package on top of SAX which I hope to publish soon - I need
>to get it past some corporate processes

I understand the problem :-)

>
>
>I wrote it because I found I was doing the same thing repeatedly in a number
>of SAX applications. I call the package SAXON (sorry), and it provides the
>following services:
>
>- allows you to register a handler for a particular element type (or a
>particular element type in the context of a parent element type). The
>handler can supply methods to process the element start or end, the
>character data or ignorable white space in the element, or the start or end
>of a consecutive group of one or more elements (cf. XSL)
>- provides you with context information about the element; in particular,
>its parent and ancestors, their attributes, and also their elder sibling
>elements.

This is useful. I found myself doing the same sort of thing. In a tree-based situation it's easy - I use XLL XPtrs repeatedly. I missed these when I came to implement some things on top of SAX.

>- allows you to associate user data with an element, so for example your
>start-element method can pass data to the corresponding end-element method
>- allows you to associate an output "bucket" with an element type, so that
>all output for that element and its children (unless otherwise specified)
>goes into that bucket. Useful for splitting documents and for limited
>re-ordering of elements

Yes. This is partly what my (very simple) SAXSplit does - splits documents into smaller bits. There was discussion at one stage that XML should have a transformation language. Personally I would welcome this. XSL goes half the way in providing a way of identifying components to be split, re-ordered, transformed, etc. but concentrates on graphic rendering for humans.

>- allows multiple handlers per element type
>- includes some standard element handlers for doing HTML rendition, for
>generating automatic numbering, etc

I'd certainly like someone else to write code for HTML if that is what is being offered :-)

>
>Although I'm not in a position to go public with it yet, I'll be happy to
>share the current state of development with any individual who wants to
>collaborate.

:-)

>
>I do realise of course that some of these facilities can be achieved by
>using the DOM instead of an event-based parser, and there is a world of

The attraction of SAX is that:
- it is simpler for XML newbies to understand
- you don't have to hold everything in memory

>stuff in JUMBO that I haven't explored yet. I was trying to add value to SAX

JUMBO mainly consists of large muddy footprints. Seriously, I would be happy to lose any generic functionality from JUMBO if a better way arises. For example, I use SAX+FOO as the parser and can see a move towards DOM for defining the tree/grove components.
When/if I'm happy to go to J1.1 I will seriously consider the Swing JTree, though there are bits I find missing at present. I am not clear what other features are modular but I am sure many are. P. Peter Murray-Rust, Director Virtual School of Molecular Sciences, domestic net connection VSMS http://www.nottingham.ac.uk/vsms, Virtual Hyperglossary http://www.venus.co.uk/vhg xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev@ic.ac.uk Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To (un)subscribe, mailto:majordomo@ic.ac.uk the following message; (un)subscribe xml-dev To subscribe to the digests, mailto:majordomo@ic.ac.uk the following message; subscribe xml-dev-digest List coordinator, Henry Rzepa (mailto:rzepa@ic.ac.uk) From ak117 at freenet.carleton.ca Wed Feb 25 17:26:06 1998 From: ak117 at freenet.carleton.ca (David Megginson) Date: Mon Jun 7 17:00:13 2004 Subject: SAX: finalising org.sax.xml.Parser In-Reply-To: <01bd4206$6ccb6d30$a00b0ac0@pharcyde.poetsoftware.xo.com> References: <01bd4206$6ccb6d30$a00b0ac0@pharcyde.poetsoftware.xo.com> Message-ID: <199802251724.MAA02583@unready.microstar.com> Matthew Gertner writes: > How about a 2x2 matrix? > > With DTD > setValidate(false) - checks for well-formedness, external subset is used > for entity and notation declarations, etc. > setValidate(true) - full validation > > Without DTD > setValidate(false) - just checks for well-formedness > setValidate(true) - throws an exception This comes back to the original problem, however: what if I want to include the external subset and external text entities but don't want to validate? I'm not sure that the two should be tied together (AElfred, for example, does not validate, but it does use the DTD). All the best, David -- David Megginson ak117@freenet.carleton.ca Microstar Software Ltd. dmeggins@microstar.com http://home.sprynet.com/sprynet/dmeggins/ xml-dev: A list for W3C XML Developers. 
To post, mailto:xml-dev@ic.ac.uk Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To (un)subscribe, mailto:majordomo@ic.ac.uk the following message; (un)subscribe xml-dev To subscribe to the digests, mailto:majordomo@ic.ac.uk the following message; subscribe xml-dev-digest List coordinator, Henry Rzepa (mailto:rzepa@ic.ac.uk) From peter at ursus.demon.co.uk Wed Feb 25 18:47:36 1998 From: peter at ursus.demon.co.uk (Peter Murray-Rust) Date: Mon Jun 7 17:00:13 2004 Subject: SAX: finalising org.sax.xml.Parser In-Reply-To: <34F44259.5887FF1F@edu.uni-klu.ac.at> References: <199802230313.WAA00386@unready.microstar.com> <34F38943.551F05AB@jclark.com> <199802251410.JAA00633@unready.microstar.com> Message-ID: <3.0.1.16.19980225170648.0947fd3e@pop3.demon.co.uk> At 16:10 25/02/98 +0000, Juergen Modre wrote: [...] >- setValidate(true); > That the document/stream should also be validated during parsing. > >The question where there is exactly the border between well-formedness >parsing and validation parsing should be left to the parser. This border >can be found in the XML spec. >The SAX interface is/should be useable for both classes of XML parsers >and give also the possibility to enable/disable validation. > > >But I agree that it is sometimes not easy to see the clear border >between well-formedness parsing and validation parsing in the XML spec. > This is an area that I (and I think others) have difficulty with, although I think there are many who are clear how different parsers behave. This also interacts with the 'standalone' value in the xml PI. There is also some potential confusion as to when and how the presence/absence of the external subset makes a difference. If my worries are unfounded, then it should be possible to create a precise description of what parameters, files, internal subsets etc. and need to control the behaviour of a SAX-compliant parser and what it should do. 
In which case it would be very helpful to see it set out clearly and I'll shut up. If, however, there still is confusion then we shall discover it in these attempts :-) P. Peter Murray-Rust, Director Virtual School of Molecular Sciences, domestic net connection VSMS http://www.nottingham.ac.uk/vsms, Virtual Hyperglossary http://www.venus.co.uk/vhg xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev@ic.ac.uk Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To (un)subscribe, mailto:majordomo@ic.ac.uk the following message; (un)subscribe xml-dev To subscribe to the digests, mailto:majordomo@ic.ac.uk the following message; subscribe xml-dev-digest List coordinator, Henry Rzepa (mailto:rzepa@ic.ac.uk) From peter at ursus.demon.co.uk Wed Feb 25 19:28:20 1998 From: peter at ursus.demon.co.uk (Peter Murray-Rust) Date: Mon Jun 7 17:00:13 2004 Subject: The XML spec in XML: missing tags In-Reply-To: <3.0.5.32.19980225104757.00a13120@village.doctools.com> References: <98Feb25.042025est.18818@thicket.arbortext.com> <3.0.5.32.19980224192743.00a0d220@village.doctools.com> <98Feb23.120356est.18826@thicket.arbortext.com> Message-ID: <3.0.1.16.19980225190358.0b6f0f44@pop3.demon.co.uk> At 10:47 25/02/98 -0500, Eve L. Maler wrote: >Oh, you want *documentation*, do you?? Well, the DTD was hard to write; it ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ Not ME. I was weaned on 5-hole paper tape. Variables should be no longer than 1 character. >should be hard to understand. :-) Yes. I strip comments from FORTRAN programs as it is good for the soul and saves cards. I must have dreamed it, but someone posted a month or two back that documentation was a *required* part of a DTD :-) > >Seriously, I keep saying that I'll release the reference documentation Real >Soon Now, and in fact I'm hoping to be able to spend a few hours tidying it >up and releasing it later this week. (There's also a minor DTD update in >the pipe.) Great. 
Seriously - although it wasn't perhaps intended, rec.xml is a splendid vehicle for people to cut their teeth on - it's got structure, uses normalisation, has a good variety of elementTypes but also uses some in a generic manner. The only thing it doesn't use is entities. I have tweaked my SAXSplit jiffy to do produce entities for div1, etc. And - an argument for preserving comments in document structure - there is some splendid archaeology inside... > [...] > ><termdef> is a really odd way to do term definitions, for my money, but >that's what the users wanted. :-) It captures an "inline" definition of a *Users*?? DTD by committee?? gulp. >term, and because of the mixed content model, it can't even ensure that a ><term> is present to identify the actual term being defined. Likewise, it >can't ensure that the definition captured functions as a "standalone" >sentence or set of sentences. I suspect that the cut-off sentence was more >in the spirit of poetic license. Fair enough. The approach I am taking to terminology is based on MARTIF (ISO12200 and ISO12620) - MARTIF itself having strong TEI roots. So I shall use some simple heuristics to transform termdefs to my termEntry's > ><term> is occasionally used legitimately without a <termdef> wrapper; it's >marking a term being used in a special way, without an accompanying >definition. Yes. I shall abstract these. > >Gee, maybe I should just collect all the questions and do the documentation >as a Q&A... Not a bad idea. I certainly don't want you to go to a lot of trouble. One line sentences for each elementType are probably OK, plus any hardcoded semantics (e.g. what the target of IDREFs may/maynot be. [I have a set of simple tools in JUMBO that allow you to browse documents, so you find all elementTypes, their allowed children, attributes, attribute values, etc. and can then display the actual location in the document. You can then make a pretty good guess at what they mean.] P. 
Peter Murray-Rust, Director Virtual School of Molecular Sciences, domestic net connection VSMS http://www.nottingham.ac.uk/vsms, Virtual Hyperglossary http://www.venus.co.uk/vhg xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev@ic.ac.uk Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To (un)subscribe, mailto:majordomo@ic.ac.uk the following message; (un)subscribe xml-dev To subscribe to the digests, mailto:majordomo@ic.ac.uk the following message; subscribe xml-dev-digest List coordinator, Henry Rzepa (mailto:rzepa@ic.ac.uk) From donpark at quake.net Fri Feb 27 00:01:13 1998 From: donpark at quake.net (Don Park) Date: Mon Jun 7 17:00:13 2004 Subject: JFC 1.1 Released Message-ID: <000401bd4312$190c22e0$2ee044c6@donpark> This is a heads-up notice to those of us interested in Java. JFC 1.1 was released today. It does not include Java2D nor Drag-n-Drop. Metal L&F looks good but I was somewhat disappointed by lack of speed improvements over the beta versions. There are still some update problems and some of the features were maimed or shifted into preview status. It is better than nothing. The fact that JFC 1.1 has now been shipped means that JDK 1.2 beta 3 release is not far behind since JFC 1.1 was supposed to ship at the same time. I just thought you guys might be interested in the news, Don Park http://www.quake.net/~donpark/index.html xml-dev: A list for W3C XML Developers. 
To post, mailto:xml-dev@ic.ac.uk Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To (un)subscribe, mailto:majordomo@ic.ac.uk the following message; (un)subscribe xml-dev To subscribe to the digests, mailto:majordomo@ic.ac.uk the following message; subscribe xml-dev-digest List coordinator, Henry Rzepa (mailto:rzepa@ic.ac.uk) From zwang at pstat.ucsb.edu Fri Feb 27 02:34:54 1998 From: zwang at pstat.ucsb.edu (Zheng Wang) Date: Mon Jun 7 17:00:13 2004 Subject: JFC In-Reply-To: <000201bd3608$c0c7f4d0$2ee044c6@donpark> Message-ID: <Pine.GSO.3.95.980226182613.2373A-100000@fisher> I also tried the Swing1.0. It is still not compatible with JDK. Does someone work with both JDK and Swing and know how to make them compatible? Thanks Zheng Wang Department of Statistics and Applied Probability University of California, Santa Barbara E-mail: zwang@pstat.ucsb.edu; http://www.pstat.ucsb.edu/~zwang xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev@ic.ac.uk Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To (un)subscribe, mailto:majordomo@ic.ac.uk the following message; (un)subscribe xml-dev To subscribe to the digests, mailto:majordomo@ic.ac.uk the following message; subscribe xml-dev-digest List coordinator, Henry Rzepa (mailto:rzepa@ic.ac.uk) From jjc at jclark.com Fri Feb 27 03:29:11 1998 From: jjc at jclark.com (James Clark) Date: Mon Jun 7 17:00:13 2004 Subject: SAX: finalising org.sax.xml.Parser References: <01bd4206$6ccb6d30$a00b0ac0@pharcyde.poetsoftware.xo.com> <199802251724.MAA02583@unready.microstar.com> Message-ID: <34F6321E.8D415644@jclark.com> David Megginson wrote: > > Matthew Gertner writes: > > > How about a 2x2 matrix? > > > > With DTD > > setValidate(false) - checks for well-formedness, external subset is used > > for entity and notation declarations, etc. 
> > setValidate(true) - full validation
> >
> > Without DTD
> > setValidate(false) - just checks for well-formedness
> > setValidate(true) - throws an exception
>
> This comes back to the original problem, however: what if I want to
> include the external subset and external text entities but don't want
> to validate? I'm not sure that the two should be tied together
> (AElfred, for example, does not validate, but it does use the DTD).

The following seem the reasonable combinations to me:

- Validate and process all external entities (if you're validating you've got to process all external entities).
- Don't validate and process external DTD and parameter entities depending on the setting of standalone.
- Don't validate and process external DTD and parameter entities (irrespective of the setting of standalone).
- Don't validate and don't process external DTD and parameter entities (irrespective of the setting of standalone).

James

From moroz at paragraph.com Fri Feb 27 13:32:42 1998
From: moroz at paragraph.com (Moroz, Oleg)
Date: Mon Jun 7 17:00:13 2004
Subject: JFC
Message-ID: <00FE2F436493D111900E00A0C91003780C7C2F@ms.paragraph.com>

Zheng Wang[SMTP:zwang@pstat.ucsb.edu] wrote:

> I also tried the Swing1.0. It is still not compatible with JDK.
> Does someone work with both JDK and Swing and know how to make them
> compatible?

What do you mean by "not compatible with JDK"? Swing 1.0 works perfectly with JDK / JRE 1.1.5 for Win32 from Sun and I hope with the latest JDK 1.1.5 for Linux from Steve Byrne (will try that at home tonight).
It also works with the latest Microsoft JVM (from IE 4.01), although not so perfect (tooltips don't show text and some examples produce spurious exception stack backtraces, but continue operating). Oleg xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev@ic.ac.uk Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To (un)subscribe, mailto:majordomo@ic.ac.uk the following message; (un)subscribe xml-dev To subscribe to the digests, mailto:majordomo@ic.ac.uk the following message; subscribe xml-dev-digest List coordinator, Henry Rzepa (mailto:rzepa@ic.ac.uk) From dima at paragraph.com Fri Feb 27 17:26:02 1998 From: dima at paragraph.com (Dmitri Kondratiev) Date: Mon Jun 7 17:00:13 2004 Subject: ANN: XLogo - programming with XML Logo Turtle Graphics Message-ID: <2.2.32.19980227172605.00916750@dream.paragraph.com> XLogo Announcement ------------------ XLogo is a markup language I wrote to program Logo Turtle Graphics with XML in Java applet. XLogo program is a well-formed and valid XML document. XLogo runtime is a set of Java classes that process XLogo program. The main reason for XLogo was to find out the advantages that XML provides for developing problem domain specific meta languages. Another goal was to learn XML and experiment with SAX - Simple API for XML. To find more about XLogo check: http://www.geocities.com/SiliconValley/Lakes/3767/xlogo-index.html Any comments and ideas are most welcome ! Thanks, Dima --------------------------- dima@paragraph.com 102401.2457@compuserve.com http://www.geocities.com/SiliconValley/Lakes/3767/ tel: 07-095-464-9241 xml-dev: A list for W3C XML Developers. 
To post, mailto:xml-dev@ic.ac.uk Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To (un)subscribe, mailto:majordomo@ic.ac.uk the following message; (un)subscribe xml-dev To subscribe to the digests, mailto:majordomo@ic.ac.uk the following message; subscribe xml-dev-digest List coordinator, Henry Rzepa (mailto:rzepa@ic.ac.uk) From ak117 at freenet.carleton.ca Sat Feb 28 03:24:03 1998 From: ak117 at freenet.carleton.ca (David Megginson) Date: Mon Jun 7 17:00:13 2004 Subject: SAX: org.xml.sax.AttributeMap In-Reply-To: <34F38506.3D19B68A@jclark.com> References: <199802250126.UAA00473@unready.microstar.com> <199802250138.UAA00524@unready.microstar.com> <34F38506.3D19B68A@jclark.com> Message-ID: <199802280322.WAA00888@unready.microstar.com> James Clark writes: > I agree that SAX ought to provide access to unparsed entities but I > don't think this is the right way to achieve it. For a start, I can > have an ENTITIES attribute, so all these methods would need two > arguments (the index of the attribute in the attribute list, and the > index of the token in the value). An excellent point, and one that I missed in the original SAX. > I think a better approach is for the processor at the end of the prolog > to pass an object to the application that provides information about all > the declared notations and unparsed entities. > > XP has a DTD object that does this, but it might be better to call it > something else (like UnparsedEntitySet) since SAX might someday be > extended to provide full DTD access. This is a good idea, but I need to find a way to avoid using the Java-specific Enumeration class that your example uses (since I've already eliminated it from AttributeList). All the best, David -- David Megginson ak117@freenet.carleton.ca Microstar Software Ltd. dmeggins@microstar.com http://home.sprynet.com/sprynet/dmeggins/ xml-dev: A list for W3C XML Developers. 
To post, mailto:xml-dev@ic.ac.uk
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/
To (un)subscribe, mailto:majordomo@ic.ac.uk the following message;
(un)subscribe xml-dev
To subscribe to the digests, mailto:majordomo@ic.ac.uk the following message;
subscribe xml-dev-digest
List coordinator, Henry Rzepa (mailto:rzepa@ic.ac.uk)

From ak117 at freenet.carleton.ca Sat Feb 28 12:29:18 1998
From: ak117 at freenet.carleton.ca (David Megginson)
Date: Mon Jun 7 17:00:13 2004
Subject: SAX: Sorting out org.xml.sax.AttributeList
Message-ID: <199802281227.HAA00658@unready.microstar.com>

I have been working very hard to keep the number of interfaces in SAX to a minimum, but it looks like there will be no way to avoid adding a couple of additional ones if SAX is going to support unparsed entities (as, I think, it must).

James's suggestion of using indexed properties instead of a lookup-map is a very good, light-weight one. If attributes, entities, and notations are all indexed, then they will share a certain amount of common functionality which should be split out into its own interface:

  package org.xml.sax;

  public interface NameList {
    public abstract int getLength ();
    public abstract int getIndex (String name);
    public abstract String getName (int index);
  }

This is very JavaBean-like, except that getName does not throw an ArrayIndexOutOfBounds exception (it just returns null for an invalid index, and getIndex() returns -1 for a name that is not present).
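As a rough illustration of the lookup contract just described (null for an invalid index, -1 for an unknown name), here is a minimal sketch. SimpleNameList and NameListDemo are hypothetical names used only for this example and are not part of SAX; the interface is copied locally, without the package declaration, so that the sketch compiles on its own, and it uses modern Java collections for brevity.

```java
import java.util.ArrayList;
import java.util.List;

// Local copy of the proposed interface so the sketch is self-contained.
interface NameList {
    int getLength();
    int getIndex(String name);
    String getName(int index);
}

// Hypothetical implementation, not part of SAX.
class SimpleNameList implements NameList {
    private final List<String> names = new ArrayList<>();

    // Appends a name; its index is its position in insertion order.
    void add(String name) {
        names.add(name);
    }

    public int getLength() {
        return names.size();
    }

    // List.indexOf already returns -1 when the name is not present.
    public int getIndex(String name) {
        return names.indexOf(name);
    }

    // Returns null instead of throwing for an out-of-range index.
    public String getName(int index) {
        return (index >= 0 && index < names.size()) ? names.get(index) : null;
    }
}

class NameListDemo {
    public static void main(String[] args) {
        SimpleNameList list = new SimpleNameList();
        list.add("id");
        list.add("href");
        System.out.println(list.getIndex("href"));    // index of a known name
        System.out.println(list.getIndex("missing")); // unknown name
        System.out.println(list.getName(5));          // invalid index
    }
}
```

The point of the design is that a caller can probe by name or by index without wrapping every call in a try/catch block; the cost is that callers must remember to check for null and -1.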
Next, attribute lists extend this interface to add value and type:

  package org.xml.sax;

  public interface AttributeList extends NameList {
    public abstract String getType (int index);
    public abstract String getValue (int index);
  }

For notations, we need external identifiers instead:

  package org.xml.sax;

  public interface NotationList extends NameList {
    public abstract String getSystemId (int index);
    public abstract String getPublicId (int index);
  }

Unparsed entities are identical to notations, but they also need the name of the associated notation:

  package org.xml.sax;

  public interface UnparsedEntityList extends NotationList {
    public abstract String getNotationName (int index);
  }

From a purist point-of-view, UnparsedEntityList and NotationList should both extend a common ancestor, like ExternalObjectList, but I am becoming very concerned at the number of interfaces multiplying here.

The application will gain access to these lists through a DTD callback in org.xml.sax.DocumentHandler:

  public void dtd (UnparsedEntityList entityList, NotationList notationList)
    throws java.lang.Exception;

Should this event always be fired, or should it be fired only if there actually is a DTD?

How does this sound to everyone? For me, there are pros and cons:

PROS
----
1) This arrangement is _much_ simpler to understand than the old org.xml.sax.AttributeMap. Most users can deal only with AttributeList (which is now trivial), and they can ignore NotationList and UnparsedEntityList unless they need to use unparsed entities.
2) It is possible to look up a notation or entity directly by name, even if the name appears in a CDATA entity or in character data content.

CONS
----
1) Too many interfaces.
2) Users will complain that the dtd() callback does not return other information, such as lists of declared elements.
3) It may turn out that XML implementors shun unparsed entities and notations in favour of HREF's and MIME types, in which case we will have added this complexity to SAX for nothing.
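To make the second pro concrete, the following is a minimal sketch of an application that keeps the two lists handed to dtd() and later resolves an entity name to its system identifier and the system identifier of its notation. TableList and EntityResolverSketch are hypothetical names invented for this example, the interfaces are local copies of the ones proposed above, the sample identifiers are made up, and the callback omits the throws clause for brevity.

```java
import java.util.ArrayList;
import java.util.List;

// Local copies of the proposed interfaces so the sketch is self-contained.
interface NameList {
    int getLength();
    int getIndex(String name);
    String getName(int index);
}

interface NotationList extends NameList {
    String getSystemId(int index);
    String getPublicId(int index);
}

interface UnparsedEntityList extends NotationList {
    String getNotationName(int index);
}

// Hypothetical in-memory list; each row is {name, systemId, publicId, notationName}.
class TableList implements UnparsedEntityList {
    private final List<String[]> rows = new ArrayList<>();

    void add(String name, String systemId, String publicId, String notationName) {
        rows.add(new String[] { name, systemId, publicId, notationName });
    }

    // Returns null for an out-of-range index, matching the NameList contract.
    private String cell(int index, int column) {
        return (index >= 0 && index < rows.size()) ? rows.get(index)[column] : null;
    }

    public int getLength() { return rows.size(); }

    public int getIndex(String name) {
        for (int i = 0; i < rows.size(); i++) {
            if (rows.get(i)[0].equals(name)) return i;
        }
        return -1; // name not present
    }

    public String getName(int index) { return cell(index, 0); }
    public String getSystemId(int index) { return cell(index, 1); }
    public String getPublicId(int index) { return cell(index, 2); }
    public String getNotationName(int index) { return cell(index, 3); }
}

// Hypothetical application class that stores the lists passed to dtd().
class EntityResolverSketch {
    private UnparsedEntityList entities;
    private NotationList notations;

    // Same arguments as the proposed DocumentHandler.dtd() callback.
    public void dtd(UnparsedEntityList entityList, NotationList notationList) {
        this.entities = entityList;
        this.notations = notationList;
    }

    // Returns "entitySystemId (notationSystemId)", or null if the entity is undeclared.
    String describe(String entityName) {
        int e = entities.getIndex(entityName);
        if (e < 0) return null;
        int n = notations.getIndex(entities.getNotationName(e));
        return entities.getSystemId(e) + " (" + notations.getSystemId(n) + ")";
    }

    public static void main(String[] args) {
        TableList notations = new TableList();
        notations.add("gif", "http://example.org/notations/gif", null, null);
        TableList entities = new TableList();
        entities.add("logo", "logo.gif", null, "gif");

        EntityResolverSketch app = new EntityResolverSketch();
        app.dtd(entities, notations);
        System.out.println(app.describe("logo"));
    }
}
```

Because the lookup is by name, the same describe() call works whether the entity name came from an ENTITY attribute, one token of an ENTITIES attribute, or plain character data.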
Thanks,

David

--
David Megginson                 ak117@freenet.carleton.ca
Microstar Software Ltd.         dmeggins@microstar.com
      http://home.sprynet.com/sprynet/dmeggins/