From dmeggins at uottawa.ca Sat May 3 19:43:14 1997
From: dmeggins at uottawa.ca (David Megginson)
Date: Mon Jun 7 16:57:46 2004
Subject: Entity Value
Message-ID: <199705031139.HAA00326@localhost>

In WD-xml-lang-970331, the following production describes an entity
value:

  [9] EntityValue ::= '"' ([^%&"] | PEReference | Reference)* '"'
                    | "'" ([^%&'] | PEReference | Reference)* "'"

Later on, however, clause 4.5 ("Predefined Entities") states the
following:

  If the entities in question are declared, they must be declared as
  internal entities whose replacement text is the single character
  being escaped, as shown below:

  [markup not preserved in the archive]

I'm afraid that I do not understand how the entity values for < and &
satisfy the EntityValue production. Am I missing something elsewhere
in the draft?

All the best,

David

--
David Megginson                 ak117@freenet.carleton.ca
Microstar Software Ltd.         dmeggins@microstar.com
University of Ottawa            dmeggins@uottawa.ca
        http://www.uottawa.ca/~dmeggins

xml-dev: A list for W3C XML Developers
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/
To unsubscribe, send to majordomo@ic.ac.uk the following message;
unsubscribe xml-dev
List coordinator, Henry Rzepa (rzepa@ic.ac.uk)

From Peter at ursus.demon.co.uk Sat May 3 23:40:54 1997
From: Peter at ursus.demon.co.uk (Peter Murray-Rust)
Date: Mon Jun 7 16:57:46 2004
Subject: XML and HTML browsers
Message-ID: <6110@ursus.demon.co.uk>

I would like to re-use *existing* browser functionality rather than
continuing to extend the *generic* aspects of a browser in JUMBO. I'm
interested in exploring the general question of how a specialist Java
application interacts with a Java-enabled HTML browser. I'm not an
expert here, but clearly JavaScript is a potential solution. (I hacked
a bit, and can't yet say I feel happy with the process - but perhaps
that's because I haven't got the feel that JavaScript is going to
continue to be around and usable in a standard form.) Anyway...
At present I can define my requirements quite simply: I have a chunk
of XML that I can transform into HTML and I want to show it in the
browser. However, the browser must:

- add hotspots to hyperlinks where appropriate
- send me back a message/callback when a link is activated
- allow me to:
    use paint(Graphics g) to a specific area of its screen
    (allowing for scrolling)
  OR:
    let me supply it with an IMG for that area
- manage the multiplicity of windows (NEW, REPLACE, etc)
- allow mouse events within 'my' graphics area and return them to me
- allow me to post menus of some sort and return events

(There is probably stuff I have forgotten...)

There is also the question of spawning an XML helper application when
the *browser* encounters a text/xml file (which might turn out to have
DOCTYPE CML in it and so require a more specific helper). Also, if I
have an XML-LINK to a *.html file (rather than an XML file), can I
instruct the browser to display that and keep the XML application for
further action?

This is, not surprisingly, a bit rambling - any light would be
valuable.

	P.

--
Peter Murray-Rust, domestic net connection
Virtual School of Molecular Sciences
http://www.vsms.nottingham.ac.uk/

From tallen at sonic.net Sun May 4 02:18:37 1997
From: tallen at sonic.net (Terry Allen)
Date: Mon Jun 7 16:57:46 2004
Subject: XDB 0.3 available (XMLized Docbook 3.0)
Message-ID: <199705040021.RAA22786@bolt.sonic.net>

Now available at

  http://www.sonic.net/~tallen/xdb03.zip

is a DTD derived from DocBook 3.0 that I believe is valid XML. Thanks
to Norbert Mikula for pointing out that the %local.foo.foo; parameter
entities, which are defined with the content "", are illegal per
XML-lang production 46.
The sole difference between XDB 0.3 and XDB 0.2 (which I've removed)
is the deletion of these parameter entities and an updated copyright
notice.

This change blows away the DocBook customization mechanism. The
purpose of distributing this DTD is solely to determine what
constitutes a valid XML DTD. Obviously some other method of
customization will be required (unless the SGML ERB can be persuaded
to relax its strictures on empty parameter entities). The quick hack
that comes to mind is to define these pe's as containing a placeholder
element ZZZZZ, which would be declared without an ATTLIST. As I
remarked earlier, I'd be happy to hear of other solutions.

As the DTD is fairly useless without the customization mechanism, this
version cannot be considered progress on the road to a proper XMLlated
DocBook that the Davenport Group would want to distribute. Mea culpa.

Regards,

Terry Allen    Electronic Publishing Consultant    tallen[at]sonic.net
               http://www.sonic.net/~tallen/
Davenport and DocBook: http://www.ora.com/davenport/index.html
T.A. at Passage Systems: terry.allen[at]passage.com

From nmikula at edu.uni-klu.ac.at Sun May 4 12:20:28 1997
From: nmikula at edu.uni-klu.ac.at (Norbert Mikula)
Date: Mon Jun 7 16:57:46 2004
Subject: XDB 0.3 available (XMLized Docbook 3.0)
In-Reply-To: <199705040021.RAA22786@bolt.sonic.net>
Message-ID:

On Sat, 3 May 1997, Terry Allen wrote:

> Now available at
>
> http://www.sonic.net/~tallen/xdb03.zip
>
> is a DTD derived from DocBook 3.0 that I believe is valid XML.

And NXP says "You bet it is" ;-)

I have used the latest release of NXP (not yet published) and tried it
with the "Blue" test file.
I only needed to take care of filenames (a problem with
case-sensitivity), but then everything went fine at the first attempt!

Best regards,
Norbert H. Mikula

=====================================================
= SGML, XML, DSSSL, Intra- & Internet, AI, Java
=====================================================
= mailto:nmikula@edu.uni-klu.ac.at
= http://www.edu.uni-klu.ac.at/~nmikula
=====================================================

From dmeggins at uottawa.ca Sun May 4 13:14:13 1997
From: dmeggins at uottawa.ca (David Megginson)
Date: Mon Jun 7 16:57:46 2004
Subject: [Correction] Entity Value
In-Reply-To: <132239552@toto.iv>
Message-ID: <199705041112.HAA00233@localhost>

David Megginson writes:

> If the entities in question are declared, they must be declared as
> internal entities whose replacement text is the single character
> being escaped, as shown below:
>
> [markup not preserved in the archive]
>
> I'm afraid that I do not understand how do the entity values for <
> and & satisfy the EntityValue production. Am I missing something
> elsewhere in the draft?

That last paragraph should read

> I'm afraid that I do not understand how the entity value for
> & satisfies the EntityValue production. Am I missing something
> elsewhere in the draft?

Thanks,

David

--
David Megginson                 ak117@freenet.carleton.ca
Microstar Software Ltd.
dmeggins@microstar.com
University of Ottawa            dmeggins@uottawa.ca
        http://www.uottawa.ca/~dmeggins

From dgd at cs.bu.edu Sun May 4 20:04:47 1997
From: dgd at cs.bu.edu (David Durand)
Date: Mon Jun 7 16:57:46 2004
Subject: XDB 0.3 available (XMLized Docbook 3.0)
In-Reply-To: <199705040021.RAA22786@bolt.sonic.net>
Message-ID:

At 5:21 PM -0700 5/3/97, Terry Allen wrote:
>Thanks
>to Norbert Mikula for pointing out that the %local.foo.foo; parameter
>entities, which are defined with the content "", are illegal per
>XML-lang production 46. The sole difference between XDB 0.3 and
>XDB 0.2 (which I've removed) is the deletion of these parameter
>entities and an updated copyright notice.
>
>This change blows away the DocBook customization mechanism. The
>purpose of distributing this DTD is solely to determine what
>constitutes a valid XML DTD. Obviously some other method of
>customization will be required (unless the SGML ERB can be
>persuaded to relax its strictures on empty parameter entities).

I want to ask what justification there is, if any, for ruling out
empty PEs? I don't remember discussion of this point clearly (though I
do remember shock when the rule was pointed out).

This seems like a very bad idea, as Terry's desire to add entities in
is very reasonable.

Is this fallout of the weird SGML interleaving of entity structure and
content model parsing?

  -- David

_________________________________________
David Durand              dgd@cs.bu.edu  \  david@dynamicDiagrams.com
Boston University Computer Science        \  Sr.
Analyst
http://www.cs.bu.edu/students/grads/dgd/      \  Dynamic Diagrams
--------------------------------------------\  http://dynamicDiagrams.com/
MAPA: mapping for the WWW                    \__________________________

From ht at cogsci.ed.ac.uk Sun May 4 21:23:02 1997
From: ht at cogsci.ed.ac.uk (Henry S. Thompson)
Date: Mon Jun 7 16:57:46 2004
Subject: Entity Value
In-Reply-To: David Megginson's message of Sat, 3 May 1997 07:39:57 -0400
References: <199705031139.HAA00326@localhost>
Message-ID: <5316.199705041922@grogan.cogsci.ed.ac.uk>

I believe the spec. is inconsistent at the point David identifies, and
must be corrected to read

  [markup not preserved in the archive]

I've told Tim and Michael this, but have never had a definitive reply.

ht

From nmikula at edu.uni-klu.ac.at Mon May 5 10:22:53 1997
From: nmikula at edu.uni-klu.ac.at (Norbert H. Mikula)
Date: Mon Jun 7 16:57:46 2004
Subject: Empty PE - was Re: XDB 0.3 available (XMLized Docbook 3.0)
References:
Message-ID: <336DFC2B.3F53@edu.uni-klu.ac.at>

David Durand wrote:

> I want to ask what justification there is, if any, for ruling out empty
> PEs? I don't remember discussion of this point clearly (though I do
> remember shock when the rule was pointed out).
>
> This seems like a very bad idea, as Terry's desire for add entities in
> is very reasonable.

I do agree that Terry's desire is very reasonable. However, if we
cannot find a formal and concise way to express it, we have problems.
One of the objectives of XML was that it should be "easy" to implement
and that it should incorporate "contemporary" disciplines of computer
science like formal languages etc.

As you probably know, I use JavaCC, a Lex/Yacc-like approach, to build
NXP. I really had difficulty transforming this production to LL(1),
and I am still not sure if there is a clean way to bring it to LL(n)
(a way that I could live with).

Furthermore it violates the general idea that %a should actually
satisfy S? a S?. With an empty PE this would not work. As much as I
first liked the %a idea, when I had to implement it (and I am still
not satisfied with my current solution) it caused a lot of headaches
(ouuuch).

So in short, I don't mind the idea of having empty PEs, if it is
possible to implement/express it in a reasonable way.

Any ideas would be appreciated!

--
Best regards,
Norbert H. Mikula

=====================================================
= SGML, XML, DSSSL, Intra- & Internet, AI, Java
=====================================================
= mailto:nmikula@edu.uni-klu.ac.at
= http://www.edu.uni-klu.ac.at/~nmikula
=====================================================

From Peter at ursus.demon.co.uk Mon May 5 15:25:02 1997
From: Peter at ursus.demon.co.uk (Peter Murray-Rust)
Date: Mon Jun 7 16:57:46 2004
Subject: JUMBO
Message-ID: <6155@ursus.demon.co.uk>

Here is a summary of some progress with JUMBO. My intention is to have
it all tidied by Barcelona.

JUMBO has its own parser (Mus Michaelis algorithm), but can use NXP
via a command-line switch (and will hopefully grok Lark in the same
way in a few hours). This means that it gives a visual rendering of
current XML files assuming they parse with NXP or Lark.

Errors.
NXP does not throw catchable errors but (I think) produces a null
output stream. Lark requires JUMBO to handle doSyntaxError(). JUMBO
has no heuristics to turn a broken WF-document into something valid.
If the parser writer wishes to pass JUMBO an Esis (NXP) or a Tree
(Element*, Lark) then JUMBO will treat that as valid if it isn't
thrown an error. Without being thrown an error, or being passed an
error flag, it's not easy for JUMBO to know it's got one.

If you think this list has become slightly sleepy, and you haven't
been reading the WG discussions, you can always ask: 'How do we treat
parse errors?'. The general feeling is it's an implementation matter,
so we have to have a means of passing it to applications.

JUMBO has been rewritten internally to remove the grotesque
architecture that I started with. This has not added functionality,
but I am at least prepared to show some code in public.

JUMBO is now in a state where it is possible to use it to convert
legacy files to WF-XML. Some limited validation of content models and
attributes is also possible, but it is not automatically DTD-driven,
because DTDs have a poor API for programming.

JUMBO implements XML-LINK as far as I understand it. I have NOT done:

- spans and '..' because I am unsure of the semantics
- GROUP/DOCUMENT - because I can't see *what* to implement
- some of the trickier bits of negative addressing in PREVIOUS, etc.
  because I'm waiting till that's stable and everyone actually agrees
  on its operation

Most XML-LINK implementation is application-dependent. *I AM MISSING
ANY EXAMPLES OF XML-LINK other than my own.* I can implement my own,
but they are probably JUMBO-specific.

JUMBO has primitive editing facilities, especially for WF-docs. I have
not done attribute editing, because programming little boxes in Java
is horrible. In principle JUMBO can validate content models if it uses
NXP's code, but I need to discuss this with Norbert.
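The question raised under "Errors" above, how a parser should tell the application that a document is broken, can be sketched as follows. This is only an illustration using Python's standard `xml.etree.ElementTree`; it is not NXP, Lark or JUMBO code, and the `load_tree` helper and its return convention are hypothetical.

```python
import xml.etree.ElementTree as ET


def load_tree(text):
    """Return (root, None) on success or (None, error_message) on failure.

    Hypothetical helper: the application always receives either a tree
    or an explicit error, never a silently empty result.
    """
    try:
        return ET.fromstring(text), None
    except ET.ParseError as err:
        return None, str(err)


# A well-formed document yields a tree and no error.
root, error = load_tree("<doc><p>well-formed</p></doc>")
print(root.tag, error)

# A broken document yields no tree and a message the application can act on.
root, error = load_tree("<doc><p>broken</doc>")
print(root, error)
```

Whether the error arrives as an exception, a callback or a flag is the implementation matter the message refers to; the sketch simply shows one way of making it visible to the caller.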
JUMBO is aimed at supporting *INFORMATION COMPONENTS* rather than
traditional DTDs (which are not much use in technical and scientific
subjects). An information component is an Element linked to code for
displaying, processing, etc. (I use Java). JUMBO can manage many of
the common information components such as hypertext, images, tables,
graphs, bibliography, etc.

If you are interested in learning more about this, I am launching an
8-week virtual course on this at:

  http://www.vsms.nottingham.ac.uk/vsms/java

and a CDROM with the new JUMBO, examples, API, etc. will be included
in the course materials. No previous knowledge of XML and Java is
assumed, but you need programming skills. All necessary information is
on the WWW pages.

	P.

--
Peter Murray-Rust, domestic net connection
Virtual School of Molecular Sciences
http://www.vsms.nottingham.ac.uk/

From dmeggins at uottawa.ca Mon May 5 19:04:06 1997
From: dmeggins at uottawa.ca (David Megginson)
Date: Mon Jun 7 16:57:46 2004
Subject: More Entity-Value fun
In-Reply-To: <199705051627.LAA49280@tigger.cc.uic.edu>
References: <199705041112.HAA00233@localhost>
	<199705051627.LAA49280@tigger.cc.uic.edu>
Message-ID: <199705051702.NAA00381@localhost>

Here is another XML quandary: how can I declare an internal entity
with "25%" as its replacement text -- without using a character
reference -- when "%" is not allowed to appear in an entity value?

Perhaps it would make sense to add % to the predefined entities (4.5).

All the best,

David

--
David Megginson                 ak117@freenet.carleton.ca
Microstar Software Ltd.
dmeggins@microstar.com
University of Ottawa            dmeggins@uottawa.ca
        http://www.uottawa.ca/~dmeggins

From tbray at textuality.com Tue May 6 02:11:07 1997
From: tbray at textuality.com (Tim Bray)
Date: Mon Jun 7 16:57:46 2004
Subject: Strong Typing in SGML and XML
Message-ID: <3.0.32.19970505170907.009f46a0@pop.intergate.bc.ca>

Ever since about 15 minutes after SGML was born, database people have
been discovering, to their surprise, that it contains no facilities
for strong data typing. You can have an element named [markup not
preserved in the archive], and SGML will have no problem accepting

  purple bananas rule.

Whenever more than two people start talking about the future of SGML,
someone starts complaining about typing. With the advent of XML, the
volume has increased. As an old database guy, I've been one of the
loud complainers.

While we're really not ready for this on the WG, it is something that
we're going to have to do something about before too long. So I've
posted a modest proposal at:

  http://www.textuality.com/xml/typing.html

Overview points:

1. This only types elements, not attributes. It's easier.
2. It's based on SQL types, not HyTime lextypes. That's what the
   database world is used to. This could probably be implemented
   using lextypes.
3. The syntax for dates and so on should match some ISO standard,
   but I haven't found which one yet.
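A minimal sketch of the idea in point 1, checking an element's character content against a declared type, might look like this. It does not use the syntax of the proposal: the `CHECKERS` and `ELEMENT_TYPES` tables, the type names and the `quantity` and `shipped` elements are all hypothetical, and dates are assumed to use the ISO 8601 calendar form accepted by Python's `date.fromisoformat`.

```python
import xml.etree.ElementTree as ET
from datetime import date


def _is_integer(text):
    """True if the text parses as an integer."""
    try:
        int(text)
        return True
    except ValueError:
        return False


def _is_date(text):
    """True if the text is an ISO 8601 calendar date (assumed syntax)."""
    try:
        date.fromisoformat(text)
        return True
    except ValueError:
        return False


# Hypothetical type table: type name -> checker for element content.
CHECKERS = {"INTEGER": _is_integer, "DATE": _is_date}

# Hypothetical declarations: element name -> declared type.
ELEMENT_TYPES = {"quantity": "INTEGER", "shipped": "DATE"}


def type_errors(root):
    """Collect (element, declared type, content) for every mismatch."""
    errors = []
    for elem in root.iter():
        declared = ELEMENT_TYPES.get(elem.tag)
        if declared is None:
            continue  # untyped elements are accepted as-is
        text = (elem.text or "").strip()
        if not CHECKERS[declared](text):
            errors.append((elem.tag, declared, text))
    return errors


doc = ET.fromstring(
    "<order><quantity>purple bananas rule.</quantity>"
    "<shipped>1997-05-06</shipped></order>"
)
print(type_errors(doc))  # [('quantity', 'INTEGER', 'purple bananas rule.')]
```

Only element content is checked here, which is what makes the sketch short; typed attributes would need a second table and a second pass.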
Cheers, Tim Bray
tbray@textuality.com http://www.textuality.com/ +1-604-708-9592

From Peter at ursus.demon.co.uk Tue May 6 11:29:53 1997
From: Peter at ursus.demon.co.uk (Peter Murray-Rust)
Date: Mon Jun 7 16:57:46 2004
Subject: Strong Typing in SGML and XML
Message-ID: <6204@ursus.demon.co.uk>

In message <3.0.32.19970505170907.009f46a0@pop.intergate.bc.ca> Tim Bray writes:
> Ever since about 15 minutes after SGML was born, database people have been
> discovering, to their surprise, that it contains no facilities for
> strong data typing. You can have an element named [markup not
> preserved in the archive], and
> SGML will have no problem accepting
>
> purple bananas rule.
>
> Whenever more than two people start talking about the future of SGML,
> someone starts complaining about typing. With the advent of XML, the
> volume has increased. As an old database guy, I've been one of the loud
> complainers.

I agree fully with this proposal. This also highlights one of the
essential aspects of XML-DEV, which is going to come up repeatedly.
This is that there are things that the ERB/WG is going to consider in
the future, but people want ways forward right now. XML-DEV provides a
forum so that:

- people can find what previous approaches already exist
- groups of people can point in the same direction if they wish to
- problems can be identified before the ERB/WG process, making that
  faster and more effective.

This is an area which I've had to address in CML. CML uses
strong-data-typing but I made it up myself. It has STRING, INTEGER,
FLOAT, DATE and various others that XML-LINK has made obsolete. So
it's very easy to change to the approach suggested here.

Wherever possible concepts should be re-used and I like the use of
SQL. (I don't like *SQL*, but that's a different matter).
I'm assuming, Tim, that some of the proposal was carried nearly
verbatim, because parts of it are slightly opaque to those who don't
know the SQL standard.

>
> While we're really not ready for this on the WG, it is something that
> we're going have to do something about before too long. So I've posted
> a modest proposal at:
>
> http://www.textuality.com/xml/typing.html

Good start. I don't think it needs expanding in scope, just some
reworking in places.

>
> Overview points:
>
> 1. This only types elements, not attributes. It's easier.

Agree 100%. I started with typed attributes and there is an enormous
amount of work in managing them as well as typed content. You have to
be able to search them, transform them (at least in CML), qualify them
with attributes and so on.

> 2. It's based on SQL types, not HyTime lextypes. That's what the
> database world is used to. This could probably be implemented

What you have seems fine. I assume that it is virtually an automatic
translation.

> using lextypes.
> 3. The syntax for dates and so on should match some ISO standard,
> but I haven't found which one yet.

Do you mean there are several and you haven't decided between them?
I thought that people had converged on a single one (I can't remember
the number, it's something like 8601).

Detailed points:

I don't find SQLSIZE 'obvious' - it's essentially the character-string
length, and if starting from scratch it should be more like
SQLMAXLENGTH. But if everyone uses it and learns to love it, I suppose
we have to.

In box 2 you have XML-MIN - I assume this is a typo.

I found SIZE, MIN and MAX very confusing. I *think* that the text is
correct, but it's very easy to get lost. Are we stuck with these?

4.5 Presumably SQLMIN<=SQLMAX? etc...

4.6 Reference to SQL SCALE was unclear. Is there a requirement for
SQLSCALE as well, or does this simply need rewriting?

4.7 I am not happy without exponential notation.
For example, do we really have to represent Avogadro's number
(6.023E+23) as 602300000000000000000000? Surely we can use IEEE
notation? Is equality defined/definable for floating point?

4.8 I go along with 8601 or whatever it is. That also defines TIME.

4.9 SQLSIZE was bad enough before. Overloading it to manage the
timezone is really horrible. Is this not defined in 8601, in which
case we can use it?

4.10 Again I think this is covered by the ISO standard.

But this is an excellent start. Again I raise the idea that XML should
introduce Generally Accepted Conventions. This could be one. Later it
might become part of the standard. This way we help point people in
the right direction.

We have a lot of readers of XML-DEV. This sort of area is an excellent
one to be contributing to. Volunteers to summarise resources of this
sort (e.g. pointers to the ISO data standard, SQL datatyping, etc.)
would be much appreciated.

	P.

>
> Cheers, Tim Bray
> tbray@textuality.com http://www.textuality.com/ +1-604-708-9592

--
Peter Murray-Rust, domestic net connection
Virtual School of Molecular Sciences
http://www.vsms.nottingham.ac.uk/

From DPawson at rnib.org.uk Tue May 6 12:26:44 1997
From: DPawson at rnib.org.uk (David Pawson)
Date: Mon Jun 7 16:57:46 2004
Subject: DOCTYPE misunderstood
Message-ID:

In developing some demonstration XML we have come across an issue we
would like to resolve; perhaps the experts out there could help.
From a single source document, marked up in XML, we need to produce 4
output transforms: braille, large print, html and typeset.

Additionally, we want (for local use) to be able to 'create' 'document
type' (our own definition).

Question: Should we be using the doctype as the switch, or an input to
the output processing application (perhaps as a command line option)?

Our definition of document type goes something along the lines of (for
one particular use) - an editor's note, a report, a memo. [Seems
logical to talk about document type in this way].

The spec doesn't give 'usage hints' for doctype; what are the
perceptions from the authors?

We are happy with selectable DTDs (or not), but should we be altering
the source document simply to obtain output variants? That seems to go
against the idea of many outputs from a single source.

Advice would be appreciated.

Regards, DaveP
dpawson@rnib.org.uk

From chaotic at maths.tcd.ie Tue May 6 16:08:26 1997
From: chaotic at maths.tcd.ie (Alan Spencer)
Date: Mon Jun 7 16:57:46 2004
Subject: DOCTYPE misunderstood
In-Reply-To: Your message of "Tue, 06 May 1997 11:20:36 -0000."
Message-ID: <9705061508.aa23613@salmon.maths.tcd.ie>

In message David writes:
> ...
> From a single source document, marked up in XML, we
> need to produce 4 output transforms, braille, large print, html
> and typeset.
>
> Additionally, we want (for local use) to be able to 'create'
> 'document type' (our own definition).
>
> Question: Should we be using the doctype as the switch,
> or an input to the output processing application (perhaps as
> a command line option).
>
> ...
> Regards, DaveP
> dpawson@rnib.org.uk
>

I would also be interested in an answer to this question as I am
developing a similar system.
Also, I have a similar question:

Is it possible to have a system where, if a particular entity is
required for rendering a 'document', it may be included from a master
document?

This question comes from a problem (which I have solved very
inelegantly using perl) which involves many levels of document
definition. The system I have in place lets you define (currently
through HTML) properties of a whole set of documents, for example the
background colour, or maybe keywords, and a set of sets. I have
implemented this as a tree of documents where properties are inherited
down from parent to child. The system is very hacky and not too
robust.

I could see XML being very useful in these (and many other) problems,
and for this reason I am trying to implement the system using XML. I
think this may be a question which concerns DSSSL and some sort of
'parent linking', but I'm not familiar enough with the way these work
to say.

Thanks,
Alan Spencer.

################################################################################
chaotic@maths.tcd.ie
http://www.maths.tcd.ie/~chaotic/
Trinity College Dublin - Maths Student.
################################################################################

From tbray at textuality.com Tue May 6 19:24:11 1997
From: tbray at textuality.com (Tim Bray)
Date: Mon Jun 7 16:57:47 2004
Subject: Strong Typing in SGML and XML
Message-ID: <3.0.32.19970506102133.007a1bf0@pop.intergate.bc.ca>

At 09:44 AM 5/6/97 GMT, Peter Murray-Rust wrote:
>> 3. The syntax for dates and so on should match some ISO standard,
>> but I haven't found which one yet.
>
>Do you mean there are several and you haven't decided between them?
>I thought that people had converged on a single one (I can't remember
>the number, it's something like 8601).

I mean I spent half an hour poking around the Web and didn't come up
with anything right away. If someone will send me a pointer to the
standard syntax, I'll put it in the draft.

>I don't find SQLSIZE 'obvious'

OK, all of the types but one need a single parameter; each parameter
is numeric, except for DATE, which is a boolean for timezone
existence. I didn't want to make up different attributes for each one.
Yes, it's hopelessly overloaded. Maybe it should just be called
XML-SQLPARAM. It is *not* the case that there is a single concept in
SQL to which all these parameters map.

>In box 2 you have XML-MIN - I assume this is a typo.

Right.

>I found SIZE, MIN and MAX, very confusing. I *think* that the text is
>correct, but it's very easy to get lost. Are we stuck with these?

Not stuck; this is the first ever draft. Improvements welcome.

>4.5 Presumably SQLMIN<=SQLMAX? etc...

Yes.

>4.6 Reference to SQL SCALE was unclear. Is there a requirement for SQLSCALE
>as well or does this simply need rewriting.

Scale of a decimal fraction is the number of digits to the right of
the decimal point.

>4.7 I am not happy without exponential notation. For example do we
>really have to represent Avogadro's number (6.023E+23) as
>602300000000000000000000? Surely we can use IEEE notation?

Yes, this has to be supported. Somebody else pointed that out too.

>Is equality defined/definable for floating point?

Yes, because in the real world, there are no real numbers [sorry, math
joke] - what I mean is that floating point numbers exist either as
fixed-size binary objects in computer storage, or as strings of
digits, decimal points, and exponents, also in storage. Either way,
equality tests are meaningful. Given good implementations of the IEEE
rules, they are even useful.
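The answers on 4.6 and 4.7 can be illustrated with a short sketch. It assumes values are converted to IEEE 754 doubles, which is what Python's `float` provides on common platforms, so the exponential and fully expanded spellings of the number discussed above compare equal. The `scale` helper is a hypothetical name; it uses `decimal.Decimal` to count the digits to the right of the decimal point.

```python
from decimal import Decimal

# The same value written in exponential and in fully expanded notation.
exponential = float("6.023E+23")
expanded = float("6023" + "0" * 20)

# Both spellings convert to the same double, so equality is well defined.
print(exponential == expanded)  # True


def scale(literal):
    """Number of digits to the right of the decimal point in a literal."""
    exponent = Decimal(literal).as_tuple().exponent
    return max(0, -exponent)


print(scale("12.345"))  # 3
print(scale("42"))      # 0
```

Equality here is equality of the stored representation, which is the sense in which the message calls the test meaningful.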
 -Tim

From ebaatz at barbaresco.East.Sun.COM Tue May 6 20:24:28 1997
From: ebaatz at barbaresco.East.Sun.COM (Eric Baatz - Sun Microsystems Labs BOS)
Date: Mon Jun 7 16:57:47 2004
Subject: Why must an XML document contain an element?
Message-ID:

My application accepts plain text. If its client wants it to do a
better job, it can mark up the text using an XML syntax.

So, the client could want to send the application something like:

  This is plain text.

However, if the application is expecting XML markup, then it would be
nice if everything a client sent was an XML document. So, for the sake
of clarity and consistency, I can force the client to send:

  This is plain text.

Well, that doesn't work, because that isn't a well-formed XML document
because it doesn't have an element, see:

  [23] document ::= Prolog element Misc*

So I could force the client to send:

  This is plain text.

where "foobar" is the client's choice of a legal name:

  [5] Name ::= (Letter | '_') (NameChar)*

But forcing the inclusion of characters that don't convey any useful
information to the application goes against my sense of cleanliness.

Why must an XML document include at least one element?

From tbray at textuality.com Tue May 6 20:47:39 1997
From: tbray at textuality.com (Tim Bray)
Date: Mon Jun 7 16:57:47 2004
Subject: Why must an XML document contain an element?
Message-ID: <3.0.32.19970506114453.009f63f0@pop.intergate.bc.ca>

At 02:19 PM 5/6/97 -0400, Eric Baatz - Sun Microsystems Labs BOS wrote:
> [23] document ::= Prolog element Misc*
>So I could force the client to send:
>
>
> This is plain text.

This would still not work, because "This is plain text." doesn't match
the nonterminal 'Misc'. You need:

  This is plain text.

>But forcing the inclusion of characters that don't convey any
>useful information to the application goes against my sense of
>cleanliness.
>
>Why must an XML document include at least one element?

If you don't have any useful info to convey, then don't put in the
tag. It's not XML, but the text is presumably still useful.

It is a defining characteristic of XML that any "character data", i.e.
non-markup text, has to be part of an element. In other words, a
document must have a logical structure, and all its text must have a
place in that logical structure. One benefit: you know unambiguously
when the message has ended, without waiting for sockets to close and
so on. One of the things that makes XML processors simple is they can
look simple-mindedly for begin and end tags, no exceptions.

I can accept that there are tons of useful documents that do not have
an explicitly marked-up logical structure, and an important place in
the world for plain text. And, we hope, an important place for XML.
But they're not the same thing.

 -Tim

From ebaatz at barbaresco.East.Sun.COM Tue May 6 21:05:34 1997
From: ebaatz at barbaresco.East.Sun.COM (Eric Baatz - Sun Microsystems Labs BOS)
Date: Mon Jun 7 16:57:47 2004
Subject: Why must an XML document contain an element?
Message-ID:

After I had sent my message, I was staring at

[23] document ::= Prolog element Misc*
[36] element ::= EmptyElement | STag content ETag
[35] content ::= (element | PCData | ... | Comment)*

when the light went on and I said "Oh, everything in one element. Wish I hadn't sent that last message."

Thanks for such a quick, gentle, value-added response.

From Peter at ursus.demon.co.uk Tue May 6 21:23:06 1997
From: Peter at ursus.demon.co.uk (Peter Murray-Rust)
Date: Mon Jun 7 16:57:47 2004
Subject: Why must an XML document contain an element?
Message-ID: <6219@ursus.demon.co.uk>

Hi Eric,

I see Tim has answered some of your queries. I'll take another (implied one).

In message Eric Baatz - Sun Microsystems Labs BOS writes:
> My application accepts plain text. If its client wants it to do

I think you have an assumption here that you know what software will be processing your document at the other end. So far that isn't defined in XML - it may be later. At present all that the client knows is:
- the document is XML
- *possibly* what the DOCTYPE is
- *possibly* what stylesheets are associated with the document.

At present there is no mechanism in XML for saying 'please process this document with FOOBAZ software'. That's more like a plugin requirement. The most that XML can say is:
- please apply this stylesheet to the document. And the stylesheet can have sophisticated algorithmic behaviour through DSSSL
- OR please apply this behavior to the document (or some part of the document). At present the syntax isn't defined.

My current approach in JUMBO is to apply a separate Java class per element. Other people may have different strategies.
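
As an illustration of the per-element strategy described above, here is a minimal sketch in Python rather than Java. It is not JUMBO's code: the class names, the `HANDLERS` registry, and the `CML`/`XVAR` element names in the sample are all assumptions made up for this example. It only shows the idea of looking up one handler class per element name, with a generic fallback.

```python
import xml.etree.ElementTree as ET


class DefaultHandler:
    """Fallback used for any element without a dedicated class."""

    def process(self, element):
        return f"<{element.tag}> (generic)"


class XvarHandler(DefaultHandler):
    """Hypothetical handler for a hypothetical XVAR element."""

    def process(self, element):
        return f"variable: {(element.text or '').strip()}"


# Registry mapping element names to handler classes (illustrative only).
HANDLERS = {"XVAR": XvarHandler}


def process_tree(root):
    """Walk the tree in document order, dispatching on element name."""
    results = []
    for element in root.iter():
        handler = HANDLERS.get(element.tag, DefaultHandler)()
        results.append(handler.process(element))
    return results


doc = ET.fromstring("<CML><XVAR>This is a variable</XVAR></CML>")
print(process_tree(doc))
```

Adding support for a new element then means registering one more class, while unknown elements still pass through the generic handler.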
Let's assume your document is

This is the first line and there was a newline

If you sent your document to JUMBO, it would capture the text including spaces and newlines and store it as a PCDATA element. If you wanted to output it, it would output it as you sent it. If you wanted to display it, it would look exactly the same. If, however, you used instead it would try (rather crudely) to format it as HTML. The newline would disappear and newlines would be included in the display where the text hit the right edge.

At present JUMBO is not sophisticated enough to manage the DEFAULT|PRESERVE attribute - by default it's DEFAULT which is the application's default w/s processing mode (which happens to be PRESERVE!!).

Remember also that a 'plain text' document has a lot of implied structure which the application cannot be expected to pick up without careful conventions.

> a better job, it can markup the text using an XML syntax.
>
> So, the client could want to send the application something like:
>
> This is plain text.
>
> However, if the application is expecting XML markup, then it would

I am not quite sure whether I understand your use of client and application. My model is:

WWW --->doc---> parser --> application

If you use a WWW browser (?client) to interface to the WWW, then you might have:

WWW--->doc---> browser -->parser --> application

Some people would call the whole of the client-side stuff a client, whereas others might just use it for the browser. I think this is an important point and have urged the XML community to try to identify these components precisely. For my own part, I separate parser and application in the architecture, and this is a useful model.

What does your application get from the browser/parser? We're still trying to work that out. NXP gives me an Esis stream [Norbert, I need a handle to extract the DOCTYPE, since that's not in Esis]. Lark gives me a root element of a tree, which I can navigate myself.
Some people want to pass groves to the application, but I'm not sure of the status of those developments.

P.

> be nice if everything a client sent was an XML document. So, for
> the sake of clarity and consistency, I can force the client to send:

--
Peter Murray-Rust, domestic net connection
Virtual School of Molecular Sciences
http://www.vsms.nottingham.ac.uk/

From peter at techno.com Tue May 6 21:33:10 1997
From: peter at techno.com (Peter Newcomb)
Date: Mon Jun 7 16:57:47 2004
Subject: Strong Typing in SGML and XML
In-Reply-To: <3.0.32.19970505170907.009f46a0@pop.intergate.bc.ca> (message from Tim Bray on Mon, 05 May 1997 17:09:21 -0700)
Message-ID: <199705061929.PAA11039@exocomp.techno.com>

> Date: Mon, 05 May 1997 17:09:21 -0700
> From: Tim Bray
>
> Ever since about 15 minutes after SGML was born, database people have been
> discovering, to their surprise, that it contains no facilities for
> strong data typing. You can have an element named , and
> SGML will have no problem accepting
>
> purple bananas rule.

It seems to me that this sort of data typing can already be accomplished:

Using data attributes, and HyTime's Data Attributes For Elements (DAFE) facility (which acts as if the data content notation for an element were an architectural form), one could implement a scheme like the one you (Tim) propose:

Or better yet:

Of course, both of these require data attributes, which XML does not (yet!) support.

-peter

--
Peter Newcomb TechnoTeacher, Inc. 233 Spruce Avenue P.O.
Box 23795 Rochester, NY 14611-4041 USA Rochester, New York 14692-3795 USA +1 716 464 8696 (home) +1 716 464 8696 (direct) +1 716 755 8698 (cell) +1 716 271 0796 (main) +1 716 529 4304 (fax) +1 716 271 0129 (fax) peter@petes-house.rochester.ny.us peter@techno.com http://www.petes-house.rochester.ny.us http://www.techno.com

From ebaatz at barbaresco.East.Sun.COM Tue May 6 22:19:42 1997
From: ebaatz at barbaresco.East.Sun.COM (Eric Baatz - Sun Microsystems Labs BOS)
Date: Mon Jun 7 16:57:47 2004
Subject: Why must an XML document contain an element?
Message-ID:

> I think you have an assumption here that you know what software
> will be processing your document at the other end.

Yes. The markup is intended for private communication between an application and a speech synthesizer via an API. I wouldn't expect any kind of browser to be involved.

From Peter at ursus.demon.co.uk Tue May 6 23:10:24 1997
From: Peter at ursus.demon.co.uk (Peter Murray-Rust)
Date: Mon Jun 7 16:57:47 2004
Subject: Why must an XML document contain an element?
Message-ID: <6223@ursus.demon.co.uk>

In message Eric Baatz - Sun Microsystems Labs BOS writes:
> After I had sent my message, I was staring at
>
> [23] document ::= Prolog element Misc*
> [36] element ::= EmptyElement | STag content ETag
> [35] content ::= (element | PCData | ... | Comment)*
>
> when the light went on and I said "Oh, everything in one
> element. Wish I hadn't sent that last message."

I'm glad you did, because it raised some important issues.
XML-DEV, in the tradition of SGML, welcomes contributions from those who are exploring the language. (comp.text.sgml and XML-WG are littered with postings from me which I might have been better to suppress :-). It's very important that we get this traffic because it shows where the presentation of the language and its tools is deficient. If people find things hard to understand, then there is an onus on the documenters to make more effort.

Anyway, it's been far too quiet on this list :-) We need to know how people are finding XML and what they want from it.

P.

--
Peter Murray-Rust, domestic net connection
Virtual School of Molecular Sciences
http://www.vsms.nottingham.ac.uk/

From Peter at ursus.demon.co.uk Tue May 6 23:47:15 1997
From: Peter at ursus.demon.co.uk (Peter Murray-Rust)
Date: Mon Jun 7 16:57:47 2004
Subject: Strong Typing in SGML and XML
Message-ID: <6228@ursus.demon.co.uk>

In message <3.0.32.19970506102133.007a1bf0@pop.intergate.bc.ca> Tim Bray writes:
> At 09:44 AM 5/6/97 GMT, Peter Murray-Rust wrote:
>
> >> 3. The syntax for dates and so on should match some ISO standard,
> >> but I haven't found which one yet.
> >
> >Do you mean there are several and you haven't decided between them?
> >I thought that people had converged on a single one (I can't remember
> >the number, it's something like 8601).
>
> I mean I spent half an hour poking around the Web and didn't come
> up with anything right away. If someone will send me a pointer to
> the standard syntax, I'll put it in the draft.

ISO 8601. Being ISO it isn't on the WWW, but there is a very concise summary which I found at http://www.mcs.vuw.ac.nz/ - just look for ISO8601 in the search engine.
It manages timezones within the date, and dates and times both absolute and relative.

>
> >I don't find SQLSIZE 'obvious'
>
> OK, all of the types but one need a single parameter; each parameter
> is numeric, except for DATE, which is a boolean for timezone
> existence. I didn't want to make up different attributes for each one.
> Yes, it's hopelessly overloaded. Maybe it should just be called
> XML-SQLPARAM. It is *not* the case that there is a single concept

I much prefer this. OTOH some might require two?

[...]
>
> Yes, this has to be supported. Somebody else pointed that out too.
>
> >Is equality defined/definable for floating point?
>
> Yes, because in the real world, there are no real numbers [sorry,
> math joke] - what I mean is that floating point numbers exist either
> as fixed-size binary objects in computer storage, or as strings of
> digits, decimal points, and exponents, also in storage. Either
> way, equality tests are meaningful. Given good implementations of
> the IEEE rules, they are even useful.

As always this has to be precisely specified. It should be clear whether a number in memory is being compared as its IEEE representation or something else.

>
> -Tim

A general point about validation, which I keep labouring without making much headway, is: where does all this happen? It can happen at authoring, at parsing, or at the application. My concern is that unless this is defined it's likely to fall through the net. And having built strong typing into CML, I can say it's not always trivial to implement (in fact I'm sure it's not correct in places). For example, should the system always hold a string value regardless of the original type? And if it converts back to a string representation, presumably it should use the original string rather than reconvert. What happens with transformations is certainly not trivial, because it can involve precision and output format.

P.
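
The two questions raised here (what an equality test actually compares, and whether the original string should be kept) can be made concrete with a small Python sketch. The `TypedValue` class is hypothetical and is not taken from CML or from any draft; the four type names follow the ones mentioned elsewhere in this thread. It keeps the lexical string alongside the converted value, and parses dates in the ISO 8601 calendar form using the standard library.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class TypedValue:
    lexical: str  # the string exactly as it appeared in the document
    kind: str     # "FLOAT", "INTEGER", "DATE" or "STRING"

    def value(self):
        """Convert the lexical form on demand; the string is never lost."""
        if self.kind == "FLOAT":
            return float(self.lexical)
        if self.kind == "INTEGER":
            return int(self.lexical)
        if self.kind == "DATE":
            return date.fromisoformat(self.lexical)  # YYYY-MM-DD
        return self.lexical


a = TypedValue("1.0", "FLOAT")
b = TypedValue("1.00", "FLOAT")
print(a.value() == b.value())  # compared as IEEE doubles
print(a.lexical == b.lexical)  # compared as strings
print(TypedValue("1997-05-06", "DATE").value().year)
```

Writing a value back out from `lexical` rather than from `value()` avoids the reconversion problem: "1.00" stays "1.00" instead of becoming "1.0".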
--
Peter Murray-Rust, domestic net connection
Virtual School of Molecular Sciences
http://www.vsms.nottingham.ac.uk/

From Peter at ursus.demon.co.uk Wed May 7 03:00:44 1997
From: Peter at ursus.demon.co.uk (Peter Murray-Rust)
Date: Mon Jun 7 16:57:47 2004
Subject: PCDATA
Message-ID: <6230@ursus.demon.co.uk>

I am trying to interface JUMBO with NXP and Lark. I have bolted them both in, but get different answers (I think) for PCDATA on WF documents.

How many PCDATA elements would be expected in the file?

This is a variable

And what would be their values?

P.

--
Peter Murray-Rust, domestic net connection
Virtual School of Molecular Sciences
http://www.vsms.nottingham.ac.uk/

From tbray at textuality.com Wed May 7 03:15:30 1997
From: tbray at textuality.com (Tim Bray)
Date: Mon Jun 7 16:57:47 2004
Subject: PCDATA
Message-ID: <3.0.32.19970506181242.009f2d80@pop.intergate.bc.ca>

At 01:21 AM 5/7/97 GMT, Peter Murray-Rust wrote:
>How many PCDATA elements would be expected in the file?

This is a variable

Let's flatten that. Clearly there can't be any PCDATA before , so:

\n\nThis is a variable\n\n
11 2222222222222222222222 33

Three pieces of PCDATA. Uh, I'll check Lark now... if it says anything else, that's a bug.
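
The count of three can be reproduced with a present-day parser. The following is a minimal sketch using Python's `xml.etree.ElementTree`; the element names `CML` and `XVAR` are assumptions, since the archived message has lost its markup, and ElementTree's `text`/`tail` attributes are just one way a parser may expose character data.

```python
import xml.etree.ElementTree as ET

# Assumed reconstruction: an outer element, a newline, an inner element
# holding the text on its own line, and a newline before the closing tag.
SOURCE = "<CML>\n<XVAR>\nThis is a variable\n</XVAR>\n</CML>"

root = ET.fromstring(SOURCE)
inner = root[0]

# Piece 1: text between the outer start tag and the inner start tag.
# Piece 2: the content of the inner element, newlines included.
# Piece 3: text between the inner end tag and the outer end tag.
pieces = [root.text, inner.text, inner.tail]
for number, piece in enumerate(pieces, start=1):
    print(number, repr(piece))
```

Pieces 1 and 3 are the whitespace-only runs that the rest of this thread worries about.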
-T

From eric_albright at sprynet.com Wed May 7 05:46:25 1997
From: eric_albright at sprynet.com (Eric Albright)
Date: Mon Jun 7 16:57:47 2004
Subject: Strong Typing in SGML and XML
Message-ID: <199705070346.UAA02957@m9.sprynet.com>

First, I'd like to concur with the need for a formal specification for data typing.

I had hoped that HyTime's lextype feature would be sufficient. I for one would like to hear from the HyTime experts about how they would implement the parallel data typing. -- No use reinventing any standard. It may only need simplifying and explaining.

Having said that, I ask when is strong data typing necessary? As far as I can tell there is only one place where it is useful -- when the document is being created or altered. There will always be data validation that cannot be handled by data typing and as such must be delegated to a validating application or a human. e.g. AlbrightEric

As for comments about the proposal:

I would like to see a simplified version of the data types. It is very important for databases to know the exact size in bytes that a data element will occupy. SGML/XML deals with a character string and therefore does not care. More important to me are the constraints on the data implied by a given type. I think we need to determine the types of constraints that each data type requires and allow for the maximum flexibility without sacrificing precision.

As far as I can tell, there are three basic types--character, numeric, and temporal.
Each type requires its own unique constraints:

CHARACTER - an alphabet, length constraint, content constraint (regular expressions)

NUMERIC - a maximum value, a minimum value, some type of rounding/precision

TEMPORAL - a maximum value, minimum value, (the maximum and minimum values may be constrained in relation to the current value), some type of rounding/precision

I think that the CHARACTER data type should be able to specify the alphabet and length constraint within the content constraint. However some modification to the standard regular expression writing would be necessary. I for one do not want to have to type \([0-9][0-9][0-9]\)[0-9][0-9][0-9]-[0-9][0-9][0-9][0-9] for a phone number. Perhaps \([0-9](+3)\)[0-9](+3)-[0-9](+4) would be better.

To allow maximum flexibility and precision for numeric values, we should be able to specify the form (roman/arabic) and a base. The rounding allows us to constrain the significant digits to some factor of the base. A rounding type would be needed for the greatest flexibility (round/ceiling/floor).

Temporal values can specify either an instant of time or an extent of time. They should also be able to be rounded. When an instant is rounded, the significant digits are to the left; when an extent is rounded, the significant digits are to the right. To signify that an instant is precise to the nearest five years, it would be rounded to 0005/00/00 00:00:00. To signify that an extent is precise to the nearest tenth of a second, it would be rounded by 0000/00/00 00:00:00.1 .

Given this the "architectural form" for data typing would become:

This changes the number of attributes from 4 to 9 but provides for higher precision for data constraint.

The examples would become:

For a bank loan; balance, interest rate, and maturity date:

For an airline departure: passenger name, seat number, and departure time:

Well, what do you think?
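
For comparison with the counted-repetition notation proposed above, common regular-expression dialects already provide interval quantifiers of the form `{n}`. The sketch below uses Python's `re` module to check that the long-hand phone pattern and an interval form accept and reject the same inputs; the pattern names, the helper function, and the sample numbers are made up for this illustration, and the proposed `(+3)` syntax itself is not implemented.

```python
import re

# Long-hand pattern as written in the message above.
LONG_FORM = r"\([0-9][0-9][0-9]\)[0-9][0-9][0-9]-[0-9][0-9][0-9][0-9]"
# The same constraint using interval quantifiers.
SHORT_FORM = r"\([0-9]{3}\)[0-9]{3}-[0-9]{4}"


def matches_phone(pattern, text):
    """Return True only if the whole string satisfies the pattern."""
    return re.fullmatch(pattern, text) is not None


samples = ["(555)123-4567", "555-123-4567", "(555)123-45678"]
for sample in samples:
    # Both spellings must agree on every sample.
    assert matches_phone(LONG_FORM, sample) == matches_phone(SHORT_FORM, sample)
    print(sample, matches_phone(SHORT_FORM, sample))
```

Using `fullmatch` rather than `search` matters for a content constraint: the whole value has to conform, not just some substring of it.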
Eric

From Peter at ursus.demon.co.uk Wed May 7 08:22:02 1997
From: Peter at ursus.demon.co.uk (Peter Murray-Rust)
Date: Mon Jun 7 16:57:47 2004
Subject: PCDATA
Message-ID: <6256@ursus.demon.co.uk>

In message <3.0.32.19970506181242.009f2d80@pop.intergate.bc.ca> Tim Bray writes:
> At 01:21 AM 5/7/97 GMT, Peter Murray-Rust wrote:
> >How many PCDATA elements would be expected in the file?
>
>
>
>
> This is a variable
>
>
>
> Let's flatten that. Clearly there can't be any PCDATA before , so:
>
> \n\nThis is a variable\n\n
> 11 2222222222222222222222 33
>
> Three pieces of PCDATA. Uh, I'll check Lark now... if it says anything
> else, that's a bug. -T

No bug. And Michael SMcQ gave the same answer. I am not sure what NXP gives at the moment, I'll have to check.

So *I*, and most of the people who will be using CML, have a potentially serious problem and I don't know what to do.

Ancillary Question: If this had been run through a validating parser and the DTD had contained

I assume the above document would be invalid? (#PCDATA does not occur in the CML content model). But am I not right in thinking that in SGML the 'additional' newlines are discarded? If I run this document through sgmls with the above document, doesn't it validate? (I'm doing this from memory, so please be gentle). And at the same time throw away the 'spurious' #PCDATA elements?

Problem 1. For a DTD which makes a restricted use of PCDATA, most documents are going to have lines of hundreds or thousands of characters long. The lines above would have to be:

This is a variable

and this could easily - in some of my applications - be very much longer. This makes such documents tricky to edit by hand and could cause problems with some text processing software.

Problem 2.
It is going to be almost impossible to educate an HTML2XML community that the two documents above are different. I have only just realised this problem today, although I seem to remember in earlier versions of the spec the behaviour was different? So I now haven't the slightest idea what I should be doing - and I thought this was all solved...

Problem 3. This seems to imply that a WF document *produces different output* if it is validated against a DTD. I accept this is true for SGML, but is it also true for XML? If so, I think we shall have an awful problem educating people.

You will appreciate that I may have clung onto ideas which were parts of earlier versions of the draft. I'd be very grateful for an 'extremely simple' explanation of what happens with various input of the type above. If it's what I think, then at the very least I think that the current draft needs to address this more directly.

Personally I would like some sort of XML-based switch that allowed a simple behaviour and allowed newlines for formatting. The spec says that DEFAULT means that the *application's* default white-space processing modes (why plural?) are acceptable. Is the application a DTD or a program? If the latter, then we are potentially going to have serious problems. If the former, then I don't see how the information is conveyed from the DTD to the program, if the program is generic (like JUMBO).

P. (somewhat confused :-).

--
Peter Murray-Rust, domestic net connection
Virtual School of Molecular Sciences
http://www.vsms.nottingham.ac.uk/

From nmikula at edu.uni-klu.ac.at Wed May 7 08:22:35 1997
From: nmikula at edu.uni-klu.ac.at (Norbert H.
Mikula)
Date: Mon Jun 7 16:57:47 2004
Subject: PCDATA
References: <3.0.32.19970506181242.009f2d80@pop.intergate.bc.ca>
Message-ID: <33709E46.157D@edu.uni-klu.ac.at>

Tim Bray wrote:
>
> At 01:21 AM 5/7/97 GMT, Peter Murray-Rust wrote:
> >How many PCDATA elements would be expected in the file?
>
>
>
>
> This is a variable
>
>
>
> Let's flatten that. Clearly there can't be any PCDATA before , so:
>
> \n\nThis is a variable\n\n
> 11 2222222222222222222222 33
>
> Three pieces of PCDATA. Uh, I'll check Lark now... if it says anything
> else, that's a bug. -T

I do agree with Tim. I will also check with NXP tonight, to make sure that this is the answer I get.

--
Best regards,
Norbert H. Mikula
=====================================================
= SGML, XML, DSSSL, Intra- & Internet, AI, Java
=====================================================
= mailto:nmikula@edu.uni-klu.ac.at
= http://www.edu.uni-klu.ac.at/~nmikula
=====================================================

From Peter at ursus.demon.co.uk Wed May 7 09:37:18 1997
From: Peter at ursus.demon.co.uk (Peter Murray-Rust)
Date: Mon Jun 7 16:57:48 2004
Subject: Strong Typing in SGML and XML
Message-ID: <6259@ursus.demon.co.uk>

In message <199705070346.UAA02957@m9.sprynet.com> "Eric Albright" writes:
> First, I'd like to concur with the need for a formal specification for data
> typing.
>
> I had hoped that HyTime's lextype feature would be sufficient. I for one
> would like to hear from the HyTime experts about how they would implement
> the parallel data typing. -- No use reinventing any standard. It may only
> need simplifying and explaining.
>
> Having said that, I ask when is strong data typing necessary?
As far as I
> can tell there is only one place where it is useful -- when the document is
> being created or altered. There will always be data validation that cannot

You may all regard this as poor design, but CML requires the documents to carry the data types. To save increasingly complex content models, CML has only two elements to carry typed data, XVAR (a scalar) and ARRAY (to carry large amounts of XVARs - an ARRAY looks like 1.2 2.3 3.4 - remember that some arrays can run to several powers of 10). At present CML uses 4 types (others are obsolete): STRING, FLOAT, INTEGER, DATE.

I agree that in principle I can convert to , and so on, but it makes things more complex (and the current processing software has to be rewritten). However, if we are working towards re-usable components and the whole of the XML community says they like (say) 4 unique types, then in the interests of interoperability I would be shouting for that. If they prefer to type their variables by attribute, I'll shout for that. Neither is trivial to process.

> be handled by data typing and as such must be delegated to a validating
> application or a human. e.g.
> AlbrightEric
>
> As for comments about the proposal:
>
> I would like to see a simplified version of the data types. It is very
> important for databases to know the exact size in bytes that a data element
> will occupy. SGML/XML deals with a character string and therefore does not
> care. More important to me are the constraints on the data implicit by a
> given type. I think we need to determine the types of constraints that each
> data type requires and allow for the maximum flexibility without
> sacrificing precision.

I understand the force of your argument. For both your requirement and mine, the question is 'should XML support this, or is it up to the "application"?'. Personally I am in favour of XML steering people towards a common way of doing things, whether it be in the spec, or Generally Accepted Conventions.
>
> As far as I can tell, there are three basic types--character, numeric, and
> temporal. Each type requires its own unique constraints:
>
> CHARACTER - an alphabet, length constraint, content constraint (regular
> expressions)
>
> NUMERIC - a maximum value, a minimum value, some type of rounding/precision

Some people will feel that the INTEGER/FLOAT distinction is important. I think I can live without it.

>
> TEMPORAL - a maximum value, minimum value, (the maximum and minimum values
> may be constrained in relation to the current value), some type of
> rounding/precision
>
> I think that the CHARACTER data type should be able to specify the alphabet
> and length constraint within the content constraint. However some

Again I keep asking the XML community the question as to where these constraints are applied. Editor (obviously), parser(??), application (presumably?).

> modification to the standard regular expression writing would be necessary.
> I for one do not want to have to type
> \([0-9][0-9][0-9]\)[0-9][0-9][0-9]-[0-9][0-9][0-9][0-9] for a phone number.
> Perhaps \([0-9](+3)\)[0-9](+3)-[0-9](+4) would be better.
>
> To allow maximum flexibility and precision for numeric values, we should be
> able to specify the form (roman/arabic) and a base. The rounding allows us
> to constrain the significant digits to some factor of the base. A rounding
> type would be needed for the greatest flexibility (round/ceiling/floor).
>
> Temporal values can specify either an instant of time or an extent of time.
> They should also be able to be rounded. When an instant is rounded, the
> significant digits are to the left; when an extent is rounded, the
> significant digits are to the right. To signify that an instant is precise
> to the nearest five years, it would be rounded to 0005/00/00 00:00:00. To
> signify that an extent is precise to the nearest tenth of a second, it
> would be rounded by 0000/00/00 00:00:00.1 .

I assume this must be a frequently solved problem and we shouldn't try to reinvent it. If someone more knowledgeable than me says - 'use the FOO approach' I'll probably buy it if it's stable and implementable.

[...]
>
>
> XML-TYPE-CONTENT CDATA #FIXED "[A-Z](*20)"
> -- up to 20 repetitions of

There has been a regular and repeated cry for regular expressions. If someone comes up with one that is available, I'll buy it. Surely one of the very many readers of this list is authoritative about this?

This is a very critical discussion for me, and I expect for others and shows some of the new things that XML will be used for.

P.

--
Peter Murray-Rust, domestic net connection
Virtual School of Molecular Sciences
http://www.vsms.nottingham.ac.uk/

From geirog at falch.no Wed May 7 10:14:27 1997
From: geirog at falch.no (Geir Ove Gronmo)
Date: Mon Jun 7 16:57:48 2004
Subject: PCDATA
Message-ID: <3.0.1.32.19970507101433.0068fed4@falch.no>

At 02:13 07.05.97 +0100, Tim Bray wrote:
>At 01:21 AM 5/7/97 GMT, Peter Murray-Rust wrote:
>>How many PCDATA elements would be expected in the file?
>
>
>
>
>This is a variable
>
>
>
>Let's flatten that. Clearly there can't be any PCDATA before , so:
>
>\n\nThis is a variable\n\n
> 11 2222222222222222222222 33
>
>Three pieces of PCDATA.

Should this also be true if the XML-SPACE attribute is set to DEFAULT for the CML element?

What would be the result of the following cases? Should there also be three pieces of PCDATA, since PCDATA can be an empty string? ([16] PCData ::= [^<&]*)

\nThis is a variable\n

And what about this one:

These are probably some pretty stupid questions, but ...
:')

------------------ Geir Ove Grønmo ------------------
Falch Infotek as, Stanseveien 21, 0902 Oslo, Norway
Phone: +47 22 90 27 36 Fax: +47 22 90 25 99
[grove@falch.no | http://www.falch.no/people/geirog]
-------------------------------------------------------

From chaotic at maths.tcd.ie Wed May 7 11:41:54 1997
From: chaotic at maths.tcd.ie (Alan Spencer)
Date: Mon Jun 7 16:57:48 2004
Subject: Strong Typing in SGML and XML
In-Reply-To: Your message of "Wed, 07 May 1997 07:26:06 GMT." <6259@ursus.demon.co.uk>
Message-ID: <9705071041.aa05447@salmon.maths.tcd.ie>

In message <6259@ursus.demon.co.uk>Peter writes:
> > To allow maximum flexibility and precision for numeric values, we should be
> > able to specify the form (roman/arabic) and a base. The rounding allows us
> > to constrain the significant digits to some factor of the base. A rounding
> > type would be needed for the greatest flexibility (round/ceiling/floor).
> >
> > Temporal values can specify either an instant of time or an extent of time.
> > They should also be able to be rounded. When an instant is rounded, the
> > significant digits are to the left; when an extent is rounded, the
> > significant digits are to the right. To signify that an instant is precise
> > to the nearest five years, it would be rounded to 0005/00/00 00:00:00. To
> > signify that an extent is precise to the nearest tenth of a second, it
> > would be rounded by 0000/00/00 00:00:00.1 .
>
> I assume this must be a frequently solved problem and we shouldn't try to
> reinvent it. I someone more knowledgeable than me says - 'use the FOO
> approach' I'll probably buy it if it's stable and implementable.
>
> [...]
> > > > > > > XML-TYPE-CONTENT CDATA #FIXED "[A-Z](*20)"
> > -- up to 20 repetitions of
>
> There has been a regular and repeated cry for regular expressions. If
> someone comes up with one that is available, I'll buy it. Surely one of the
> very many readers of this list is authoritative about this?

Hi,

I'm certainly not an authority on regular expressions, but I have been using the one in perl for many years now and I find it meets all of my requirements. It can be a bit messy (but aren't all regular expressions!). I'm sure most of you know how it works; it is quite like the one outlined above. It may be too complicated for what is necessary, as I'm sure the goal here is to make things as simple as possible and only as complicated as necessary. The ideas may need to be changed a bit, but the underlying structure is definitely there; the 'telephone number' example would be similar to that suggested.

What is the plan as regards things not matching the constraints? I presume it is just a strict error, i.e. not a valid XML document. Are there any plans to give some flexibility to the rules, so as to make corrupted data, for example, parseable, as is the case with HTML? Most browsers are fairly smart when it comes to 'guessing'. This *is* a bad thing most of the time in HTML, as it promotes guess-work on the part of the inexperienced author. I have experienced this with co-workers using WYSIWYG editors - 'It looks good on my computer, what's wrong with yours'. So I suggest this very lightly, I don't want to promote that.

As regards the strong typing, could there be generic types which a particular application/Style would define, or which could even go undefined throughout? There are applications which work with arbitrary precision calculations (like calc on UNIX); this would need a generic *real* type. For example, I have an interest in Mathematical formatting, similar to that done by LaTeX, but with a more structured approach, ie.
these documents could be parsed as formatted text or as real mathematical equations/functions/... For example, in TeX the code: "x^{ijk}_{lmn}" will produce: ijk x lmn This doesn't define what this *means*, just what it looks like; it could be powers/indices.... So if I was to try to define a generic variable *x* and add the functionality to it, it would make sense. If I am actually making sense, myself, any input on this would be helpful. As far as I see it, if there were generic types, or maybe *a* generic type, people could extend the basic types using styles to add the necessary functionality to these types. I'm not sure if I am starting to tend towards a type of programming language, but what the hell. Thanks, Alan Spencer. > > This is a very critical discussion for me, and I expect for others and shows > some of the new things that XML will be used for. > > P. From Peter at ursus.demon.co.uk Wed May 7 12:40:44 1997 From: Peter at ursus.demon.co.uk (Peter Murray-Rust) Date: Mon Jun 7 16:57:48 2004 Subject: Strong Typing in SGML and XML Message-ID: <6268@ursus.demon.co.uk> Hi Alan, Thanks very much for your contribution. It raises several points. In message <9705071041.aa05447@salmon.maths.tcd.ie> Alan Spencer writes: > In message <6259@ursus.demon.co.uk> Peter writes: [...] > > Hi, > I'm certainly not an authority on regular expressions, but I have been using > the one in perl for many years now and I find it meets all of my requirements. > It can be a bit messy (but aren't all regular expressions!). I'm > sure most of you know how it works; it is quite like the one outlined above. > It may be too complicated for what is necessary, as I'm sure that is a goal > here, to make things as simple as possible and only as complicated as necessary. 
Tim Bray (ERB) has been looking for an RE tool for XML. The point is (I think, Tim) that they're not trivial to write and that it's critical that everyone uses the same one. So we don't want to build into XML an RE that isn't easily available. If someone says, 'here's one in Java/C/Scheme/what/ever' that has no copyright restrictions, I think we'd make progress. > > The ideas may need to be changed a bit, but the underlying structure is > definitely there; the 'telephone number' example would be similar to that > suggested. > > What is the plan as regards things not matching the constraints? I presume > it is just a strict error, i.e. not a valid XML document. Are there any plans > to give some flexibility to the rules, so as to make corrupted data, for example, > parseable, as is the case with HTML? Most browsers are fairly smart when > it comes to 'guessing'. This *is* a bad thing most of the time in HTML, as > it promotes guess-work on the part of the inexperienced author. I have > experienced this with co-workers using WYSIWYG editors - 'It looks good > on my computer, what's wrong with yours'. So I suggest this very lightly, > I don't want to promote that. ***ERB*** This matter has been discussed at very great length on the WG and the ERB is closing in on a position. ERB, I think it could be very useful to cross post your position here (or modify it appropriately). ***XML-DEV*** The treatment of errors is an extremely important issue, but it will not be profitable to discuss it till the ERB has pronounced. I would also ask XML-DEV to accept that the ERB position has required much midnight oil and to try not to repeat the discussions on XML-WG. > > As regards the strong typing, could there be generic types which a particular > application/Style would define, or even go undefined throughout? There > are applications which work with arbitrary precision calculations (like calc > on UNIX); this would need a generic *real* type. 
For example, > I have an interest in Mathematical formatting, similar to that done by LaTeX, > but with a more structured approach, i.e. these documents could be parsed as > formatted text or as real mathematical equations/functions/... > For example, in TeX the code: "x^{ijk}_{lmn}" will produce: > ijk > x > lmn > > This doesn't define what this *means*, just what it looks like; it could be > powers/indices.... So if I was to try to define a generic variable *x* and > add the functionality to it, it would make sense. > If I am actually making sense, myself, any input on this would be helpful. You are! I have been asking for some time for 'parsable math' - i.e. something that can be input to a machine, rather than being typeset for a human. I accept that math is a wide spectrum and covers everything from research maths papers to teaching 3 year-olds. I doubt that a single DTD will cover this. The W3C group on math will report on May 15 (HTML-MATH). This will be XML-compatible. The group is aware of the need for interoperability with other DTDs and the need for 'parsable' math. I believe they will cross-post to this list. > As far as I see it, if there were generic types, or maybe *a* generic type, > people could extend the basic types using styles to add the necessary > functionality to these types. > > I'm not sure if I am starting to tend towards a type of programming language, but > what the hell. I think it's very important to make sure that none of us re-invent the wheel. I like Tim's idea of mining SQL for elements, not because I like SQL (I don't much) but because lots of people have thought hard about it. For the same reason I have suggested that dates be ISO8601 compatible, because the authors of that have thought of most of the problems. Similarly *if* any of the math groups working on DTDs come up with recommendations we should treat them very seriously. 
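To make the ISO 8601 suggestion concrete, here is a minimal sketch in Python (not a language used in this thread) of reading date and date-time strings in the extended ISO 8601 form. The function names are hypothetical, and only the plain `YYYY-MM-DD` and `YYYY-MM-DDThh:mm:ss` forms are handled; time zones, week dates and ordinal dates are deliberately left out.

```python
from datetime import date, datetime


def parse_iso8601_date(text):
    """Parse a calendar date written as YYYY-MM-DD (hypothetical helper)."""
    # date.fromisoformat raises ValueError for anything that is not a real date.
    return date.fromisoformat(text)


def parse_iso8601_datetime(text):
    """Parse a local date-time written as YYYY-MM-DDThh:mm:ss (hypothetical helper)."""
    return datetime.strptime(text, "%Y-%m-%dT%H:%M:%S")


if __name__ == "__main__":
    report_day = parse_iso8601_date("1997-05-15")
    posted = parse_iso8601_datetime("1997-05-07T12:40:44")
    print(report_day.isoformat(), posted.isoformat())
```

Because the most significant field comes first, strings in this form also sort correctly as plain text, which is one reason an agreed string representation is attractive for typed content.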
If there is (and there must be) a standard for string representations of the types described here, let's use that. P. -- Peter Murray-Rust, domestic net connection Virtual School of Molecular Sciences http://www.vsms.nottingham.ac.uk/ From Guy.Teasdale at bibl.ulaval.ca Wed May 7 15:26:16 1997 From: Guy.Teasdale at bibl.ulaval.ca (Guy Teasdale) Date: Mon Jun 7 16:57:48 2004 Subject: Strong Typing in SGML and XML Message-ID: <3.0.32.19970507082452.006b6100@hermes.ulaval.ca> >> >> 3. The syntax for dates and so on should match some ISO standard, >> >> but I haven't found which one yet. >> > >> >Do you mean there are several and you haven't decided between them? >> >I thought that people had converged on a single one (I can't remember >> >the number, it's something like 8601). >> >> I mean I spent half an hour poking around the Web and didn't come >> up with anything right away. If someone will send me a pointer to >> the standard syntax, I'll put it in the draft. > >ISO 8601. Being ISO it isn't on the WWW, but there is a very concise >summary which I found at http://www.mcs.vuw.ac.nz/ - just look >for ISO8601 in the search engine. You will find an article on this standard at: http://www.ft.uni-erlangen.de/~mskuhn/iso-time.html "A Summary of the International Standard Date and Time Notation" by Markus Kuhn With other links at the end of this article. 
The link mentioned to Gary Houston's text doesn't work; try this one: http://www.mcs.vuw.ac.nz/comp/Technical/SGML/doc/iso8601/ISO8601.html The official source is in the ISO catalogue at: http://www.iso.ch/cate/cat.html ISO 8601:1988 Data elements and interchange formats -- Information interchange -- Representation of dates and times Edition: 1 (monolingual) Number of pages: 14 Price code: G ICS: 01.140.30 Descriptors: calendar dates, data representation, documentation, hours (time), information interchange Technical Corrigendum 1:1991 to ISO 8601:1988 Number of pages: 1 Last updated on 1997-05-03 Guy Teasdale tél: (418) 656-2131 - 2090 Bibliothèque de l'Université Laval fax: (418) 656-7897 Sainte-Foy, Québec G1K 7P4 Guy.Teasdale@bibl.ulaval.ca From Peter at ursus.demon.co.uk Wed May 7 15:44:46 1997 From: Peter at ursus.demon.co.uk (Peter Murray-Rust) Date: Mon Jun 7 16:57:48 2004 Subject: PCDATA Message-ID: <6276@ursus.demon.co.uk> In message <199705071303.JAA12106@nathaniel.ebt> gtn@eps.inso.com (Gavin Nicol) writes: [...] > > I would suggest that your application look to see if the PCDATA contains > only a single \n, and if so, toss it. Thanks for encouraging me on this. I think it will be such a common occurrence that it should be in the XML-lang spec. I will raise my head over the parapet again... > > >Problem 3. > >This seems to imply that a WF document *produces different output* if it is > >validated against a DTD. I accept this is true for SGML, but is it also > >true for XML? If so, I think we shall have an awful problem educating > >people. > > Yes. This is why I said we should keep *all* PCDATA; at least application > will always know what to expect. 
RE delenda est (David Durand's and my idea) > also gets around this problem nicely in a slightly different way. > > I am somewhat dissatisfied with this aspect of XML, but can live with it. ^^^^^^^^^^^^^^^^^ I think this is true for anyone brought up in the tradition of SGML. It's much tougher for a webhacker. It's not easy to realise that: ... inserts TWO separate newlines in the parser output from a WF document. It fooled me. Most people would assume there weren't any. P. -- Peter Murray-Rust, domestic net connection Virtual School of Molecular Sciences http://www.vsms.nottingham.ac.uk/ From nmikula at edu.uni-klu.ac.at Wed May 7 16:22:22 1997 From: nmikula at edu.uni-klu.ac.at (Norbert Mikula) Date: Mon Jun 7 16:57:48 2004 Subject: PCDATA In-Reply-To: <6256@ursus.demon.co.uk> Message-ID: On Wed, 7 May 1997, Peter Murray-Rust wrote: > > >How many PCDATA elements would be expected in the file? > > > > > > > > > > This is a variable > > > > I was running NXP with : A variable and the result was : " " " A variable " " " (\n is passed along to the application since the parser doesn't know what else to do with it.) I also used the example with a simple DTD : ]> A variable and the result was : " A variable " -> the whitespace inside CML was recognized to be markup only. Best regards, Norbert H. 
Mikula ===================================================== = SGML, XML, DSSSL, Intra- & Internet, AI, Java ===================================================== = mailto:nmikula@edu.uni-klu.ac.at = http://www.edu.uni-klu.ac.at/~nmikula ===================================================== xml-dev: A list for W3C XML Developers Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To unsubscribe, send to majordomo@ic.ac.uk the following message; unsubscribe xml-dev List coordinator, Henry Rzepa (rzepa@ic.ac.uk) From Peter at ursus.demon.co.uk Wed May 7 16:27:02 1997 From: Peter at ursus.demon.co.uk (Peter Murray-Rust) Date: Mon Jun 7 16:57:48 2004 Subject: Strong Typing in SGML and XML Message-ID: <6280@ursus.demon.co.uk> In message <199705071326.JAA12115@nathaniel.ebt> gtn@eps.inso.com (Gavin Nicol) writes: > >Tim Bray (ERB) has been looking for a RE tools for XML. The point is (I > >think, Tim) that they're not trivial to write and that it's critical that > >everyone uses the same one. So we don't want to build into XML a RE that > >isn't easily available. If someone says, 'here's one in > >Java/C/Scheme/what/ever' that has no copyright restrictions, I think we'd > >make progress. > > RE processors are easy to implement, and there are a great number of > them available for free. There are a number of specifications that > could be used: I would recommend something like the POSIX ones, > suitably extended. Good. Where is a volunteer to crack up a Java one? > > I really do prefer the notation method though. It's much cleaner, and > only a little more complex to implement. I'm not arguing against the notation method, though to my limited eyes it seems to need a revised draft? The suggestion of regular expressions was simply that *if* we got one for TEI pointers (as has been urged) we can use the same one for this. But if notations are harder that RE's then you will probably have to look to someone else to implement it. 
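As a rough illustration of what checking element content against a regular expression might look like, here is a minimal sketch in Python. It uses Python's own `re` dialect, not POSIX and not anything XML has adopted. The first pattern is one reading of the "[A-Z](*20)" example quoted earlier in the thread ("up to 20 repetitions"); the telephone pattern and all names are hypothetical.

```python
import re

# One reading of "[A-Z](*20)": zero to twenty capital letters.
UP_TO_20_CAPITALS = re.compile(r"[A-Z]{0,20}")

# Hypothetical telephone pattern: optional '+', then digit groups
# separated by single spaces or hyphens.
PHONE = re.compile(r"\+?[0-9]+(?:[ -][0-9]+)*")


def content_matches(pattern, text):
    """Return True only if the whole of text matches the pattern."""
    return pattern.fullmatch(text) is not None


if __name__ == "__main__":
    for sample in ("XML", "xml", "A" * 21):
        print(repr(sample), content_matches(UP_TO_20_CAPITALS, sample))
    print(content_matches(PHONE, "+47 22 90 27 36"))
```

Matching the whole string, rather than searching for a match anywhere in it, is the design choice that matters here: a constraint on content is only useful if stray characters before or after the match cause a failure.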
BTW - I am surprised that after nearly 3 months of this list there aren't more people coming up with tools. A lot of the stuff must already exist... > > -- Peter Murray-Rust, domestic net connection Virtual School of Molecular Sciences http://www.vsms.nottingham.ac.uk/ xml-dev: A list for W3C XML Developers Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To unsubscribe, send to majordomo@ic.ac.uk the following message; unsubscribe xml-dev List coordinator, Henry Rzepa (rzepa@ic.ac.uk) From Peter at ursus.demon.co.uk Wed May 7 16:46:17 1997 From: Peter at ursus.demon.co.uk (Peter Murray-Rust) Date: Mon Jun 7 16:57:48 2004 Subject: PCDATA Message-ID: <6289@ursus.demon.co.uk> In message Norbert Mikula writes: > On Wed, 7 May 1997, Peter Murray-Rust wrote: > > > > >How many PCDATA elements would be expected in the file? > > > > > > > > > > > > > > > This is a variable > > > > > > > > I was running NXP with : [...examples deleted...] Norbert's answers agree with what I got and also with the consensus of the group. It's clear that WF files can give *different* data from those with some or all of the ELEMENT declarations. I do not find the behaviour intuitive and believe we have to address it in some manner. I am sympathetic to trashing the whitespace PCDATA elements, but there is no clear idea of how. An application like:


may wish the result to have 3 newlines as children (i.e. 5 elements in all). But equally an app may be frustrated by the extra elements. It's easy to ask for the TEI pointer "DESCENDANT(1,PRE)CHILD(1,*)" and expect to get the dot1. This can be criticised as bad style but it's as likely to arise from ignorance rather than sloppiness. There has rightly been concern about the conformance of parsers (esp. their reaction to errors). This is an area where I suspect conformance is non-trivial. P. -- Peter Murray-Rust, domestic net connection Virtual School of Molecular Sciences http://www.vsms.nottingham.ac.uk/ xml-dev: A list for W3C XML Developers Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To unsubscribe, send to majordomo@ic.ac.uk the following message; unsubscribe xml-dev List coordinator, Henry Rzepa (rzepa@ic.ac.uk) From richard at light.demon.co.uk Wed May 7 17:14:40 1997 From: richard at light.demon.co.uk (Richard Light) Date: Mon Jun 7 16:57:48 2004 Subject: Strong Typing in SGML and XML In-Reply-To: <199705070346.UAA02957@m9.sprynet.com> Message-ID: In message <199705070346.UAA02957@m9.sprynet.com>, Eric Albright writes > >Having said that, I ask when is strong data typing necessary? As far as I >can tell there is only one place where it is useful -- when the document is >being created or altered. There will always be data validation that cannot >be handled by data typing and as such must be delegated to a validating >application or a human. e.g. >AlbrightEric >From a museum perspective, we have found the need for two types of data validation/strong typing, which we call 'syntax control' and 'vocabulary control'. Syntax control deals with things like the form of personal names. These are _not_ analysed in our application, but expressed in a consistent way suitable for alphabetical sorting, e.g.: Light, Richard B. rather than Richard B. 
Light The syntax check would pick up non-capitalised words (apart from a 'stop list' of known weak prefixes), inconsistent use of full stop and/or spaces after initials, etc. This starts to be hard work for a regular expression, and might more easily be supported as a 'notation', for which an external helper applet is called up in the context of editing. Vocabulary control involves checking the data content against an external authority, which could be a simple termlist or a complex thesaurus. Another use we make of data syntax is as a short-cut for markup. (This was before we knew about SGML, by the way! The conventions were originally devised to make optimal use of A5 catalogue cards ...) We use colons as a 'field separator', e.g.: maker : Light, R.B. implies: maker Light, R.B. and ampersands (definitely pre-SGML!) as keyword separators: Burgess Hill & W. Sussex & U.K. implies: Burgess Hill W. Sussex U.K. These practices tie in with the SGML concept of short references, which are not available in XML. So a general conclusion I have come to is that ':' and '&' need to be mapped to suitable subelements, and our users need to come to terms with more heavily tagged records than they are used to. This is relevant (really!) in the context of Tim's suggestion that strong typing should apply only to PCDATA-only elements. In the more general case of 'data validation' we might well want to validate elements with substructure. Richard Light SGML and Museum Information Consultancy richard@light.demon.co.uk 3 Midfields Walk Burgess Hill West Sussex RH15 8JA U.K. tel. 
(44) 1444 232067 xml-dev: A list for W3C XML Developers Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To unsubscribe, send to majordomo@ic.ac.uk the following message; unsubscribe xml-dev List coordinator, Henry Rzepa (rzepa@ic.ac.uk) From Peter at ursus.demon.co.uk Wed May 7 17:40:44 1997 From: Peter at ursus.demon.co.uk (Peter Murray-Rust) Date: Mon Jun 7 16:57:48 2004 Subject: PCDATA Message-ID: <6298@ursus.demon.co.uk> Thanks Gavin, In message <199705071512.LAA12189@nathaniel.ebt> gtn@eps.inso.com (Gavin Nicol) writes: > >Norbert's answers agree with what I got and also with the consensus > >of the group. It's clear that WF files can give *different* data from > >those with some or all of the ELEMENT declarations. I do not find the > >behaviour intuitive and believe we have to address it in some manner. > > Agreed. I believe that RE delenda est solves the problems. I am not sure that I was on board for this discussion (I have been told that whitespace occupied a large amount of bytes last year :-) A summary could be useful - it clearly has a good pedigree. Is it a language or an implementation issue? > > >I am sympathetic to trashing the whitespace PCDATA elements, but there is > >no clear idea of how. > > The SGML rules are not always intuitive either.... > > >There has rightly been concern about the conformance of parsers (esp. their > >reaction to errors). This is an area where I suspect conformance is > >non-trivial. > > Validation of parsers should *certainly* extend to grove construction > as well as error handling. > Yes. For those not on the WG, Jon has informed us that the likely major implementors are keen on conformance , so this must surely be an early issue. It suggests that we shall need some test data and while this already exists (torture) I am not sure that the outputs have been rigorously investigated. 
Of course there is more than one type of output, and when I compare NXP's output to Lark's I am comparing an Esis stream to a tree of Elements (but not a complete grove). The discussion here and elsewhere makes it very clear that the *parser* is a fundamental unit and that wherever possible it should be self-contained and independent of the 'application'. That makes it even more important for us to specify an API. Please correct this, but I see three possible outputs from a parser: - a grove - an esis_stream - a tree of elements, possibly with PIs, attached to nodes. We ought to be able to give outputs for each of these so that implementers can check. What concerns me at present is that some of the functions (e.g. XML-SPACE) may vary with parsers and that this could be extremely difficult to pin down in a monolithic application. I'd recommend that whichever of the methods above is used, it should be possible to tap into them. It's also clear that applications must recognise certain *attributes*. At present these seem to be: XML-SPACE XML-LINK ROLE HREF TITLE SHOW ACTUATE BEHAVIOR Most of these are non-trivial (e.g. XML-SPACE extends to its children, so they have to be stamped with it, but when editing a tree the attribute may need to disappear from relocated children). XML-LINK is quite complex and affects content of elements (XML-LINK="EXTENDED"). Is there a case for, and is it possible to have, a PRE-application module that deals with attributes and other generic stuff? This would also help people to converge on a single interpretation. I'd feel much happier about telling a pre-application with carefully argued semantics what to do with whitespace or link structure validation than trusting to any old application. P. 
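The whitespace behaviour discussed in this thread can be reproduced with a present-day non-validating parser. The sketch below is in Python and uses the standard library's `xml.dom.minidom`; it is not NXP or Lark, and the element names `LIST` and `ITEM` are hypothetical stand-ins, since the markup of the original examples has not survived in the archive. It lists the children a tree-building parser hands over for a well-formed document with no DTD, then shows the kind of whitespace filter a 'pre-application' layer might apply.

```python
from xml.dom.minidom import parseString

# Hypothetical document: one newline before and one after the inner element.
DOCUMENT = "<LIST>\n<ITEM>A variable</ITEM>\n</LIST>"


def child_summary(xml_text):
    """List the root element's children as (kind, value) pairs."""
    root = parseString(xml_text).documentElement
    summary = []
    for node in root.childNodes:
        if node.nodeType == node.TEXT_NODE:
            summary.append(("text", node.data))
        elif node.nodeType == node.ELEMENT_NODE:
            summary.append(("element", node.tagName))
    return summary


def drop_whitespace_only(summary):
    """Discard text children that consist solely of whitespace."""
    return [
        item for item in summary
        if not (item[0] == "text" and item[1].strip() == "")
    ]


if __name__ == "__main__":
    children = child_summary(DOCUMENT)
    print(children)
    print(drop_whitespace_only(children))
```

Without element declarations the parser cannot tell that the two newlines are insignificant, so it reports three children, not one. Whether to filter them is exactly the decision the thread argues should be made once, in a shared layer, instead of separately in every application.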
> -- Peter Murray-Rust, domestic net connection Virtual School of Molecular Sciences http://www.vsms.nottingham.ac.uk/ xml-dev: A list for W3C XML Developers Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To unsubscribe, send to majordomo@ic.ac.uk the following message; unsubscribe xml-dev List coordinator, Henry Rzepa (rzepa@ic.ac.uk) From Peter at ursus.demon.co.uk Wed May 7 22:41:48 1997 From: Peter at ursus.demon.co.uk (Peter Murray-Rust) Date: Mon Jun 7 16:57:48 2004 Subject: Strong Typing in SGML and XML Message-ID: <6311@ursus.demon.co.uk> In message altheim writes: > Peter@ursus.demon.co.uk (Peter Murray-Rust) writes: > > In message <199705071326.JAA12115@nathaniel.ebt> gtn@eps.inso.com (Gavin Nicol) writes: [...] > > > > > > RE processors are easy to implement, and there are a great number of > > > them available for free. There are a number of specifications that > > > could be used: I would recommend something like the POSIX ones, > > > suitably extended. > > > > Good. Where is a volunteer to crack up a Java one? > > Well, after reading Jeffrey Friedl's book "Mastering Regular Expressions" > (O'Reilly), I would heavily caution everyone to make sure we advocate and > develop to a *single* RE specification, as it seems very evident that there I thought this was taken for granted - that there would be a single RE in all XML-l[ai]n[gk] specifications. I also assumed (naively?) that POSIX defined such an RE, and we merely needed an implementation. There might well be subsidiary questions such as 'do we want to implement a subset', 'are there any clashes between RE syntax and XML syntax', 'are PEs expanded before evaluating the RE :-)', etc. > is such a variance between the RE processors in perl, Tcl, sed, awk, vi, etc. > that having RE inconsistencies among XML applications would be worse than > having no RE support at all. Fully agreed. 
> > If we choose a code base that contains more RE features than the minimal > set supported by all RE processors, we need to be clear which features > are part of and required by XML. (This sounds like a mess to me.) Since XML-LINK-TEI has shrunk since its first airing, I suspect that there is a desire not to overreach. Certainly we should not force implementors to have to work hard to comply to unnecessary features. (For all I know some REs would be sufficiently powerful to act as XML parsers :-) P. -- Peter Murray-Rust, domestic net connection Virtual School of Molecular Sciences http://www.vsms.nottingham.ac.uk/ xml-dev: A list for W3C XML Developers Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To unsubscribe, send to majordomo@ic.ac.uk the following message; unsubscribe xml-dev List coordinator, Henry Rzepa (rzepa@ic.ac.uk) From murata at apsdc.ksp.fujixerox.co.jp Thu May 8 03:48:24 1997 From: murata at apsdc.ksp.fujixerox.co.jp (Murata Makoto) Date: Mon Jun 7 16:57:48 2004 Subject: Strong Typing in SGML and XML In-Reply-To: <6268@ursus.demon.co.uk> Message-ID: <9705080103.AA00060@lute.apsdc.ksp.fujixerox.co.jp> Peter Murray-Rust writes: >Tim Bray (ERB) has been looking for a RE tools for XML. The point is (I >think, Tim) that they're not trivial to write and that it's critical that >everyone uses the same one. So we don't want to build into XML a RE that >isn't easily available. If someone says, 'here's one in >Java/C/Scheme/what/ever' that has no copyright restrictions, I think we'd >make progress. There is a great tool called Grail, developed by Darrell Raymond and Derick Wood. Grail is available from the URL as below: http://www.csd.uwo.ca/research/grail/grail.html One of the advantages of Grail is that you can modify the syntax of regular expressions. However, Grail is not free (see below). 
Makoto Fuji Xerox Information Systems Tel: 044-812-7230 Fax: 044-812-7231 E-mail: murata@apsdc.ksp.fujixerox.co.jp ------------------------------------------------------------------------- Is Grail free? No, Grail is not free. We don't charge scholars, students, or researchers for the use of Grail, and we don't charge people who simply want to play with it to satisfy their own curiosity. But no commercial use of Grail is permitted without our prior, express, written consent. No part of Grail may be included in a commercial product or used on a commercial problem without our prior, express, written consent. It's not that we have something against people making money---we just want to make sure that those who benefit financially from using Grail put some small part of that benefit back into the development and support of Grail. xml-dev: A list for W3C XML Developers Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To unsubscribe, send to majordomo@ic.ac.uk the following message; unsubscribe xml-dev List coordinator, Henry Rzepa (rzepa@ic.ac.uk) From tbray at textuality.com Thu May 8 04:11:34 1997 From: tbray at textuality.com (Tim Bray) Date: Mon Jun 7 16:57:48 2004 Subject: Strong Typing in SGML and XML Message-ID: <3.0.32.19970507190920.00a02470@pop.intergate.bc.ca> At 11:20 AM 5/7/97 GMT, Peter Murray-Rust wrote: >Tim Bray (ERB) has been looking for a RE tools for XML. The point is (I >think, Tim) that they're not trivial to write and that it's critical that >everyone uses the same one. So we don't want to build into XML a RE that >isn't easily available. If someone says, 'here's one in >Java/C/Scheme/what/ever' that has no copyright restrictions, I think we'd >make progress. Not quite... actually, there are several excellent free RE tools around. But I don't know of any good RE tools, free or commercial, that support Unicode. 16-bit as opposed to 8-bit characters do change the scope of the problem. 
It's not just regexp, we have some tricky work to do if we want to do *anything* inside an element, e.g. token or even character counting, in the internationalized environment. - Tim xml-dev: A list for W3C XML Developers Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To unsubscribe, send to majordomo@ic.ac.uk the following message; unsubscribe xml-dev List coordinator, Henry Rzepa (rzepa@ic.ac.uk) From flammia at sls.lcs.mit.edu Thu May 8 04:18:36 1997 From: flammia at sls.lcs.mit.edu (Giovanni Flammia) Date: Mon Jun 7 16:57:48 2004 Subject: regular expressions Java classes Message-ID: <199705080218.WAA04866@maritimus.lcs.mit.edu> A non-text attachment was scrubbed... Name: not available Type: text Size: 684 bytes Desc: not available Url : http://mailman.ic.ac.uk/pipermail/xml-dev/attachments/19970508/578fc732/attachment.bat From murata at apsdc.ksp.fujixerox.co.jp Thu May 8 04:26:52 1997 From: murata at apsdc.ksp.fujixerox.co.jp (Murata Makoto) Date: Mon Jun 7 16:57:49 2004 Subject: Strong Typing in SGML and XML In-Reply-To: <3.0.32.19970507190920.00a02470@pop.intergate.bc.ca> Message-ID: <9705080227.AA00062@lute.apsdc.ksp.fujixerox.co.jp> Tim Bray writes: > But I don't know of any good RE tools, free or commercial, >that support Unicode. 16-bit as opposed to 8-bit characters do >change the scope of the problem. It's not just regexp, we have >some tricky work to do if we want to do *anything* inside an >element, e.g. token or even character counting, in the >internationalized environment. - Tim Grail is type-parameterized. So, you can create a regular expression over whatever classes. 
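Tim's point about character counting can be shown with a small sketch in Python; the function name is hypothetical. The same text has different lengths depending on whether one counts characters, 8-bit bytes in a UTF-8 encoding, or 16-bit code units, so any constraint such as "at most 20 characters" has to say which unit it means.

```python
def count_units(text):
    """Count one string in three different units (hypothetical helper)."""
    return {
        "characters": len(text),  # Python counts Unicode code points
        "utf8_bytes": len(text.encode("utf-8")),
        "utf16_code_units": len(text.encode("utf-16-le")) // 2,
    }


if __name__ == "__main__":
    # "Qu\u00e9bec" is "Québec" with a precomposed e-acute.
    print(count_units("Qu\u00e9bec"))
    # Three Japanese characters, each three bytes long in UTF-8.
    print(count_units("\u65e5\u672c\u8a9e"))
```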
Makoto Fuji Xerox Information Systems Tel: 044-812-7230 Fax: 044-812-7231 E-mail: murata@apsdc.ksp.fujixerox.co.jp xml-dev: A list for W3C XML Developers Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To unsubscribe, send to majordomo@ic.ac.uk the following message; unsubscribe xml-dev List coordinator, Henry Rzepa (rzepa@ic.ac.uk) From tbray at textuality.com Thu May 8 17:21:48 1997 From: tbray at textuality.com (Tim Bray) Date: Mon Jun 7 16:57:49 2004 Subject: Strong Typing in SGML and XML Message-ID: <3.0.32.19970508081643.00a02540@pop.intergate.bc.ca> At 09:26 AM 5/8/97 -0400, Gavin Nicol wrote: >Speaking of which. Do both Lark and NXP handle 16 bit characters? Yes. It's hard not to, in Java. Mind you, Lark at the moment just reads 8-bit characters from a file (through a method that you could subclass). -T. xml-dev: A list for W3C XML Developers Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To unsubscribe, send to majordomo@ic.ac.uk the following message; unsubscribe xml-dev List coordinator, Henry Rzepa (rzepa@ic.ac.uk) From andrewl at microsoft.com Thu May 8 19:25:21 1997 From: andrewl at microsoft.com (Andrew Layman) Date: Mon Jun 7 16:57:49 2004 Subject: Strong Typing in SGML and XML Message-ID: <7BB61B44F197D011892800805FD4F79291C579@RED-03-MSG.dns.microsoft.com> Several people have written in recently asking, in effect, "What are the purposes of this strong typing? What needs does it solve?" So I asked around. Here are the needs that have been advanced: 1. Storage optimization. Various clients want to be able to optimize storage by keeping numbers in a binary format, strings in a preallocated structure, etc. 2. Implied semantics. E.g. numbers can be added together, if you know they are numbers. Also, knowing that a number is meant to have a fixed versus floating precision affects how operations are performed, what kind of precision is retained during calculations, what errors are reported, etc. 3. Parsing and formatting rules. 
Dates are expected to be in some standard representation, such as given by ISO 8601 (e.g.). Floating point numbers permit scientific notation. Etc. 4. Different data types need different supplementary attributes, such as number of digits precision, total size in characters, whether time zones are present, etc. (In Tim's proposal, these all overload a single generic attribute.) 5. Range restrictions. Dates and other kinds of things measured in numbers can be limited to a range of values. All types can be potentially limited to a set of discrete values (by enumeration or rule). From andrewl at microsoft.com Thu May 8 20:08:06 1997 From: andrewl at microsoft.com (Andrew Layman) Date: Mon Jun 7 16:57:49 2004 Subject: Strong Typing in SGML and XML Message-ID: <7BB61B44F197D011892800805FD4F7927DD92E@RED-03-MSG.dns.microsoft.com> I accidentally sent this message before completing it. The full message is here: > Several people have written in recently asking, in effect, "What are > the > purposes of this strong typing? What needs does it solve?" So I > asked > around. Here are the needs that have been cited: > > 1. Storage optimization. Various clients want to be able to > optimize storage by keeping numbers in a binary format, strings in a > preallocated structure, etc. > > 2. Implied semantics. E.g. numbers can be added together, if you > know they are numbers. Also, knowing that a number is meant to have a > fixed versus floating precision affects how operations are performed, > what kind of precision is retained during calculations, what errors > are > reported, etc. Knowing that a string is meant to be a > URL gives hints on its use. Etc. > > 3. Parsing and formatting rules. 
Dates are expected to be in some > standard representation, such as given by ISO 8601 (e.g.). Floating > point numbers permit scientific notation. Etc. For example, > though the number "0.1234E+20" could have been represented as > "123420", and the date > "19970508T10:47" could have been similarly broken into year, month, > etc., > and this markup would eliminate the need for special parsers for > numbers and > dates, it has obvious readability and bloat problems. > An explicit data type can signal what the internal elements are and > how > to parse for them without tags. > > 4. Different data types need different supplementary attributes, > such as number of digits precision, total size in characters, whether > time zones are present, etc. (In Tim's proposal, these all overload a > single, generic attribute.) > > 5. Range restrictions. Dates and other kinds of things measured in > numbers can be limited to a range of values. All types can be > potentially limited to a set of discrete values (by enumeration or > rule). For example, an attribute expressing a color > in terms of wavelength could be limited to 400..700 (nanometers). An > attribute listing a US egg size could be limited to be among > "medium," "large," "extra large" and "jumbo." If represented > numerically, > it could be limited to "1," "2," "3" and "4." > > 6. Passthrough. Sometimes XML is a carrier syntax between systems > where some participants need to convey implications not covered in > points > 1 through 5 above. For example, a database may make a distinction > between CHAR and VARCHAR even though other readers of a document > don't.
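The parsing rules in point 3 and the range restrictions in point 5 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the type names ("number", "datetime", "wavelength-nm", "egg-size") and the parse_typed helper are hypothetical and are not taken from any proposal made on this list.

```python
from datetime import datetime

# Hypothetical enumeration for the egg-size example in point 5.
EGG_SIZES = ("medium", "large", "extra large", "jumbo")


def parse_typed(type_name, text):
    """Convert element or attribute text according to a declared data type.

    The type names are invented for this sketch.
    """
    if type_name == "number":
        # Scientific notation such as "0.1234E+20" is accepted by float().
        return float(text)
    if type_name == "datetime":
        # Matches the compact form used in the message: 19970508T10:47
        return datetime.strptime(text, "%Y%m%dT%H:%M")
    if type_name == "wavelength-nm":
        value = float(text)
        # Range restriction: visible light, 400..700 nanometers.
        if not 400 <= value <= 700:
            raise ValueError("wavelength out of range: %r" % text)
        return value
    if type_name == "egg-size":
        # Restriction to a discrete set of values.
        if text not in EGG_SIZES:
            raise ValueError("not a known egg size: %r" % text)
        return text
    raise ValueError("unknown type: %r" % type_name)
```

A typed value is parsed once, without extra tags inside the element, and an out-of-range value such as a wavelength of 800 is rejected with a ValueError.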
From bosak at atlantic-83.Eng.Sun.COM Thu May 8 23:33:56 1997 From: bosak at atlantic-83.Eng.Sun.COM (Jon Bosak) Date: Mon Jun 7 16:57:49 2004 Subject: DOCTYPE misunderstood In-Reply-To: (message from David Pawson on Tue, 06 May 1997 11:20:36 +0000) Message-ID: <199705082132.OAA11351@boethius.eng.sun.com> [Dave Pawson:] | From a single source document, marked up in XML, we | need to produce 4 output transforms, braille, large print, html | and typeset. This seems clear enough. | Additionally, we want (for local use) to be able to 'create' | 'document type' (our own definition). Now I'm not quite sure what you're saying. | Question: Should we be using the doctype as the switch, | or an input to the output processing application (perhaps as | a command line option). An input to the output processing application. | Our definition on document type goes something along the | lines of (for one particular use) - an editors note, a report, | a memo. [Seems logical to talk about document type in this | way]. Right. Note, report, and memo are three different types of documents. They can be described using three different document type definitions (DTDs) that list the tags and attributes to be used with each type and specify structural rules for how they can be used. This is a fairly heavy SGML concept that is specified in sections 2.9, 3.2, and 3.3 of the xml-lang draft but not in a way that anyone would be expected to understand without quite a bit more explanation. You don't need to specify a DTD in XML, and if you don't, you can omit the DOCTYPE line and just use an XML header.
Since you are trying to coordinate the efforts of multiple authors, you will eventually have to learn about DTDs, because they are the primary tool for organizing projects that use XML tagging for large-scale publishing. So one of these days you should get a book on designing SGML DTDs and check out some of the principles. You will find that many of the less frequently used DTD constructs have been left out of XML to make it easier to implement, so you will probably find the job of learning to construct XML DTDs from an SGML book rather frustrating, but that's life for early implementors. By this time next year we should have some good books on this subject; hang in there. In the meantime, you should NOT use doctypes as a way to switch between output formats from the same source file. The source file is conceptually one document type regardless of the output format. Since we're only about halfway into the larger XML effort, you presently have a standard way to specify syntax (Part 1, xml-lang) and a very preliminary standard way to specify hypertext linking (Part 2, xml-link), but no standard way yet to specify style (that will be Part 3, xml-style) or other output processing. Some people on this list are working on Java APIs for XML that would provide one avenue of standardization, and maybe one of them will jump in at this point and explain more. If you want to get a preview of the style language and see some very simple XML DTDs, check out these examples: http://sunsite.unc.edu/pub/sun-info/standards/dsssl/egs/10_mail/10_mail.zip http://sunsite.unc.edu/pub/sun-info/standards/dsssl/egs/20_tstmt/20_tstmt.zip http://sunsite.unc.edu/pub/sun-info/standards/dsssl/egs/21_shaks/21_shaks.zip (or .tar.gz). 
Jon xml-dev: A list for W3C XML Developers Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To unsubscribe, send to majordomo@ic.ac.uk the following message; unsubscribe xml-dev List coordinator, Henry Rzepa (rzepa@ic.ac.uk) From richard at light.demon.co.uk Fri May 9 08:58:33 1997 From: richard at light.demon.co.uk (Richard Light) Date: Mon Jun 7 16:57:49 2004 Subject: DOCTYPE misunderstood In-Reply-To: <199705082132.OAA11351@boethius.eng.sun.com> Message-ID: In message <199705082132.OAA11351@boethius.eng.sun.com>, Jon Bosak writes > >In the meantime, you should NOT use doctypes as a way to switch >between output formats from the same source file. The source file is >conceptually one document type regardless of the output format. I've been thinking about the issue of what comes at the head of an XML document. This may be stating the obvious, but ... While it would be generally agreed that you can't gratuitously stick any old Richard Light SGML and Museum Information Consultancy richard@light.demon.co.uk 3 Midfields Walk Burgess Hill West Sussex RH15 8JA U.K. tel. (44) 1444 232067 xml-dev: A list for W3C XML Developers Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To unsubscribe, send to majordomo@ic.ac.uk the following message; unsubscribe xml-dev List coordinator, Henry Rzepa (rzepa@ic.ac.uk) From Peter at ursus.demon.co.uk Fri May 9 10:39:14 1997 From: Peter at ursus.demon.co.uk (Peter Murray-Rust) Date: Mon Jun 7 16:57:49 2004 Subject: DOCTYPE misunderstood Message-ID: <6392@ursus.demon.co.uk> In message Richard Light writes: [...] > > I've been thinking about the issue of what comes at the head of an XML > document. This may be stating the obvious, but ... > > While it would be generally agreed that you can't gratuitously stick any > old case for architecting XML so that you _can_ hold the naked XML without > _any_ header information, and prepend both DOCTYPE and style processing > instructions at delivery time. 
> > One reason is that you might want to author a document in chunks, and > either publish/work with the chunks in their own right, or put those > chunks together via a 'master document' containing lots of entity > references to pull the chunks in. For the first purpose, the free- > standing chunks will require a DOCTYPE header, not least so you can > create them in a structured XML-aware editor. For the second purpose, > they need to be 'naked', since you can't pull in an entity with a > DOCTYPE at the beginning, and we don't have the SGML SUBDOC facility in > XML. This is a problem I have come up against, and it still concerns me. I would like to encourage authors to create documents in small reusable chunks, the question being whether we use a construction like: ... etc... ]> ... &chunk1; with the chunks (say) being: ... or whether we use something like ]> with mini1.cml being: ... Now, I wrote this latter on the fly, and it looks horribly clunky and it's much more difficult to implement. And is it *legal*? and will it do what I want? The advantage is that the mini version can be used in its own right and we know what language it's in. Chunks like: Foo Bar do not carry their DTD and also unwanted whitespace could easily creep in. Constructions like: FooBar might solve some, but not all of the whitespace problem. Since this must be a Well Investigated Problem, insight would be useful. P.
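The 'master document' construction discussed above can be tried out with a short Python sketch. Everything in it is an assumption made for illustration: the element names (doc, para), the entity names, and the file names chunk1.xml and chunk2.xml are invented, and the chunks are kept in a dictionary rather than read from disk. It uses the external-entity hook of the standard xml.parsers.expat module to pull 'naked' chunks into the parse of the master document.

```python
import xml.parsers.expat as expat

# Hypothetical 'naked' chunks: no DOCTYPE, just well-formed content.
CHUNKS = {
    "chunk1.xml": "<para>Foo</para>",
    "chunk2.xml": "<para>Bar</para>",
}

# Hypothetical master document declaring each chunk as an external entity.
MASTER = """<!DOCTYPE doc [
<!ENTITY chunk1 SYSTEM "chunk1.xml">
<!ENTITY chunk2 SYSTEM "chunk2.xml">
]>
<doc>&chunk1;&chunk2;</doc>"""


def expand(master, chunks):
    """Parse master, resolving external entities from the chunks mapping.

    Returns a flat list of (kind, value) parse events.
    """
    events = []
    parser = expat.ParserCreate()
    parser.StartElementHandler = lambda name, attrs: events.append(("start", name))
    parser.EndElementHandler = lambda name: events.append(("end", name))
    parser.CharacterDataHandler = lambda data: events.append(("text", data))

    def external_entity(context, base, system_id, public_id):
        # The child parser inherits the handlers above, so the chunk's
        # elements land in the same event list as the master's.
        child = parser.ExternalEntityParserCreate(context)
        child.Parse(chunks[system_id], True)
        return 1  # non-zero tells expat the entity was handled

    parser.ExternalEntityRefHandler = external_entity
    parser.Parse(master, True)
    return events


events = expand(MASTER, CHUNKS)
```

Because the two entity references are written with no space between them, no stray whitespace enters the result; putting each reference on its own line would add text events containing newlines, which is the whitespace problem mentioned above. The sketch does not handle chunks that themselves refer to further external entities.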
-- Peter Murray-Rust, domestic net connection Virtual School of Molecular Sciences http://www.vsms.nottingham.ac.uk/ From DPawson at rnib.org.uk Fri May 9 12:16:05 1997 From: DPawson at rnib.org.uk (David Pawson) Date: Mon Jun 7 16:57:49 2004 Subject: DOCTYPE misunderstood Message-ID: One line of my original message read something like 'What was the original intent' of DOCTYPE? I love the idea of partitioning big docs to work on little ones. This must be a good idea in any development. Was there nothing in the thinking of the original geniuses who started all this off? Or was it simply, this is the first line of the spec, let's call it .... #include works for me as a lower mortal, but it won't permit me to compile an include file unless I draw up an empty doc with the necessary gubbins in, then #include the same file, simply to permit compilation. Will the same mechanism work for XML, i.e. #include sub-file Sounds simple enough to do what I might want to do. Come on gurus, what was it all about in the first place? It wasn't that long ago that you have forgotten ... was it! Regards, DaveP From Boris.Moore at wanadoo.fr Fri May 9 18:48:41 1997 From: Boris.Moore at wanadoo.fr (Boris Moore) Date: Mon Jun 7 16:57:49 2004 Subject: XML and HTML browsers Message-ID: <01BC5CA9.129BCC80@yellow-ami-145.wanadoo.fr> On Sunday, 4 May 1997, Peter Murray-Rust wrote: >>I would like to re-use *existing* browser functionality rather >>than continuing to extend the *generic* aspects of a browser in JUMBO.
>>I'm interested in exploring the general question of how a specialist >>Java application interacts with a Java-enabled HTML browser. I cannot reply to the Java and JavaScript aspects of your questions, but I am struck by how closely your description of hoped-for interaction between Jumbo and the built-in HTML rendering of the browser relates to work we at RivCom have been doing on developing a Netscape Plug-in for XML. The big difference (which is why this is not a direct response to your questions) is that we are working in C++, not Java, and we are at the moment catering to Java _disabled_ browsers, and are therefore denying ourselves the use of JavaScript! Our plug-in, of which a prototype was demonstrated at the WWW6 XML demo session, takes an XML input stream, together with style-sheet data, and processes it to generate different HTML streams for different Netscape instances, or different frames within Netscape. The user can click on hotspots or buttons, which send messages to the plug-in. This can result, for example, in modified style settings for one or more instances of one or more element types. (This can include contextual search criteria for the targeted elements). The plug-in then sends the resulting modified HTML to Netscape for display. I anticipate that the plug-in will at a later point be split into two components. Firstly, the plug-in dll itself, which will handle only the interfacing with Netscape, including much of the kind of interaction that you describe, plus a bit more. And secondly a component which does all the rest, including processing the XML and style-sheet data. The second component could then potentially be replaced by other modules, which would interface with the plug-in dll's API in order to use the Netscape HTML rendering functionality, and receive appropriate callbacks from user input. Such a module could be written in Java. (Though we have opted for C++, partly for performance reasons).
------------------------------------- Boris Moore Software Development Boris.Moore@Wanadoo.fr ------------------------------------- From bosak at atlantic-83.Eng.Sun.COM Fri May 9 20:02:36 1997 From: bosak at atlantic-83.Eng.Sun.COM (Jon Bosak) Date: Mon Jun 7 16:57:49 2004 Subject: DOCTYPE misunderstood In-Reply-To: (message from Richard Light on Fri, 9 May 1997 07:49:34 +0100) Message-ID: <199705091800.LAA11983@boethius.eng.sun.com> | Another reason is that you might have slightly variant DTDs for the | same conceptual document type, and a production process whereby the | documents start off conforming to say an author-friendly DTD, and then | progress to conform to a stricter 'delivery' DTD. Again, this can | only happen if you can switch in a DTD at document load time. Sure, there could be lots of good reasons to use variant doctypes. My only point was that switching output formats isn't one of them. | However, the reason I started along this line of thought was based | around the much more comfortable area of output formats, i.e. style | sheets. We certainly need an easy way to prepend instructions to bind | a style sheet to a document at delivery time, so that its style is not | bound into the DTD declaration. A processing instruction 'up front' | would be the obvious way to do this: | | | This method, proposed by James Clark in this list about a month ago, is very close to what the ERB seems to be converging on as the way to do simple stylesheet linking.
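Binding a style sheet at delivery time by prepending a processing instruction can be sketched as follows. The instruction shown in the quoted message has been lost from this archive, so this sketch assumes the xml-stylesheet form that W3C later standardized; it may differ from what was proposed in the thread. The bind_stylesheet helper, the memo element, and the file name memo.css are all hypothetical.

```python
from xml.dom import minidom


def bind_stylesheet(naked_xml, href, mime_type):
    """Prepend a stylesheet processing instruction to a header-less document.

    Assumes href and mime_type contain no characters that need escaping.
    """
    pi = '<?xml-stylesheet href="%s" type="%s"?>' % (href, mime_type)
    return pi + "\n" + naked_xml


# The stored document carries no header at all; style is chosen on delivery.
stored = "<memo><to>DaveP</to></memo>"
delivered = bind_stylesheet(stored, "memo.css", "text/css")

# Parse the result to confirm the instruction precedes the root element.
doc = minidom.parseString(delivered)
first = doc.firstChild
```

The same stored document could be delivered with a different href for braille, large print, or typeset output without touching its DOCTYPE, which is the separation argued for above.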
Jon xml-dev: A list for W3C XML Developers Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To unsubscribe, send to majordomo@ic.ac.uk the following message; unsubscribe xml-dev List coordinator, Henry Rzepa (rzepa@ic.ac.uk) From Peter at ursus.demon.co.uk Sat May 10 00:32:37 1997 From: Peter at ursus.demon.co.uk (Peter Murray-Rust) Date: Mon Jun 7 16:57:49 2004 Subject: DOCTYPE misunderstood Message-ID: <6413@ursus.demon.co.uk> In message David Pawson writes: [...] > > Will the same mechanism work for XML, > i.e. > > > #include sub-file > > Hi, This was about the first question I asked on comp.text.sgml 2-3 years ago, so it's nice to be in a position to answer it. It's not intuitive to someone brought up on C - you keep looking for the #include in SGML (XML) and it isn't there. Instead there is a mechanism involving *entities*. In simple terms, parameter entities (which use the %foo; notation) are mainly involved in DTDs, and general entities (like &bar;) are used to include chunks of data. You will find them in the draft in 4.2 , 4.3 (and their treatment in 4.4). In your example you might have something like: ]> &chunk1; &chunk2; This is OK so long as the contents of the files are well formed. If the document is to be valid, then the complete document after inclusion of the chunks has to be valid. If the chunks contain entities, then those have to be (recursively) expanded. Make sure the entities have been declared. However, XML says that for WF documents, the processor need not expand entities. (4.4; 8). I'm not sure whether the entity still has to be declared in this case. General point: Example files of this sort of thing would be very valuable. P. > > Sounds simple enough to do what I might want to do. > > Come on gurus, what was it all about in the first place? > It wasn't that long ago that you have forgotten ... was it! 
> > Regards, DaveP > > -- Peter Murray-Rust, domestic net connection Virtual School of Molecular Sciences http://www.vsms.nottingham.ac.uk/ From Peter at ursus.demon.co.uk Sat May 10 00:32:46 1997 From: Peter at ursus.demon.co.uk (Peter Murray-Rust) Date: Mon Jun 7 16:57:50 2004 Subject: XML and HTML browsers Message-ID: <6414@ursus.demon.co.uk> In message <01BC5CA9.129BCC80@yellow-ami-145.wanadoo.fr> Boris Moore writes: > On Sunday, 4 May 1997, Peter Murray-Rust wrote: > > >>I would like to re-use *existing* browser functionality rather > >>than continuing to extend the *generic* aspects of a browser in JUMBO. > >>I'm interested in exploring the general question of how a specialist > >>Java application interacts with a Java-enabled HTML browser. > > I cannot reply to the Java and JavaScript aspects of your questions, but > I am struck by how closely your description of hoped-for interaction > between Jumbo and the built-in HTML rendering of the browser relates to > work we at RivCom have been doing on developing a Netscape Plug-in for XML. Don't worry about the language aspect - a C++ plugin sounds a very useful way forward. The main disadvantage is that it is platform-specific, but of course its performance is better. Java is probably a better way to develop methods over the WWW. P.
-- Peter Murray-Rust, domestic net connection Virtual School of Molecular Sciences http://www.vsms.nottingham.ac.uk/ xml-dev: A list for W3C XML Developers Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To unsubscribe, send to majordomo@ic.ac.uk the following message; unsubscribe xml-dev List coordinator, Henry Rzepa (rzepa@ic.ac.uk) From richard at light.demon.co.uk Mon May 12 17:58:01 1997 From: richard at light.demon.co.uk (Richard Light) Date: Mon Jun 7 16:57:50 2004 Subject: DOCTYPE misunderstood In-Reply-To: <199705091222.HAA169162@tigger.cc.uic.edu> Message-ID: In message <199705091222.HAA169162@tigger.cc.uic.edu>, C M Sperberg- McQueen writes >On Fri, 9 May 1997 07:49:34 +0100, Richard Light wrote: > >>While it would be generally agreed that you can't gratuitously stick any >>old >case for architecting XML so that you _can_ hold the naked XML without >>_any_ header information, and prepend both DOCTYPE and style processing >>instructions at delivery time. > >I think there is a case for saying XML has in fact been so designed, >and that what you want to do is already possible. > >Am I missing something? No, it's me. I think I've been muddled by the RMD rules. It says that "If no RMD is provided, an XML processor must behave as though an RMD had been provided with value ALL". So I was thinking that a "naked chunk" would require an RMD of: in order that the XML processor knows that it "can parse the containing document correctly without reading any part of the DTD". But presumably this is "parsing" in the sense of reading a _valid_ instance rather than a _well-formed_ one? Richard Light. 
xml-dev: A list for W3C XML Developers Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To unsubscribe, send to majordomo@ic.ac.uk the following message; unsubscribe xml-dev List coordinator, Henry Rzepa (rzepa@ic.ac.uk) From bosak at atlantic-83.Eng.Sun.COM Mon May 12 18:03:22 1997 From: bosak at atlantic-83.Eng.Sun.COM (Jon Bosak) Date: Mon Jun 7 16:57:50 2004 Subject: DOCTYPE misunderstood In-Reply-To: (message from David Pawson on Fri, 09 May 1997 11:10:16 +0000) Message-ID: <199705121600.JAA13415@boethius.eng.sun.com> [Dave Pawson:] | One line of my original message read something like | 'What was the original intent' of DOCTYPE? | [...] | Come on gurus, what was it all about in the first place? | It wasn't that long ago that you have forgotten ... was it! Well, back before 1986, anyway. That's when the SGML standard was published. "Doctype" means "type of document." Novels, telephone books, poems, plays, bills of lading, and patient care records are types of documents. They have different structures and need differently named tags if the tags are going to make sense to a human user or an intelligent indexer. You must start reading some of the easily available materials on this subject before asking others for information. You can start with the XML page at W3C (go to http://www.w3.org and click on "XML", then follow all the pointers from that page). This reading will include the XML FAQ and Robin Cover's magnificent SGML web site. Then you can hit a couple of the good beginning books on SGML. I happen to be at an SGML conference in Barcelona at the moment; I will take a look at currently available introductory SGML books that might work well for an XML newbie and reply back here with a list (probably a very short one). Some people are working on putting up a general-purpose public XML mailing list that should become available in the next few weeks. Until then, I suggest that you post inquiries of the variety "What is a doctype?" 
to the newsgroup comp.text.sgml and not to this xml-dev list, which is for the use of technical experts engaged in the construction of XML software. Jon xml-dev: A list for W3C XML Developers Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To unsubscribe, send to majordomo@ic.ac.uk the following message; unsubscribe xml-dev List coordinator, Henry Rzepa (rzepa@ic.ac.uk) From Peter at ursus.demon.co.uk Fri May 16 01:59:51 1997 From: Peter at ursus.demon.co.uk (Peter Murray-Rust) Date: Mon Jun 7 16:57:50 2004 Subject: SGML97 Europe Message-ID: <6588@ursus.demon.co.uk> I've just come back from SGML97 in Barcelona and thought some personal comments might be useful. Firstly I was delighted to meet SGML geekdom in the flesh :-) It's marvellous finding so many people who were hitherto virtual. I came away with a strong feeling of community, and many thanks to everyone who made me welcome. Without question, XML was the major theme at the meeting, though associated areas such as DSSSL were also causing a lot of interest. The ERB has done a fantastic job in getting this done so efficiently, quickly, and also making sure that the world knew how important this was. It's now very clear that the major WWW-related companies are taking a very active role in exploring the potential of XML, and a recent posting to XML-WG has confirmed Netscape's interest. [The XML-spec printing was sponsored by SUN and Microsoft.] I think most people who are not used to virtual working will underestimate how much time the ERB has spent on the process. As Eliot Kimber said in his closing address, they had to choose between having a family life or an XML-life. There were literally hundreds of mails a week. NOTE: all mail on the XML-WG gets read very thoroughly by the ERB - even if it doesn't get formally answered at the time. It's only necessary to make a point once. The ERB supplemented e-mail with weekly conference phone calls, and this is how decisions were taken. 
Quite apart from XML itself, I personally commend the efficiency of the ERB's virtual process and shall try to abstract from it those aspects which make it successful. Clear initial guidelines help, and a wider community which is well versed in abiding by a standard drawn up under legal guidance :-) Other points. XML clearly fills many different roles for different people. It's clear that people who sell complex SGML applications see different benefits (and some concerns) from those who see XML as the next step from HTML. Taking too narrow a view might sometimes cause unnecessary conflicts. Eliot described XML's position vis-a-vis SGML as low-cost/low-benefit versus high-cost/high-benefit and stressed the need for the additional components such as DSSSL, architectural forms, etc. (Personally I would put XML as lowish-cost/medium-benefit :-) It's important not to argue HTML vs XML or XML vs SGML as such arguments are often meaningless or based on limited views. Both DTD-less and DTD-full applications will benefit from XML. The *use* of XML falls in a spectrum with fuzzy borderlines. It's clear that DSSSL has a great deal of impetus and the only question is whether the ERB can work fast enough for everyone else's expectations. There are many other problems surfacing. How does XML interact with HTML? (are there XML plug-ins, should XML DTDs contain subsets of HTML, etc.) Strong typing, and APIs (both areas that XML-DEV could work on). And perhaps most excitingly for some of us the concept of Information Objects which was mentioned in several talks. My understanding of information objects (which has been designed into CML) is that documents will frequently contain 'chunks' from several different sources. For example chemistry papers frequently contain maths; but there is no formal syntax for combining two different DTDs in the same document (watch this space...). IMO a robust DTD-less XML document will most likely be an aggregation of well-defined information objects (i.e.
individually parsable against a DTD), but where every document would be likely to have a differing formal DTD. It's clear that XML is able to greatly widen the market for *ML. Since *ML will increasingly be co-existing with Object technologies it's important that the applications are well designed and interoperate cleanly. One great benefit of *ML is that people who deal with documents can understand the power of *ML, whilst they might well 'switch off' when confronted with objects. Implementation is also very rapid - many companies have - or shortly will have - XML implementations. This will have high benefits - let's also push for low prices :-). It's becoming even more important that this group helps to create reference sites with test data, DTDs, etc. so that these tools can be evaluated. Once again, a lovely time. P. -- Peter Murray-Rust, domestic net connection Virtual School of Molecular Sciences http://www.vsms.nottingham.ac.uk/ xml-dev: A list for W3C XML Developers Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To unsubscribe, send to majordomo@ic.ac.uk the following message; unsubscribe xml-dev List coordinator, Henry Rzepa (rzepa@ic.ac.uk) From Peter at ursus.demon.co.uk Fri May 16 13:26:36 1997 From: Peter at ursus.demon.co.uk (Peter Murray-Rust) Date: Mon Jun 7 16:57:50 2004 Subject: Mathematical Markup Language (MathML) Message-ID: <6629@ursus.demon.co.uk> [To xml-dev, crossposted to CLIC and CHEMIME and Patrick Ion, AMS. Patrick, please feel free to circulate this to HTML-Math-WG. This posting is being addressed to both XML-DEV and the Chemical Informatics community, so please excuse any confusions :-).] The first draft of MathML was published on May 15th and is enormously exciting. It is written to be compatible with XML and to evolve as that spec evolves, so that we have one of the very first DTDs that has been developed in that way. 
Since math is common to a very large number of vertical markets in the ScientificTechnicalMedical market (and many others) MathML will highlight how domain-specific DTDs and documents can be re-used in a variety of contexts. The draft (long, impressive, and in several sections) is at: http://www.w3.org/pub/WWW/TR/WD-math/ Could the authors provide a tar.gz or similar file so that all the sections including the gifs can be downloaded? Also could the appendices have names unique under 8.3 format. TIA :-) Note that the Math-WG has addressed the two key aspects of encoding maths - presentation (cf. TeX) and content (cf. symbolic algebra, plotting packages, etc.). The current document addresses both of these and having had very useful discussions with Patrick Ion, Martin and Roy Pike and Steve Buswell I'm very confident that MathML will cater for a wide range of chemical requirements. Certainly I hope to explore its use for plotting graphs, extracting functions, using symbolic variables within chemical discourse and tables, and much more. In principle it should be possible to (say) extract a set of force-field equations from a molecular mechanics paper and directly manipulate them into a computer program. The publication of MathML coincides with the XML-ERB's request for discussion on multiple namespaces, and the very large emphasis at SGML97 on Information Objects. The XML community is now clearly working as fast as possible to develop the spec so that an XML document can be composed of a variety of fragments/subdocuments/objects - various names are chosen. Chemical Markup Language is being developed along some of the same lines as MathML - to be XML-compatible, to use common semantics where appropriate, and to avoid namespace collisions. Whatever syntactic mechanism is chosen it will allow subcomponents of a document to be linked to the appropriate DTD - very probably distributed over the WWW - and for appropriate semantics and behaviour to be applied. 
The math proposal has enormous implications for technical publications and documents. In CML a space has been deliberately left called 'MATH' and now can be replaced by 'MathML', e.g. as: %mathml.dtd; and in a document instance something like:

The function relating...

... x2 would now insert the expression x^2 at the appropriate part of the text. Note that this use of XML-LINK avoids the complexity of tailoring the parent DTD to accommodate infinitely variable content models. Note that the document suggested above *can be validated* if required, since all the DTDs involved are included. Of course this is only possible because CML and Math have no namespace collisions, and this mechanism will not be generally applicable. (MathML has some very short GIs (have I got the terminology right this time? :-) and is therefore likely to collide with an arbitrarily chosen DTD). I'd welcome comments from the chemical community on this and suggest that posting them to chemime@ic.ac.uk would be the most appropriate (please *don't* post directly to the math group!). On the more general question of interoperability of information objects we may need to wait a week or two to see how the XML-WG discussions take that, but I'm very keen to start trying to get an implementation of math in CML. P. -- Peter Murray-Rust, domestic net connection Virtual School of Molecular Sciences http://www.vsms.nottingham.ac.uk/ From gfrer at luna.nl Fri May 16 20:41:50 1997 From: gfrer at luna.nl (Gerard Freriks) Date: Mon Jun 7 16:57:50 2004 Subject: Mathematical Markup Language (MathML) Message-ID: What will be the consequences when efforts like these are transplanted to other sectors like Medicine? E.g.
HL7-SGML http://www.mcis.duke.edu/standards/HL7/committees/sgml/ Greetings Gerard Freriks, MD << start of forwarded material >> Date: Fri, 16 May 1997 11:44:05 GMT From: Peter@ursus.demon.co.uk (Peter Murray-Rust) To: xml-dev@ic.ac.uk Cc: clic@ic.ac.uk, chemime@ic.ac.uk, ion@math.ams.org Subject: Mathematical Markup Language (MathML) Lines: 89 Sender: owner-xml-dev@ic.ac.uk Precedence: bulk Reply-To: Peter@ursus.demon.co.uk (Peter Murray-Rust) [To xml-dev, crossposted to CLIC and CHEMIME and Patrick Ion, AMS. Patrick, please feel free to circulate this to HTML-Math-WG. This posting is being addressed to both XML-DEV and the Chemical Informatics community, so please excuse any confusions :-).] The first draft of MathML was published on May 15th and is enormously exciting. It is written to be compatible with XML and to evolve as that spec evolves, so that we have one of the very first DTDs that has been developed in that way. Since math is common to a very large number of vertical markets in the ScientificTechnicalMedical market (and many others) MathML will highlight how domain-specific DTDs and documents can be re-used in a variety of contexts. The draft (long, impressive, and in several sections) is at: http://www.w3.org/pub/WWW/TR/WD-math/ Could the authors provide a tar.gz or similar file so that all the sections including the gifs can be downloaded? Also could the appendices have names unique under 8.3 format. TIA :-) Note that the Math-WG has addressed the two key aspects of encoding maths - presentation (cf. TeX) and content (cf. symbolic algebra, plotting packages, etc.). The current document addresses both of these and having had very useful discussions with Patrick Ion, Martin and Roy Pike and Steve Buswell I'm very confident that MathML will cater for a wide range of chemical requirements. Certainly I hope to explore its use for plotting graphs, extracting functions, using symbolic variables within chemical discourse and tables, and much more. 
In principle it should be possible to (say) extract a set of force-field equations from a molecular mechanics paper and directly manipulate them into a computer program. The publication of MathML coincides with the XML-ERB's request for discussion on multiple namespaces, and the very large emphasis at SGML97 on Information Objects. The XML community is now clearly working as fast as possible to develop the spec so that an XML document can be composed of a variety of fragments/subdocuments/objects - various names are chosen. Chemical Markup Language is being developed along some of the same lines as MathML - to be XML-compatible, to use common semantics where appropriate, and to avoid namespace collisions. Whatever syntactic mechanism is chosen it will allow subcomponents of a document to be linked to the appropriate DTD - very probably distributed over the WWW - and for appropriate semantics and behaviour to be applied. The math proposal has enormous implications for technical publications and documents. In CML a space has been deliberately left called 'MATH' and now can be replaced by 'MathML', e.g. as: %mathml.dtd; and in a document instance something like:

The function relating...

... x2 would now insert the expression x^2 at the appropriate part of the text. Note that this use of XML-LINK avoids the complexity of tailoring the parent DTD to accommodate infinitely variable content models. Note that the document suggested above *can be validated* if required, since all the DTDs involved are included. Of course this is only possible because CML and Math have no namespace collisions, and this mechanism will not be generally applicable. (MathML has some very short GIs (have I got the terminology right this time? :-) and is therefore likely to collide with an arbitrarily chosen DTD). I'd welcome comments from the chemical community on this and suggest that posting them to chemime@ic.ac.uk would be the most appropriate (please *don't* post directly to the math group!). On the more general question of interoperability of information objects we may need to wait a week or two to see how the XML-WG discussions take that, but I'm very keen to start trying to get an implementation of math in CML. P. -- Peter Murray-Rust, domestic net connection Virtual School of Molecular Sciences http://www.vsms.nottingham.ac.uk/ << end of forwarded material >> Gerard Freriks, huisarts, MD C.
Sterrenburgstr 54 3151JG Hoek van Holland the Netherlands Telephone: (+31) (0)174-384296/ Fax: -386249 Mobile : (+31) (0)6-54792800 ARS LONGA, VITA BREVIS From Peter at ursus.demon.co.uk Sat May 17 21:05:53 1997 From: Peter at ursus.demon.co.uk (Peter Murray-Rust) Date: Mon Jun 7 16:57:50 2004 Subject: MathML (and implications for XML) Message-ID: <6766@ursus.demon.co.uk> I have read quickly through the MathML (970515) draft and have some (hopefully constructive) comments to make - any crossmember of xml-dev and html-math-wg is welcome to crosspost them. Before giving detailed comments, I must say that I think it's an extremely useful document and covers all of the areas that I - as a mathematically oriented scientist - would like to see. The initial discussion is very useful and I shall borrow some of the flavour of it when redrafting Chemical Markup Language. An archetypal XML DTD --------------------- Since MathML is one of the very first XML DTDs to be published it naturally sets a style which others may imitate. In general I think it does this well, though it is at the mercy of a still fluid XML-lang and XML-link spec. I appreciate that some of this was probably written some time before the latest XML drafts. Specific comments in this area are: 3.1.4 'By default, XML processors remove all leading and trailing whitespace ... between the begin/end tags and collapse any internal w/s to a single space character'. My current understanding is that *validating* parsing removes the start and end w/s but does not collapse the internal w/s, but that WF-parsing passes the whole lot unchanged including the leading/trailing w/s. [I'm usually wrong on this, but it's a problem area :-)]. 7.1 ['' can terminate it).
This might allow significant simplification to MathML 7.1 and allow the elimination of two sets of tags. MathML proposes two generic means of extending functionality, one through attributes and the other through macros. 7.2.3 the OTHER attribute has the syntax: OTHER="name1='val1' name2='val2'..." and essentially allows a means of adding additional attributes independently of the DTD. Personally I'm sympathetic to this (as long as the attributes are ones *I*'ve thought of :-). This is 'not to encourage software developers to use this as a loophole for circumventing the MathML core markup'... but as we all know this is the sort of unchecked semantics that people love and which soon leads to non-interoperable documents and processors. I'd be frightened of it in the Chemical community. This is a point which is important for XML in general. 5.3 Macros. This is the ability to create macros to avoid repetition of verbose markup and seems particularly appropriate to math. (I think it has a similar, but smaller, role in chemistry.) As far as I can see it is totally compatible with XML/SGML, ***BUT it requires a pre-processor*** (I have been calling this a pre-parser). There will be a role for a pre-parser in XML and one of its functions will be to apply macros. Can we work towards a standard set of operations that a pre-parser might carry out? XML-LINK. The document is written with little reference to XML-link (not surprising, since it's new and AFAIK JUMBO is the only tool that implements it even at prototype level). However I think there are at least the following areas where XML-link mechanism might be alternatives: 7.1 Display and in-line notations. The draft assumes that the MATH component of a document is embedded in the HTML at the point that it occurs in natural reading. XML-LINK gives a mechanism for separating the math and the text and combining them under the flexibility of the linking mechanism.
The problem occurs in exactly the same way in chemistry - do we encode HCl in-line or as a display; HCl This is a matter of style which may not be totally within the author's control - the publisher or renderer or reader may have the power to alter it. Since XML will approach this generically at the LINK level, I have used constructs like:

this is hydrogen chloride...

... Cl This - in the present JUMBO - will in-line the formula for HCl. I am sure that by use of stylesheets and BEHAVIOUR it would be possible to control your equations to be at the para end, etc. 7.2.4 . I am sure that it is possible to recast this tag in terms of XML-LINK BEHAVIOR. That saves a lot of hassle writing code because it may already have been done...at least in part. Commonality with future XML DTDs -------------------------------- As XML develops, CML gets smaller. This is wonderful. There are a number of general components of MathML that will help CML and probably other people as well. A particular example is VECTOR and MATRIX (4.2.9). It is clear from the XML-WG that many people want a method of representing (multidimensional) regular arrays of strongly typed data and also the means for addressing into these. Some (including me) will try to push for economy of expression and avoid the syntax. (At present CML uses the following matrix syntax: %mathml; ]> and then use MathML tags. This is more luck than good planning :-), but CML has been careful to restrict its tagset. Linking between variables ------------------------- If I write: x = y + 3 (I) and later 2x = y + 4 (II) I would 'normally' deduce that x = 1 (III) and y = -2 (IV) However, there is nothing in MathML AFAICS that allows one to specify that the 'x' in (I) is the same x as in (II). [Please forgive me if I've missed this]. For many applications we need to label a variable or function as having the same value and semantics throughout a document, e.g. 'Determination of the velocity of light'. In this example I would point to some central target which represented the variable 'c', though I'm not clear how MathML would manage this in equations. This is a very important requirement for re-usable scientific publications, though perhaps ambitious at this stage. P.
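The deduction of (III) and (IV) from (I) and (II) only goes through once both equations are known to refer to the same x and y. A minimal Python sketch of that deduction follows. The function name solve_2x2 and the coefficient layout are purely illustrative and are not part of MathML or CML; the two equations are rewritten as x - y = 3 and 2x - y = 4 and solved by Cramer's rule.

```python
from fractions import Fraction


def solve_2x2(a11, a12, b1, a21, a22, b2):
    """Solve a11*x + a12*y = b1 and a21*x + a22*y = b2 by Cramer's rule.

    Both equations are assumed to share the same unknowns x and y,
    which is exactly the link the markup would have to express.
    """
    det = Fraction(a11 * a22 - a12 * a21)
    if det == 0:
        raise ValueError("equations are not independent")
    x = Fraction(b1 * a22 - a12 * b2) / det
    y = Fraction(a11 * b2 - b1 * a21) / det
    return x, y


# (I)  x = y + 3   ->  1*x - 1*y = 3
# (II) 2x = y + 4  ->  2*x - 1*y = 4
x, y = solve_2x2(1, -1, 3, 2, -1, 4)
print(x, y)
```

If the two occurrences of 'x' were not known to be the same variable, the second row of coefficients could not be combined with the first, and no such solution would follow.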
-- Peter Murray-Rust, domestic net connection Virtual School of Molecular Sciences http://www.vsms.nottingham.ac.uk/ xml-dev: A list for W3C XML Developers Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To unsubscribe, send to majordomo@ic.ac.uk the following message; unsubscribe xml-dev List coordinator, Henry Rzepa (rzepa@ic.ac.uk) From jimg at digitalthink.com Wed May 21 18:27:12 1997 From: jimg at digitalthink.com (Jim Gindling) Date: Mon Jun 7 16:57:50 2004 Subject: Good XML-Relevant SGML Books for Beginners? Message-ID: Hi all, I have been reading like crazy on the web, and have a fair understanding of the basic XML concepts. However, I am still puzzled as to exactly how I can accomplish desirable tasks such as: + Converting XML documents to HTML (preferably HTML that uses CSS, and preferably using something other than DSSSL, which seems overly complex). + Referencing dynamic data within XML documents, that is presumably stored in a database, such as student name, quiz scores, et cetera. I don't expect anybody to answer these questions directly since I understand that is not the focus of this list; however, I would really appreciate some guidance in picking one or two good books that will answer my questions. Using Amazon.com, I have found the following books that seem relevant. If somebody could give me their thoughts on these, or others, I would be very grateful. Abcd...Sgml : A User's Guide to Structured Information Liora Alschuler Industrial-Strength Sgml: An Introduction to Enterprise Publishing (Charles F. Goldfarb Series on Open Information Management) Truly Donovan The Sgml Implementation Guide: A Blueprint for Sgml Migration Brian E. Travis, Dale C. Waldt Sgml on the Web: Small Steps Beyond H.T.M.L. (Charles F. Goldfarb Series on Open Information Management) Yuri Rubinsky, Murray Maloney Others? Thanks in advance. 
Jim Gindling DigitalThink Software Engineer From Peter at ursus.demon.co.uk Wed May 21 19:32:01 1997 From: Peter at ursus.demon.co.uk (Peter Murray-Rust) Date: Mon Jun 7 16:57:50 2004 Subject: Good XML-Relevant SGML Books for Beginners? Message-ID: <7018@ursus.demon.co.uk> In message Jim Gindling writes: Thanks for posting, Jim - xml-dev has been a bit sleepy recently. > Hi all, > > I have been reading like crazy on the web, and have a fair understanding of > the basic XML concepts. However, I am still puzzled as to exactly how I > can accomplish desirable tasks such as: > > + Converting XML documents to HTML (preferably HTML that uses CSS, and > preferably using something other than DSSSL, which seems overly complex). If your sole exposure to DSSSL has been the postscript description (~300 pages) I can sympathise. However, there is a shortened version (DSSSL-O) and there will soon be examples of how to tweak existing DSSSL documents. (Jon Bosak has shown how to do this and you'll find his stuff under www.sil.org/sgml in the DSSSL section.) So get a DSSSL engine - Jade or YADE and run the examples. There is general agreement that DSSSL is the only real way forward for significant work and there are free implementations of engines. Remember of course that XML documents don't have to make textual sense and that to format "1cccccc1" or x2 you will need application-specific software. So, in general, converting XML to HTML depends very much on the XML application. > + Referencing dynamic data within XML documents, that is presumably stored > in a database, such as student name, quiz scores, et cetera. I expect that we shall see XML2SQL/SQL2XML applications very shortly.
There is a lot of discussion on XML-WG about how to transport data rather than text and there is a *proposal* from Tim Bray to have strongly-typed data in XML (e.g. FLOAT, DATE). Having said that, XML doesn't write the applications for you - it provides a mechanism to hold information. > > I don't expect anybody to answer these questions directly since I > understand that is not the focus of this list; however, I would really ^^^^^^^^^^^^^^^^^^ That is correct! But it's a slackish period. > appreciate some guidance in picking one or two good books that will answer > my questions. There will doubtless be XML books fairly shortly. ***Note that the spec is still a draft and will remain so for some months***. It will change, without doubt, as bugs are thrown up. The timescale for the first frozen release is late autumn sometime - any more precise dates anyone? My own feeling is that 'Learn XML in 21 days/48 hours/without tears' will present the syntax of the language, but won't reveal the full power of the language. It's really only by playing with it, talking to SGML geeks (an honourable term), and tackling real problems that you really get fluent. That's because managing information is a very rich subject. > Using Amazon.com, I have found the following books that seem relevant. If > somebody could give me their thoughts on these, or others, I would be very > grateful. I won't comment on the books mentioned - my impression is that there are about a dozen specialist SGML books in common use - but be aware that XML deliberately does not use a large number of SGML features. WHAT WE REALLY NEED ON THIS LIST ARE SOME REAL EXAMPLES OF XML DTDs and DOCs. I don't think we have soak tested the parsers yet - I have been converting my DTDs today and think that I've found 2 (minor) bugs in a parser. Let's have some announcements of converted DTDs, ENTITY sets, documents, etc. Without that it's much harder to learn the language. P. 
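To illustrate the point above that converting XML to HTML depends on the XML application, here is a minimal Python sketch. The element names quiz, student, name and score, and the function quiz_to_html, are hypothetical and chosen only to echo the student/quiz example in the question; a different application would need a different mapping.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical application data; these element names are invented for the sketch.
SAMPLE = "<quiz><student><name>A. Student</name><score>7</score></student></quiz>"


def quiz_to_html(xml_text):
    """Map one specific (assumed) XML vocabulary onto an HTML table."""
    root = ET.fromstring(xml_text)
    rows = []
    for student in root.findall("student"):
        # findtext returns the default when the child element is absent.
        name = escape(student.findtext("name", default=""))
        score = escape(student.findtext("score", default=""))
        rows.append(f"<tr><td>{name}</td><td>{score}</td></tr>")
    return "<table>" + "".join(rows) + "</table>"


print(quiz_to_html(SAMPLE))
```

The mapping lives entirely in the application code: nothing in the XML itself says that a score belongs in a table cell.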
-- Peter Murray-Rust, domestic net connection Virtual School of Molecular Sciences http://www.vsms.nottingham.ac.uk/ xml-dev: A list for W3C XML Developers Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To unsubscribe, send to majordomo@ic.ac.uk the following message; unsubscribe xml-dev List coordinator, Henry Rzepa (rzepa@ic.ac.uk) From eldred at tiac.net Wed May 21 19:48:54 1997 From: eldred at tiac.net (Eric Eldred) Date: Mon Jun 7 16:57:50 2004 Subject: Good XML-Relevant SGML Books for Beginners? Message-ID: <3.0.32.19970521134802.007bf370@tiac.net> I recommend this book: Sgml on the Web: Small Steps Beyond H.T.M.L. (Charles F. Goldfarb Series on Open Information Management) Yuri Rubinsky, Murray Maloney It was written before XML, and really doesn't get into a lot of the area where XML will be used. But it does show what SGML can do beyond HTML. And it stimulates and provokes us to take advantage of SGML's (or XML's) power to publish in new ways for users that HTML can't. It also comes with the full Panorama Pro browser, so you can experiment immediately with writing and reading SGML, in the interactive way that XML will be providing. Plus, it is just a really nice, humane book that is easy enough for SGML beginners to understand, where so many of the other SGML books are far too technical for newbies. -- "Eric" Eric Eldred mailto:eldred@tiac.net no fax tel:+1 603 434 7746 x1 USPS:50 E Derry Rd #21, E Derry NH USA 03041-0021 xml-dev: A list for W3C XML Developers Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To unsubscribe, send to majordomo@ic.ac.uk the following message; unsubscribe xml-dev List coordinator, Henry Rzepa (rzepa@ic.ac.uk) From spinosaj at scripps.edu Thu May 22 06:18:14 1997 From: spinosaj at scripps.edu (John C. 
Spinosa, MD, PhD) Date: Mon Jun 7 16:57:50 2004 Subject: Meeting Anouncement for HL7-SGML Mixer Message-ID: <3.0.32.19970521212222.007a0100@pop.mindspring.com> I am posting this on behalf of Liora Alschuler who is out of the country at present. This anouncement has been cross posted to several listservs as well as comp.text.sgml. ************ ANNOUNCEMENT *************** HL7 SGML Mixer: Medical Claims Processing with SGML ************ July 24 -- 25 ************** We are pleased to announce the upcoming "HL7 SGML Mixer: Medical Claims Processing with SGML," a two-day seminar to take place July 24-25 in the San Diego/La Jolla area. The event is cosponsored by: * GCARI (Graphic Communications Association Research Institute; http://www.gca.org) * HL7 SGML SIG (Health Level 7, SGML Special Interest Group; http://www.mcis.duke.edu/standards/HL7/committees/sgml/) * SGMLOpen (http://www.sgmlopen.org) The event is open to all participants, regardless of affiliation, for a modest fee to cover our costs. AGENDA: Showcase Tools, Respond to Federal RFP The two-day session has a double agenda: 1) Introduce SGML-based tools and technology to developers and users of healthcare information systems. 2) Address the manner in which HL7 can use SGML-based tools and technology to respond to an RFP being issued by the US Health Care Financing Authority (HCFA) regarding electronic submission and processing of Medicare and Medicaid claims. The HCFA RFP is the first to be issued in compliance with the requirements of the Health Insurance Portability and Accountability Act (HIPAA, known as Kennedy-Kassebaum) which mandates creation and use of standardized electronic medical records. It is the intent of HL7, the parent organization of the HL7 SGML SIG, to respond to the Federal RFP in conjunction with one of the major Medicare and Medicaid providers. 
FORMAT: Presentations, Tabletops, Working Sessions The first day will start with presentations focusing on potential use of SGML-based solutions in healthcare in general with a focus on the HCFA scenario. After a kick-off session, two tracks may run concurrently, one directed at management issues and a second at technical solutions. The second portion of the day will provide a venue for table-top technology demonstrations. Presentations will be chosen by a peer review process with members of all three organizations participating. The table-top demonstrations are open to all for a small fee. The second day will consist of HL7 working sessions to begin formulating our response to the Federal requirements. These sessions are open to all, with the caveat that active participants should be well versed in the work and objectives of HL7 and the HL7 SGML SIG. (Again, all are welcome to observe and to become active.) MARK YOUR CALENDARS: The Mixer RFP, detailing the requirements set out by the Federal government, will be made available through these same channels the week of June 2. Submissions for presentations will be due June 23. Notification of acceptance will be sent out by July 7. FOR MORE INFORMATION: The full Mixer RFP with details on the HCFA scenario will be sent out through these same channels no later than June 6 and will be posted on the Web sites of the sponsor organizations. You may also contact the organizations cosponsoring the event or: Liora Alschuler, HL7 SGML Mixer Program Chair mixer@the-word-electric.com or 802/785-2623 ===================================================== John Spinosa, MD, PhD spinosaj@scripps.edu
From amutel at ifhamy.insa-lyon.fr Thu May 22 09:09:32 1997 From: amutel at ifhamy.insa-lyon.fr (Alexandre Mutel) Date: Mon Jun 7 16:57:51 2004 Subject: XML & Entities inclusion against Inline Tag facilities. Message-ID: <199705220709.JAA28224@ifhamy.insa-lyon.fr> hello, In XML specs (like SGML features), they talk about entities inclusions in a document... Something like: ]> &including; Okay,they say that with XML-SGML a document can be built with document-part- included using entities facilities. HTML doesn't make use of external entities but it can do inline image through some tag... In XML specs i doesn't see any reference to TAG or special attri- butes that can handle inclusion of document component (text,image,object). I would like to know : - if in the future, we 'll only use external entities to include a document component ? - anyelse, does XML will support special attributes for Tag to specify that this Tag with this attributes can include something?
- or does this feature will be hardcoded in a browser, making the same mistake than HTML? Thanks. Regards, Mutel Alexandre. email: amutel@ifhamy.insa-lyon.fr xml-dev: A list for W3C XML Developers Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To unsubscribe, send to majordomo@ic.ac.uk the following message; unsubscribe xml-dev List coordinator, Henry Rzepa (rzepa@ic.ac.uk) From Peter at ursus.demon.co.uk Thu May 22 12:10:17 1997 From: Peter at ursus.demon.co.uk (Peter Murray-Rust) Date: Mon Jun 7 16:57:51 2004 Subject: General chat (Meeting Anouncement for HL7-SGML Mixer) Message-ID: <7065@ursus.demon.co.uk> In message <3.0.32.19970521212222.007a0100@pop.mindspring.com> "John C. Spinosa, MD, PhD" writes: John, I don't know whether they have all gone to the list, but *I* got three copies of this announcement :-) More generally, this list is for XML developers and this post doesn't relate directly to XML. [I did have a long and useful chat with Liora at SGML97 and so I know that she and other HL7 people are very interested in XML, and of course she has done a first class job of publicising it. One immediate concern is the multiple namespace/DTD fragment/information object/ concern.] Jon Bosak mentioned two weeks ago that there were movements to create an XML interest group (?comp.text.xml?) and I'd be happy to see that. Presumably it would require the USENET voting process? ***** Since I'm posting to the list anyway, I have been investigating how to use NXP as a tool for turning JUMBO into a validating editor. I think I'm nearly there and will be posting back to Norbert what is required as an API. One thing that I need very clearly back from both Lark and NXP is an error flag - 'parse this, please', 'sorry, error'. The final 'result' will be a tree-structured editor rather than an event-stream driven tool (I don't think Java is fast enough to allow character-by-character validation of text processing). 
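The 'parse this, please' / 'sorry, error' exchange asked for above can be sketched in a few lines. This is an illustration in Python using the standard library parser, not the Lark or NXP API (both are Java tools whose interfaces are not shown in this thread); the function name check_well_formed is hypothetical.

```python
import xml.etree.ElementTree as ET


def check_well_formed(text):
    """Return (ok, message): a simple error flag for a well-formedness check.

    The whole text is parsed into a tree in one go, which matches the
    tree-structured approach rather than character-by-character checking.
    """
    try:
        ET.fromstring(text)
    except ET.ParseError as err:
        line, column = err.position  # location reported by the parser
        return False, f"sorry, error at line {line}, column {column}"
    return True, "ok"


print(check_well_formed("<a><b/></a>"))
print(check_well_formed("<a><b></a>"))
```

An editor built this way would re-run the check after each edit to a subtree and refuse the edit when the flag comes back false.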
We haven't had any gossip about tools on this for a long time. I could use a nice simple non-validating XML editor - what have people got? And what has happened to all the enthusiasm for setting up MIRROR sites? P. -- Peter Murray-Rust, domestic net connection Virtual School of Molecular Sciences http://www.vsms.nottingham.ac.uk/ From Peter at ursus.demon.co.uk Thu May 22 12:10:31 1997 From: Peter at ursus.demon.co.uk (Peter Murray-Rust) Date: Mon Jun 7 16:57:51 2004 Subject: XML & Entities inclusion against Inline Tag facilities. Message-ID: <7067@ursus.demon.co.uk> In message <199705220709.JAA28224@ifhamy.insa-lyon.fr> Alexandre Mutel writes: This is an important subject as I am currently wrestling with the XML linking spec. I'd be happy to see a clear exposition of how XML includes/transcludes document/fragments, etc. Although XML-DEV is not intended as a forum for beginners, there are a number of questions - like the current one - which are legitimate to discuss if we don't have a lot of traffic. I also think it's easy for developers to misinterpret parts of the spec (I have done this in a major way and fairly publicly with XML-LINK I think :-). Also, since I am running a virtual course on XML and Java (see URL), it's useful to know what questions come up :-) > hello, > > In XML specs (like SGML features), they talk about entities inclusions in > a document... Something like: > > > > ]> > > &including; > This is indeed correct and PARSERs are required to implement it. For many applications it will simply be an insertion of the text in index.txt at the point of the entity reference. So if index.txt contained:

That's all folks!

the parser would create an intermediate instance:

That's all folks!
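
The expansion described above can be sketched with Python's standard-library parser. This is only an illustration under stated assumptions: the element names DOC and P are hypothetical (the markup of the original example did not survive in this archive), and an internal entity stands in for the external file index.txt, because the standard-library parser does not fetch external entities by default.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: DOC and P are assumed names, and the entity is
# declared internally as a stand-in for the external file index.txt.
SOURCE = """<!DOCTYPE DOC [
<!ENTITY including "<P>That's all folks!</P>">
]>
<DOC>
&including;
</DOC>"""


def expand(source):
    """Parse the document; the parser replaces &including; with the entity's text."""
    return ET.fromstring(source)


root = expand(SOURCE)
paragraph = root.find("P")

# The tree now contains the P element that was only present inside the entity.
print(ET.tostring(root, encoding="unicode"))
```

The application never sees the entity reference itself here, only the tree that results after the replacement text has been parsed in place.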

Note that if there is whitespace in the entity, this whitespace is included in the document. Also, if there are entity references in the entity, *these* are then processed. This facility only works for entities which are XML documents (but see NOTATIONS). They cannot have a DOCTYPE or subset and must correspond to a well-formed document. (e.g.

That's all folks! would not be allowed. However the spec 4.4(8) says that if the processor (i.e. the parser) is NOT validating the document, it doesn't have to expand the entity. I assume (contributions, please) that this would be done through a parser switch (-E expand entities, or similar). That means that your document could still parse (WF) even if the entity was not WF as long as expansion was disabled. > > Okay,they say that with XML-SGML a document can be built with document-part- > included using entities facilities. > HTML doesn't make use of external entities but it can do inline image through > some tag... In XML specs i doesn't see any reference to TAG or special attri- > butes that can handle inclusion of document component (text,image,object). This will be done through XML-LINK. This is much more powerful than HTML as it can be applied to any element. Here's how HTML's IMG would look in XML This defines IMG to be a SIMPLE XML-LINK. (Its target 'resource' is located through HREF just as in HTML's A. behaves with ACTUATE="USER" SHOW="REPLACE", i.e. nothing happens till the user clicks it, and then (usually) the display is replaced by the new 'resource'. For IMG the link is traversed immediately it is encountered, and the resource is embedded in the document (probably near the element). > > I would like to know : > - if in the future, we 'll only use external entities to include a > document component ? No, you can use XML-LINK to refer to part of the current document, as well as to external documents. If the external documents are XML then it is often straightforward to include them, but only if they have the same DOCTYPE If they have different DOCTYPEs we have a namespace problem and we are still wrestling with that one (e.g. The rate of this reaction is given by equation 1 where eqn1.xml might be written in MathML. ) If the external entity is BINARY (i.e. not XML - it may stiil be ASCII) then a NOTATION is required (e.g. for GIF). 
I'll stop there and suggest someone else tells us how to use NOTATION because I haven't implemented it yet!! > - anyelse, does XML will support special attributes for Tag to specify > that this Tag with this attributes can include something? You can add XML-LINK attributes to ANY element, so you don't have to use a single one like > - or does this feature will be hardcoded in a browser, making the same > mistake than HTML? Nothing is hardcoded in JUMBO, which is the first XML browser that I know of :-). If a browser manufacturer wishes to limit their browser to one particular XML application then good luck to them - maybe their market is well-defined. For example, if someone writes an XML browser specifically for mobile phones, they may well hardcode their application. I am strongly urging the scientific/technical/medical community to develop interoperable components and with CML and MathML we are off to a good start. A generic browser (like JUMBO) has to be prepared to implement XML-LINK and XML-STYLE independently of the DTD. It also has to be able to switch DOCTYPES for different namespaces. In principle it also has to be able to find tools to deal with a number of common NOTATIONS (GIF, CGM, etc.) and I hope that people will produce self-installing tools for those to save the browser m'facturers having to reinvent it every time. For the major horizontal browser m'facturers, we shall have to wait and see. I'm very much hoping there is a good API into XML browsers so that developers can avoid having to render HTML, interface with mail, etc. Let's have your postings...but keep them targeted to the development of XML tools, resources, documents, tutorials, etc. P. 
-- Peter Murray-Rust, domestic net connection Virtual School of Molecular Sciences http://www.vsms.nottingham.ac.uk/ xml-dev: A list for W3C XML Developers Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To unsubscribe, send to majordomo@ic.ac.uk the following message; unsubscribe xml-dev List coordinator, Henry Rzepa (rzepa@ic.ac.uk) From eliot at isogen.com Thu May 22 14:01:35 1997 From: eliot at isogen.com (W. Eliot Kimber) Date: Mon Jun 7 16:57:51 2004 Subject: XML & Entities inclusion against Inline Tag facilities. Message-ID: <3.0.32.19970522065649.00c4f168@swbell.net> At 09:52 AM 5/22/97 GMT, Peter Murray-Rust wrote: >In message <199705220709.JAA28224@ifhamy.insa-lyon.fr> Alexandre Mutel writes: >No, you can use XML-LINK to refer to part of the current document, as well >as to external documents. If the external documents are XML then it is >often straightforward to include them, but only if they have the same DOCTYPE >If they have different DOCTYPEs we have a namespace problem and we are still >wrestling with that one (e.g. > > >The rate of this reaction is given by >equation 1 > >where eqn1.xml might be written in MathML. >) There is *NOT* a name space problem in this case. The document "eqnl.xml" is *parsed* outside the scope of the document that references (it is semantically and functionally identical to a SUBDOC reference in normal SGML). Once the document is parsed, the result of that parsing is combined, by application-specific means, with the document tree of the referencing document. At that point, things like content model constraints are irrelevant and there are *NO* name space problems. In a typical implementation, the parsed result of the A element would include a *pointer* to the parsed result ("grove") of the eqnl.xml document, rather than literally including that document tree as a direct child of the CML element or the A element (depending on how you decided to represent the reference). 
Because each document is in its own grove, the name spaces for the documents are kept separate and there is no conflict. Applications are free to follow the reference from one grove to another and behave as if the second document was literally included at the point of reference. IT IS VITALLY IMPORTANT to remember the distinction between external text entities referenced by inline entity reference, which are fragments of the document string and are always parsed as part of it (when parsed at all), and references to document entities using addressing from attributes (either by URL or by attributes with a value prescription of ENTITY or ENTITIES). In the latter case, the referenced document is NOT parsed as part of the referencing document. Thus, there is a clear semantic difference between the use-by-reference of text entity references and the use-by-value of document entity references. [Do I have these two confused? It's early in the morning and I'm still suffering jet lag. By use-by-value, I mean you get the thing's value, not the thing itself.] The HyTime standard formalizes this notion of use-by-value through the "value reference" facility, which simply makes explicit the semantic intended by the A element in the above (that the effective value of the A element is really the document it refers to). But it is make very clear that a value reference is a *semantic* distinction--it doesn't change the way the source data is parsed. One confusion factor here is that, unlike SGML today (but not in the near future), if an XML file has no DOCTYPE declaration it can be used as either an external text entity (parsed in the context of its reference) or as a document entity (parsed in isolation), and you can't tell by looking at the entity which it was intended to be. In a very real sense, XML is saying that all external entities are either subdocuments or documents, even though XML doesn't include the formal notion of subdocument as in SGML. 
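
A minimal sketch of the "one grove per document" idea, in Python with the standard-library parser. Everything named here is an assumption made for illustration: the two in-memory documents, the element names CML, MATH, P and A, the HREF attribute, and the helper functions. It is not how any particular XML or HyTime system is implemented; it only shows the referenced document being parsed separately and reached through a pointer.

```python
import xml.etree.ElementTree as ET

# Hypothetical sources held in memory; file names and markup are assumptions.
# Both documents use an element called P, with unrelated meanings.
DOCUMENTS = {
    "main.xml": '<CML><P>The rate is given by <A HREF="eqn1.xml">equation 1</A></P></CML>',
    "eqn1.xml": "<MATH><P>k = A exp(-E/RT)</P></MATH>",
}


def load_groves(start):
    """Parse each reachable document on its own and return one tree per name."""
    groves = {}
    pending = [start]
    while pending:
        name = pending.pop()
        if name in groves or name not in DOCUMENTS:
            continue
        groves[name] = ET.fromstring(DOCUMENTS[name])
        # Follow references made by attribute. The target is parsed separately
        # and is never spliced into the referencing tree.
        for element in groves[name].iter():
            href = element.get("HREF")
            if href is not None:
                pending.append(href)
    return groves


def resolve(groves, element):
    """Return the root of the tree that an element points at, or None."""
    return groves.get(element.get("HREF", ""))


groves = load_groves("main.xml")
link = groves["main.xml"].find(".//A")
target = resolve(groves, link)
print(link.text, "->", target.tag)
```

Because the two trees are never merged, the P inside MATH cannot collide with the P inside CML; an application that wants the "embedded" view simply follows the pointer when it renders the A element.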
>If the external entity is BINARY (i.e. not XML - it may stiil be ASCII) then >a NOTATION is required (e.g. for GIF). > >I'll stop there and suggest someone else tells us how to use NOTATION >because I haven't implemented it yet!! Notations serve two primary purposes: 1. To clearly document the data type of an entity 2. To enable the association of processors with data types. The external identifier of a notation is intended to refer to the documentation for the notation (e.g., the CGM standard, the GIF spec, etc.). It may also be used to associate the notation with a notation processor. In a general SGML or XML processing system, you would expect to find a facility for mapping notations (by name or external ID) to processors or entries in function libraries, e.g., through some form of mapping catalog. An obvious implementation technique on Windows would be use OLE facilities to integrate the processors for data entities with the base browser. Part of the notation mapping would be the information needed to configure the OLE communication. I think at least one SGML editor is implemented in this way. Notations are somewhat redundant with MIME types, in that you may be able to determine the data type of an entity by examining the entity or applying whatever entrail reading gives you the MIME type. However, notations have the advantage that they're part of the document. One way to use notations, of course, is to map them to MIME types, e.g.: application/gif" > Or whatever the MIME type for GIF is. If this mapping is done in a catalog (rather than in the document), the same notation can be mapped to different things on different systems (MIME types are not universal). Notations must be used for data ("binary") entities. They can also be associated with elements by using attributes with a value prescription of "NOTATION". The notation named by the attribute then governs the interpretation of the element and its content (after parsing, of course). 
For example, you might do something like this: ]> Depending on the notation, you might provide different formatting of the source or even automatically extract the content and test it or compile it or something. In full SGML, notations can have attributes defined for them, which can be specified as part of the entity declarations. Notation attributes are intended to act as parameters to the processor of the notation. A typical example is attributes that describe the nature of a graphic, e.g.: Notations and notation attributes are also used for declaring the use of architectures and configuring their use within a document. This makes sense because a document type or architecture is defining the rules for a particular data type, namely documents that conform to the document type or architecture, therefore, it is part of the formal definition of a notation. For example, to derive a document from an architecture, you would do something like this (in this example, the archtecture is one I made up for representing bibliography entries): ... Personally, I think it is a serious mistake for XML to not have notation attributes, in large part because of their use with architectures, which are of critical importance to the use of XML. Notations are also used in XML (and in the future, with SGML), to create "formal processing instructions", where the notation name is the first keyword of the processing instruction. e.g.: This mechanism allows general processors to associate processing instructions with processors (using the notation-to-processor mapping it must already provide for entities and elements). It also enables better error reporting, because the processor can say 'Cannot find processor or definition for processing instruction notation "MyBrowser", public ID "my cool XML browser"', rather than either silently ignoring the PI or issuing an "Unknown PI ..." message. 
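
The notation-to-processor mapping described above might be sketched as follows. This is a hedged illustration only: the catalog, the handler functions, and the public identifier table are all hypothetical, and a real system would more likely read the mapping from a catalog file than from a Python dictionary.

```python
def show_gif(data):
    """Stand-in for a GIF viewer; a real one would decode the image."""
    return "GIF image, %d bytes" % len(data)


def my_browser(instruction):
    """Stand-in for the processor behind a hypothetical MyBrowser notation."""
    return "MyBrowser handles: " + instruction


# Hypothetical catalog mapping notation names to processors.
CATALOG = {"GIF": show_gif, "MyBrowser": my_browser}

# Hypothetical public identifiers, used only in error messages.
PUBLIC_IDS = {"MyBrowser": "my cool XML browser"}


def process(notation, data):
    """Hand the data to the processor registered for the notation."""
    processor = CATALOG.get(notation)
    if processor is None:
        raise LookupError(
            'Cannot find processor or definition for notation "%s", public ID "%s"'
            % (notation, PUBLIC_IDS.get(notation, "unknown"))
        )
    return processor(data)


def process_pi(pi_text):
    """Treat the first keyword of a processing instruction as its notation name."""
    notation, _, instruction = pi_text.strip().partition(" ")
    return process(notation, instruction)


print(process("GIF", b"GIF89a"))
print(process_pi("MyBrowser open-in-new-window"))
```

The same lookup serves data entities and processing instructions, which is the point made above: one mapping, and a specific error when nothing is registered for a notation.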
Cheers, Eliot xml-dev: A list for W3C XML Developers Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To unsubscribe, send to majordomo@ic.ac.uk the following message; unsubscribe xml-dev List coordinator, Henry Rzepa (rzepa@ic.ac.uk) From Peter at ursus.demon.co.uk Thu May 22 17:00:46 1997 From: Peter at ursus.demon.co.uk (Peter Murray-Rust) Date: Mon Jun 7 16:57:51 2004 Subject: XML & Entities inclusion against Inline Tag facilities. Message-ID: <7081@ursus.demon.co.uk> In message <3.0.32.19970522065649.00c4f168@swbell.net> "W. Eliot Kimber" writes: > At 09:52 AM 5/22/97 GMT, Peter Murray-Rust wrote: [...] > > > >The rate of this reaction is given by > >equation 1 > > > >where eqn1.xml might be written in MathML. > >) > > There is *NOT* a name space problem in this case. The document "eqnl.xml" > is *parsed* outside the scope of the document that references (it is > semantically and functionally identical to a SUBDOC reference in normal > SGML). Once the document is parsed, the result of that parsing is > combined, by application-specific means, with the document tree of the > referencing document. At that point, things like content model constraints > are irrelevant and there are *NO* name space problems. Thanks for clarifying this. Please treat me as the archetypal newcomer who means well. Understood. This is in fact what I do, but I was slightly misled in the draft by the phrase under 'EMBED': the 'designated resource should be embedded for the purposes of display or processing in the body of the resource and at the location where the traversal started'. I (mis)read that to mean that the spec required the remote resource to be emebedded and then processed (i.e. parsed). I also share your concern with the likelihood of linking to a document without a DOCTYPE which may have tags in common and where there is a possibility of confusion. 
Since, as you point out, 'embedding' is really a pointer, the application can keep the namespaces separate, though it would be easy to make mistakes.

[...]
> One confusion factor here is that, unlike SGML today (but not in the near
> future), if an XML file has no DOCTYPE declaration it can be used as either
> an external text entity (parsed in the context of its reference) or as a
> document entity (parsed in isolation), and you can't tell by looking at the
> entity which it was intended to be. In a very real sense, XML is saying
> that all external entities are either subdocuments or documents, even
> though XML doesn't include the formal notion of subdocument as in SGML.

Exactly. And it is possible to see cases where a given file is used in both ways: (a) included through entities and (b) pointed to by LINK.

[... thanks for the explanation of notation ...]

I had not appreciated the use of NOTATION to flag PI types and will adopt this.

P.

--
Peter Murray-Rust, domestic net connection
Virtual School of Molecular Sciences http://www.vsms.nottingham.ac.uk/

From bosak at atlantic-83.Eng.Sun.COM Thu May 22 22:46:42 1997
From: bosak at atlantic-83.Eng.Sun.COM (Jon Bosak)
Date: Mon Jun 7 16:57:51 2004
Subject: [Fwd:] XS discussion begins
Message-ID: <199705222044.NAA19450@boethius.eng.sun.com>

[The following message was just posted to the w3c-sgml-wg list. Please note that the draft document referred to in the message is intended for DSSSL implementors already familiar with ISO/IEC 10179, but readers of this list may find the flow object taxonomy and tables of characteristics interesting. All follow-ups should be made to the DSSSList.]
In the SGML ERB meeting of May 14, it was agreed that preliminary discussion of xml-style (Part 3 of the XML specification suite) should take place in parallel with our current task of finishing drafts of xml-lang and xml-link, but in a different forum in order to prevent that discussion from interfering with our deadlines for Parts 1 and 2. Since xml-style has always been defined as based on a subset of DSSSL, it was agreed that the discussion could begin with a draft that puts the existing DSSSL Online (dsssl-o) specification in a form that can easily be made into a Working Draft for XML Part 3 when the time arrives for us to officially turn our attention to that part of the activity. I have now completed such a draft, which can be found at http://sunsite.unc.edu/pub/sun-info/standards/dsssl/xs/xs970522.ps.zip When unzipped, this file should print out with no trouble on most PostScript printers, and it displays well in the Solaris Image Tool. If you wish, you can use an RTF viewer by downloading http://sunsite.unc.edu/pub/sun-info/standards/dsssl/xs/xs970522.rtf.zip but in this case, you may get formatting somewhat different from what I intended. The RTF file was prepared by taking Jade output into Microsoft Word and manually inserting page breaks in certain places; no other hand work was performed. The PS version was generated directly from the massaged RTF using Word's LaserWriter II print driver. I hope to have an HTML version of the document available in the next week or so. 
In preparing this draft, I have incorporated a number of corrections to do960816.htm kindly provided by Tony Graham (although I cannot guarantee that all of his corrections were correctly performed) and done major surgery on the prose descriptions of the flow object classes, entirely eliminating the problematic language inherited from the DSSSL committee draft of September 1995 and starting over with language from the final committee draft, lightly edited for consistency with its new context. This should have fixed a lot of problems that were formerly caused by forking the version tree but has no doubt introduced some new ones. I am relying on reviewers to help with the work of correcting errors in this new version. In anticipation of its merger into the XML suite in a couple of months, the former dsssl-o application profile is now referred to throughout the document as "xml-style," or more frequently "XS" for short. Our unofficial motto (arrived at over drinks in Barcelona) is Nothing exceeds like XS. Please note, however, that despite its name and its look, this document is *not* in any way, shape, or form a W3C Working Draft. In fact, it is not a W3C document of any kind. It is just a revision of the dsssl-o application profile that has been circulating in one form or another for over a year and a half. Consequently, discussion of the draft should take place on the list devoted to DSSSL, which you can find out about from the http://www.mulberrytech.com/dsssl/dssslist page. Since this is not a W3C draft, and since the whole point of this exercise is to move forward on the groundwork for XML Part 3 without interfering with the work on XML Parts 1 and 2, it is essential that you DO NOT DISCUSS THIS DRAFT ON THE W3C-SGML-WG LIST. Please confine all discussions to DSSSList or to other appropriate public lists (though offhand I can't think of any others that would be appropriate). 
I will follow up on this message with another one to the DSSSList setting forth some suggested guidelines for the XS discussion. Jon ---------------------------------------------------------------------- Jon Bosak, Online Information Technology Architect, Sun Microsystems ---------------------------------------------------------------------- 2550 Garcia Ave., MPK17-101, | Best is he that inuents, Mountain View, California 94043 | the next he that followes Davenport Group::SGML Open::ANSI X3V1 | forth and eekes out a good ::ISO/IEC JTC1/SC18/WG8::W3C SGML ERB | inuention. ---------------------------------------------------------------------- xml-dev: A list for W3C XML Developers Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To unsubscribe, send to majordomo@ic.ac.uk the following message; unsubscribe xml-dev List coordinator, Henry Rzepa (rzepa@ic.ac.uk) From Ingo.Macherius at tu-clausthal.de Fri May 23 19:02:27 1997 From: Ingo.Macherius at tu-clausthal.de (Ingo Macherius) Date: Mon Jun 7 16:57:51 2004 Subject: New XML article Message-ID: <199705231702.TAA15827@majestix.rz.tu-clausthal.de> Ladies and Gentlemen, there is a new magazine article on XML avaliable. It appeared in the German iX-magazine (http://www.heise.de/ix/) on 14th of May. I consider it the first german language article aimed to the general public. iX-magazine generously did the english translation and put online versions on their server. http://www.heise.de/ix/artikel/E/1997/06/106/ (english) http://www.heise.de/ix/artikel/1997/06/106/ (german) I have to thank Jon Bosak, who agreed to use his 10_mail example, and Norbert Mikula for proofreading. All errors left are surely mine. Any comment, correction etc. is welcome. 
++im -- Snail : Ingo Macherius // L'Aigler Platz 4 // D-38678 Clausthal-Zellerfeld Mail : Ingo.Macherius@tu-clausthal.de WWW: http://www.tu-clausthal.de/~inim/ Information!=Knowledge!=Wisdom!=Truth!=Beauty!=Love!=Music==BEST (Frank Zappa) xml-dev: A list for W3C XML Developers Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To unsubscribe, send to majordomo@ic.ac.uk the following message; unsubscribe xml-dev List coordinator, Henry Rzepa (rzepa@ic.ac.uk) From Peter at ursus.demon.co.uk Fri May 23 20:18:38 1997 From: Peter at ursus.demon.co.uk (Peter Murray-Rust) Date: Mon Jun 7 16:57:51 2004 Subject: New XML article Message-ID: <7166@ursus.demon.co.uk> In message <199705231702.TAA15827@majestix.rz.tu-clausthal.de> Ingo Macherius writes: > Ladies and Gentlemen, > > there is a new magazine article on XML avaliable. It appeared in the > German iX-magazine (http://www.heise.de/ix/) on 14th of May. I consider > it the first german language article aimed to the general public. > iX-magazine generously did the english translation and put online > versions on their server. Ingo, This is a first class article and I particularly like the diagrams. Could you ask IX magazine to make sure it stays on the WWW for as long as possible? I would like to be able to point people at it. (If so, is that the permanent URL?). If you ever think of drawing diagrams for XML-LINK that would be a great help to me :-). P. I hope this serves as a catalyst for other readers of this list - there is a LOT of work that needs to be done in providing introductions to XML, and in particular LINK will benefit from having clear expositions. More examples of all sorts are needed. (one small correction - JUMBO is not specific to chemistry. It will read any XML document and display it as a tree. Semantics can then be added at the element level, by adding (say) MATRIX.class - which could have an invert() method. In this sense it is somewhat complementary to stylesheets.) 
--
Peter Murray-Rust, domestic net connection
Virtual School of Molecular Sciences http://www.vsms.nottingham.ac.uk/

From housel at ms7.hinet.net Sat May 24 00:43:41 1997
From: housel at ms7.hinet.net (Peter S. Housel)
Date: Mon Jun 7 16:57:51 2004
Subject: MathML DTD
Message-ID: <199705232235.GAA25619@ms7.hinet.net>

With the help of SP-1.1.4 (included with the latest snapshot of Jade) and the new -wxml option to nsgmls, I determined that I was being a bit lax with my mixed content declarations (too many levels of parentheses), and eventually managed to clean up my DTD to remove the problem.

The new MathML DTD (which I just started trying to include in my own DTD) has the same problem. In fact it's a lot worse. Has anyone (with more DTD experience than I have) tried cleaning up the MathML preliminary DTD so that it follows XML's stricter rules about when #PCDATA can be declared?

-Peter S. Housel- housel@ms7.hinet.net

From Ingo.Macherius at tu-clausthal.de Sat May 24 01:06:59 1997
From: Ingo.Macherius at tu-clausthal.de (Ingo Macherius)
Date: Mon Jun 7 16:57:51 2004
Subject: New XML article
In-Reply-To: <7166@ursus.demon.co.uk> from "Peter Murray-Rust" at May 23, 97 07:08:40 pm
Message-ID: <199705232306.BAA13916@talentix.rz.tu-clausthal.de>

Peter Murray-Rust said:
| > there is a new magazine article on XML avaliable.
|
| Could you ask IX magazine to make sure it stays on the WWW for as long as
| possible? (If so, is that the permanent URL?).

Yes, the URL is intended to be permanent.
| If you ever think of drawing diagrams for XML-LINK that would be
| a great help to me :-).

All diagrams were taken from my master's thesis. It deals with ways of doing industrial-strength HTML publishing. My professor advised me not to publish it until I have a "quotable publication" on the topic, which I think I have now. Sad but true, it's in German only. But maybe some of the diagrams are of use. They depict all of ISO/IEC 10179, SGML basics and Jigsaw. A second drawback is that they were done with a non-mainstream design program. But if anyone wants to use any of them for non-commercial things, just ask.

The (also permanent) URL for the PostScript version (107 single-sided DIN A4 pages) of the complete thesis is:

http://www.tu-clausthal.de/~inim/thesis/thesis_im.zip

There are about 30 pictures throughout the text. Please note that the thesis was finished before the advent of full XML, so some things are outdated. Others, such as the DSSSL overview and the SGML introduction, may still be useful.

| (one small correction -

Thanks to you and others who sent comments and errata via private mail. Obviously the translator didn't use the final German article as a basis, so errors that had already been fixed were re-introduced. Some are new in the English version (e.g. TEI is not HyTime). All known errors will be removed ASAP.

Just a note:
I went through the writer/editor/writer cycle 8 times for this article. The main problem was the XML terminology. Formally correct sentences are often not understandable by non-expert readers, while understandable versions are often formally incorrect. So the cycle often was me writing formally correct words, my editor "improving" them, me making the sentence correct again etc. XML/SGML language could turn out to be a major problem for XML. E.g. just try to explain the differences between markup/element/tag to a HTML person. In HTML it's all the same ... (at least in common understanding).
++im -- Snail : Ingo Macherius // L'Aigler Platz 4 // D-38678 Clausthal-Zellerfeld Mail : Ingo.Macherius@tu-clausthal.de WWW: http://www.tu-clausthal.de/~inim/ Information!=Knowledge!=Wisdom!=Truth!=Beauty!=Love!=Music==BEST (Frank Zappa) xml-dev: A list for W3C XML Developers Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To unsubscribe, send to majordomo@ic.ac.uk the following message; unsubscribe xml-dev List coordinator, Henry Rzepa (rzepa@ic.ac.uk) From jenglish at crl.com Sat May 24 02:04:51 1997 From: jenglish at crl.com (Joe English) Date: Mon Jun 7 16:57:51 2004 Subject: MathML DTD In-Reply-To: <199705232235.GAA25619@ms7.hinet.net> References: <199705232235.GAA25619@ms7.hinet.net> Message-ID: <199705240001.AA08521@mail.crl.com> Peter S. Housel wrote: > The new MathML DTD (which I just started trying to include into my > own DTD) has the same problem. In fact it's a lot worse. Has anyone > (with more DTD experience than I have) tried cleaning up the MathML > preliminary DTD so that it follows XML's stricter rules about when > #PCDATA can be declared? I wouldn't bother just yet. The MathML DTD does not accurately describe the MathML language as specified in the rest of the TR. Since the prose of the working draft is normative (according to the warning at the top of appendix A), the DTD fragment is somewhat less than useful. --Joe English jenglish@crl.com xml-dev: A list for W3C XML Developers Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To unsubscribe, send to majordomo@ic.ac.uk the following message; unsubscribe xml-dev List coordinator, Henry Rzepa (rzepa@ic.ac.uk) From andrewl at microsoft.com Sat May 24 03:01:39 1997 From: andrewl at microsoft.com (Andrew Layman) Date: Mon Jun 7 16:57:51 2004 Subject: Architectural Forms instead of Namespaces Message-ID: <7BB61B44F197D011892800805FD4F792A4BFFD@RED-03-MSG.dns.microsoft.com> Several writers have suggested that architectural forms could be used to solve the namespaces problem. 
Could someone who understands AFs rewrite the example below to use AFs? http://www.bigbookstore.com/schema bk http://www.w3.org w3 http://purl.org/dublincore dc http://www.shipping.com sh Number, the Language of Science Dantzig 5.95 9 1234567890 AndrewL@microsoft.com Thanks. --Andrew Layman AndrewL@microsoft.com xml-dev: A list for W3C XML Developers Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To unsubscribe, send to majordomo@ic.ac.uk the following message; unsubscribe xml-dev List coordinator, Henry Rzepa (rzepa@ic.ac.uk) From andrewl at microsoft.com Sat May 24 04:45:51 1997 From: andrewl at microsoft.com (Andrew Layman) Date: Mon Jun 7 16:57:51 2004 Subject: CONCUR instead of Namespaces Message-ID: <7BB61B44F197D011892800805FD4F792A4C00A@RED-03-MSG.dns.microsoft.com> Following up on my earlier request re Architectural Forms, since several posts have suggested that CONCUR could be used to solve the namespaces problem, could someone who understands CONCUR rewrite the example below to use CONCUR? > > > http://www.bigbookstore.com/schema > bk > > > > http://www.w3.org > w3 > > > > > http://purl.org/dublincore > dc > > > > http://www.shipping.com > sh > > > > Number, the Language of > Science > Dantzig > 5.95 > 9 > > 1234567890 > AndrewL@microsoft.com > > > > > > Thanks. > > --Andrew Layman > AndrewL@microsoft.com > xml-dev: A list for W3C XML Developers Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To unsubscribe, send to majordomo@ic.ac.uk the following message; unsubscribe xml-dev List coordinator, Henry Rzepa (rzepa@ic.ac.uk) From jjc at jclark.com Sat May 24 07:58:29 1997 From: jjc at jclark.com (James Clark) Date: Mon Jun 7 16:57:52 2004 Subject: Architectural Forms instead of Namespaces Message-ID: <2.2.32.19970524054205.00ddc9fc@jclark.com> At 18:00 23/05/97 -0700, Andrew Layman wrote: >Several writers have suggested that architectural forms could be used to >solve the namespaces problem. 
Could someone who understands AFs rewrite >the example below to use AFs? If I was designing an architectural form mechanism that could work with just instances, I would probably do it something like: Number, the Language of Science Dantzig 5.95 9 1234567890 AndrewL@microsoft.com In fact I would always use a DTD subset to get something like this: ]> Number, the Language of Science Dantzig 5.95 9 1234567890 AndrewL@microsoft.com and I would also probably make use of the rules for defaulting the form attribute so I could instead do: ]> Number, the Language of Science Dantzig 5.95 9 1234567890 AndrewL@microsoft.com Finally I would probably put the DTD in a separate file: Number, the Language of Science Dantzig 5.95 9 1234567890 AndrewL@microsoft.com James xml-dev: A list for W3C XML Developers Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To unsubscribe, send to majordomo@ic.ac.uk the following message; unsubscribe xml-dev List coordinator, Henry Rzepa (rzepa@ic.ac.uk) From Peter at ursus.demon.co.uk Sat May 24 08:26:48 1997 From: Peter at ursus.demon.co.uk (Peter Murray-Rust) Date: Mon Jun 7 16:57:52 2004 Subject: New XML article Message-ID: <7198@ursus.demon.co.uk> In message <199705232306.BAA13916@talentix.rz.tu-clausthal.de> Ingo Macherius writes: Thanks, [...] > > Just a note: > I went through the writer/editor/writer cycle 8 times for this article. The > main problem was the XML terminology. Formally correct sentences are often > not understandable by non-expert readers, while understandable versions are > often formally incorrect. So the cycle often was me writing formally correct I think this is extremely important. I have been (trying) to interpret and implement the XML-LINK spec and getting some of it wrong :-). XML is difficult in places unless you are quite familiar with SGML - I think XML-LINK will be a major challenge to the drafters. (I'm sure they'll manage it :-). This is why I'm keen about diagrams. 
XML-link is described in words, and it's incredibly easy to read the wrong meaning into them. For example, the 'locators' in 'links' locate 'resources' and it's not easy to write programs until it's absolutely clear what each of these means. The reverse is also true - when something is described clearly and precisely it makes it enormously easier to write code. > words, my editor "improving" them, me making the sentence correct again etc. > XML/SGML language could turn out to be a major problem for XML. E.g. just > try to explain the differences between markup/element/tag to a HTML person. > In HTML it's all the same ... (at least in common understanding). I agree. There are several aspects. One to protect the newcomer from too much terminology to start with. Of course it must always be precise. Another is to relate abstract terminology to concrete examples where possible. I have developed an XML application to manage terminology and I'm just about to start collecting XML terminology for it (unless someone is already doing this. It's loosely based on ISO12620 and displays a (hierarchical) glossary as an XML tree. P. -- Peter Murray-Rust, domestic net connection Virtual School of Molecular Sciences http://www.vsms.nottingham.ac.uk/ xml-dev: A list for W3C XML Developers Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To unsubscribe, send to majordomo@ic.ac.uk the following message; unsubscribe xml-dev List coordinator, Henry Rzepa (rzepa@ic.ac.uk) From Peter at ursus.demon.co.uk Mon May 26 21:58:46 1997 From: Peter at ursus.demon.co.uk (Peter Murray-Rust) Date: Mon Jun 7 16:57:52 2004 Subject: LT XML toolset, parser, developers (fwd) Message-ID: <7273@ursus.demon.co.uk> Posted by Henry Thompson and intended for xml-dev - apologies if it's a duplicate. P. 
Forwarded message follows: > From w3c-sgml-wg-request@w3.org Mon May 26 13:53:09 1997 > Received: from relay-5.mail.demon.net by ursus.demon.co.uk with SMTP > id AA7267 ; Mon, 26 May 97 13:53:07 BST > Received: from punt-1.mail.demon.net by mailstore for peter@ursus.demon.co.uk > id 864642046:05:00596:4; Mon, 26 May 97 11:20:46 BST > Received: from www19.w3.org ([18.29.0.19]) by punt-1.mail.demon.net > id aa0604148; 26 May 97 11:19 BST > Received: by www19.w3.org (8.8.5/8.6.12) id GAA07682; Mon, 26 May 1997 06:14:26 -0400 (EDT) > Resent-Date: Mon, 26 May 1997 06:14:26 -0400 (EDT) > Resent-Message-Id: <199705261014.GAA07682@www19.w3.org> > X-Authentication-Warning: www10.w3.org: Host stevenson144.cogsci.ed.ac.uk [129.215.144.1] claimed to be stevenson.cogsci.ed.ac.uk > Message-Id: <1296.199705261013@grogan.cogsci.ed.ac.uk> > From: "Henry S. Thompson" > Date: Mon, 26 May 97 11:13:42 BST > To: w3c-sgml-wg@w3.org, salt@uk.ac.ed.cstr, xml-dev@ic.ac.uk, > elsnet-list@uk.ac.ed.cogsci, corpora@no.uib.hd, > empiricists@EDU.Stanford.CSLI > Subject: LT XML toolset, parser, developers' API released > X-List-URL: http://www.w3.org/pub/WWW/Archives/Public/w3c-sgml-wg/ > X-See-Also: http://www.w3.org/pub/WWW/MarkUp/SGML/Activity > Resent-From: w3c-sgml-wg@w3.org > X-Mailing-List: archive/latest/4795 > X-Loop: w3c-sgml-wg@w3.org > Sender: w3c-sgml-wg-request@w3.org > Resent-Sender: w3c-sgml-wg-request@w3.org > Precedence: list > Status: R > > The Language Technology Group is pleased to announce the beta release > of LT XML, the first publicly available XML toolset written in C. > > For further information and access to the software distribution, see > > http://www.ltg.ed.ac.uk/software/xml/ > > The LT XML tool-kit includes stand-alone tools for a wide range of > processing of well-formed XML documents, including searching and > extracting, down-translation (e.g. report generation, formatting), > tokenising and sorting. 
> > LT XML is an integrated set of XML tools and a developers' tool-kit, > including a C-based API. The beta release now available is UNIX-only, > but a WIN16 version will be available in the near future. > > Sequences of tool applications can be pipelined together to achieve > complex results. > > For special purposes beyond what the pre-constructed tools can > achieve, extending their functionality and/or creating new tools is > easy using the LT XML API, which provides both event-oriented and > tree-fragment oriented access to the input document stream. Minimal > applications require less than one-half page of C code to express. > > LT XML is available to anyone free of charge for non-commercial purposes. > > ---------------------- > Henry S. Thompson, Human Communication Research Centre, University of Edinburgh > 2 Buccleuch Place, Edinburgh EH8 9LW, SCOTLAND -- (44) 131 650-4440 > Fax: (44) 131 650-4587, e-mail: ht@cogsci.ed.ac.uk > URL: http://www.ltg.ed.ac.uk/~ht/ > > -- Peter Murray-Rust, domestic net connection Virtual School of Molecular Sciences http://www.vsms.nottingham.ac.uk/ xml-dev: A list for W3C XML Developers Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To unsubscribe, send to majordomo@ic.ac.uk the following message; unsubscribe xml-dev List coordinator, Henry Rzepa (rzepa@ic.ac.uk) From tbray at textuality.com Tue May 27 15:56:29 1997 From: tbray at textuality.com (Tim Bray) Date: Mon Jun 7 16:57:52 2004 Subject: Commercial job postings? Message-ID: <3.0.32.19970527065333.00a1fe80@pop.intergate.bc.ca> Meta-question: I was talking to some people yesterday who want to do some big bold things with XML, and were wondering where a good place might be to look for people. I told them that XML was easy enough that they ought to hire people with application expertise and they'll pick up XML in no time, but they weren't convinced. Anyhow, would people consider it an abuse of this mailing list if the odd job posting started showing up? 
I personally think it would be good, simply as a service to the industry we're trying to create, and as a useful barometer of what's going on out there. - Tim

From Peter at ursus.demon.co.uk Wed May 28 08:15:48 1997
From: Peter at ursus.demon.co.uk (Peter Murray-Rust)
Date: Mon Jun 7 16:57:52 2004
Subject: Commercial job postings?
Message-ID: <7301@ursus.demon.co.uk>

In message <3.0.32.19970527065333.00a1fe80@pop.intergate.bc.ca> Tim Bray writes:
> Meta-question:
[...]
>
> Anyhow, would people consider it an abuse of this mailing list if
> the odd job posting started showing up? I personally think it would
> be good, simply as a service to the industry we're trying to create,
> and as a useful barometer of what's going on out there. - Tim

Personally I wouldn't have a problem, but it's Henry who does the hard work behind the scenes. Job listings would certainly give the impression XML is going places. [Just give me a private copy as well :-)].

While I'm posting, this is probably a trivial XML/SGML question, but I am worried about EMPTY content in WF documents. As far as I can see, in validated SGML

]>

and

]>

both return the same result, and that seems to be the same with NXP both validating and non-validating (the first example uses of course). Is there any way that the content could be returned as a null string ("")?

Can attributes have null values? Is B="" the same as omitting B when #IMPLIED?

P.
-- Peter Murray-Rust, domestic net connection Virtual School of Molecular Sciences http://www.vsms.nottingham.ac.uk/ xml-dev: A list for W3C XML Developers Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To unsubscribe, send to majordomo@ic.ac.uk the following message; unsubscribe xml-dev List coordinator, Henry Rzepa (rzepa@ic.ac.uk) From lee at sq.com Wed May 28 16:17:29 1997 From: lee at sq.com (lee@sq.com) Date: Mon Jun 7 16:57:52 2004 Subject: Commercial job postings? Message-ID: <9705281416.AA11586@sqrex.sq.com> Peter Murray-Rust wrote: > As far as I can see, in > validated SGML > > > ]> > > > and > > > ]> > > > both return the same result, That depends on what you mean by "return". ISO 8879 doesn't say anything about anyone returning things at all. They are not distinguished in ESIS, but that's just because ESIS is broken in this regard. > Can attributes have null values? is > > B="" the same as omitting B when #IMPLIED? No, they are not the same -- an attribute value can be zero characters, as can an element's content. Lee xml-dev: A list for W3C XML Developers Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To unsubscribe, send to majordomo@ic.ac.uk the following message; unsubscribe xml-dev List coordinator, Henry Rzepa (rzepa@ic.ac.uk) From rrseibel at att.com Wed May 28 16:52:01 1997 From: rrseibel at att.com (Seibel, Robert R) Date: Mon Jun 7 16:57:52 2004 Subject: Comercial XML editor recommendations Message-ID: XMLers: I'm a new member to this list so please excuse my ignorance of what has gone on in the past. I'm surveying the market for XML editors for my project. I know the market is in its infancy but does anyone know who is farthest along? 
The editor should: 1) let me add my own markup tags into a pull down menu 2) use a predefined template of tags (elements) to start the document off 3) let me format those tags using a style sheet 4) permit editing in a WYSIWYG mode according to the style sheet 5) be simple to use, hiding detail from the authors Does anyone have any recommendations??? Thanks. >Bob Seibel xml-dev: A list for W3C XML Developers Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To unsubscribe, send to majordomo@ic.ac.uk the following message; unsubscribe xml-dev List coordinator, Henry Rzepa (rzepa@ic.ac.uk) From Peter at ursus.demon.co.uk Thu May 29 01:12:42 1997 From: Peter at ursus.demon.co.uk (Peter Murray-Rust) Date: Mon Jun 7 16:57:52 2004 Subject: Commercial job postings? Message-ID: <7327@ursus.demon.co.uk> In message <9705281416.AA11586@sqrex.sq.com> lee@sq.com writes: > Peter Murray-Rust wrote: > > As far as I can see, in > > validated SGML > > > > > > > ]> > > > > > > and > > > > > > > ]> > > > > > > both return the same result, > That depends on what you mean by "return". ISO 8879 doesn't say anything > about anyone returning things at all. They are not distinguished in ESIS, > but that's just because ESIS is broken in this regard. Understood. In that case the question might be 'for WF XML documents with no Element declaration, are: and identical? > > > Can attributes have null values? is > > > > B="" the same as omitting B when #IMPLIED? > > No, they are not the same -- an attribute value can be zero characters, > as can an element's content. ^^^^^^^^^ My implication from this is that in my second example, A has a child of unknown type of null content OR that A has a #PCDATA child which a content of "". In which case it could have an effect on (say) counting #PCDATA children. If so, it might need flagging in the draft... P. 
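A minimal sketch of the distinction discussed above, using Python's standard-library xml.etree.ElementTree (a present-day parser, not NXP or any tool mentioned in this thread). The element name A and the attribute name B follow the question; the three sample documents and the helper describe() are assumptions made for illustration, since the original examples did not survive in the archive.

```python
import xml.etree.ElementTree as ET

# Hypothetical well-formed documents; names follow the thread,
# the documents themselves are assumed for illustration.
empty_tag = ET.fromstring("<A/>")
start_end = ET.fromstring("<A></A>")
empty_attr = ET.fromstring('<A B=""/>')


def describe(elem):
    """Summarise what this particular parser reports for an element."""
    return {
        "text": elem.text,            # None when there is no character data
        "children": len(elem),        # number of child elements
        "B": elem.get("B"),           # None when the attribute is absent
        "has_B": "B" in elem.attrib,  # separates "absent" from "empty"
    }


if __name__ == "__main__":
    samples = [("<A/>", empty_tag), ("<A></A>", start_end), ('<A B=""/>', empty_attr)]
    for label, elem in samples:
        print(label, describe(elem))
```

With this parser the empty-element tag and the start/end pair are reported identically, while an empty attribute value remains distinguishable from an absent one. Other parsers and APIs may expose more or less than this.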
> > Lee > > > xml-dev: A list for W3C XML Developers > Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ > To unsubscribe, send to majordomo@ic.ac.uk the following message; > unsubscribe xml-dev > List coordinator, Henry Rzepa (rzepa@ic.ac.uk) > > -- Peter Murray-Rust, domestic net connection Virtual School of Molecular Sciences http://www.vsms.nottingham.ac.uk/ xml-dev: A list for W3C XML Developers Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To unsubscribe, send to majordomo@ic.ac.uk the following message; unsubscribe xml-dev List coordinator, Henry Rzepa (rzepa@ic.ac.uk) From Peter at ursus.demon.co.uk Thu May 29 01:12:59 1997 From: Peter at ursus.demon.co.uk (Peter Murray-Rust) Date: Mon Jun 7 16:57:52 2004 Subject: Comercial XML editor recommendations Message-ID: <7328@ursus.demon.co.uk> In message "Seibel, Robert R" writes: > XMLers: > > I'm a new member to this list so please excuse my ignorance > of what has gone on in the past. Don't worry! This is an important topic and one that hasn't been discussed. IMO editors are going to be key for certain aspects of XML. > > I'm surveying the market for XML editors for my project. I think that there are two extremes to the spectrum (A) the 'traditional' which is the one that I think you allude to - writing and editing text, sformatting, spellchecking, etc. and (B) the new opportunities, so bringing in a graphics, adding an image map, adding some maths, creating a link database, importing and converting legacy files on the fly. (B) is where I am aiming JUMBO at - at present it will edit the structure tree, import new legacy data and convert on the fly but it doesn't edit text. It will also be aimed at using NXP to validate vs the DTD. > I know the market is in its infancy but does anyone know who > is farthest along? 
The editor should: > > 1) let me add my own markup tags into a pull down menu > 2) use a predefined template of tags (elements) to start the > document off > 3) let me format those tags using a style sheet > 4) permit editing in a WYSIWYG mode according to the > style sheet > 5) be simple to use, hiding detail from the authors Can't help in detail, but there were several promising prototypes at SGML97, Stilo, Balise, Frame, etc. Maybe these vendors would like to say something? P. -- Peter Murray-Rust, domestic net connection Virtual School of Molecular Sciences http://www.vsms.nottingham.ac.uk/ xml-dev: A list for W3C XML Developers Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To unsubscribe, send to majordomo@ic.ac.uk the following message; unsubscribe xml-dev List coordinator, Henry Rzepa (rzepa@ic.ac.uk) From michael at textscience.com Thu May 29 05:53:10 1997 From: michael at textscience.com (Michael Leventhal) Date: Mon Jun 7 16:57:52 2004 Subject: Comercial XML editor recommendations In-Reply-To: <7328@ursus.demon.co.uk> from "Peter Murray-Rust" at May 28, 97 11:05:35 pm Message-ID: <199705290352.UAA08842@shell1.aimnet.com> Peter Murray-Rust wrote: > > I'm surveying the market for XML editors for my project. > > I think that there are two extremes to the spectrum (A) the 'traditional' > which is the one that I think you allude to - writing and editing text, > sformatting, spellchecking, etc. and (B) the new opportunities, so > bringing in a graphics, adding an image map, adding some maths, creating a > link database, importing and converting legacy files on the fly. (B) is > where I am aiming JUMBO at - at present it will edit the structure tree, > import new legacy data and convert on the fly but it doesn't edit text. > It will also be aimed at using NXP to validate vs the DTD. > > Can't help in detail, but there were several promising prototypes at SGML97, > Stilo, Balise, Frame, etc. Maybe these vendors would like to say something? 
I'd like to, but I am very concerned about misusing this list for commercial purposes, despite the invitation. I think I can mention that Grif did demo _two_ XML editors at SGML '97 Europe and WWW6.

I also think I can pursue Peter's point about there being two types of editors, A and B above, from a technological/philosophical/cultural perspective. Grif also has an A and B which are not exactly what Peter describes but sort of close. The origin was not intended to delineate a philosophical distinction although the currents of history may have in fact made it so.

Grif's XML editor A is a knock-off from its traditional SGML with "WYSIWYG to the max" product. It requires a DTD, enforces structure, and controls the presentation through a high-end style sheet mechanism. XML editor B is a knock-off of Grif's HTML editor, Symposia, and does not enforce structure, allows you to add tags at will, is CSS-based and does the usual HTML-related stuff like allow you to create (XML) links and image maps, add math, etc.

I initially found the idea of having two XML editors to be possibly schizophrenic so I am intrigued by Peter being already in possession of a two editor world-view, essentially the SGML and the HTML approaches, DTD-required vs well-formed. I guess I always assumed that you'd combine the two, change modes at the flick of a switch, but somehow encourage more rather than less structure by always having the capability of showing the user his or her structural failings. Of course, the code bases have, by now, diverged greatly though companies like Grif certainly leveraged their SGML experience in entering the HTML fray. But I thought the perspectives were coalescing. Is this two editor approach a transitional stage on the way to a more glorious evolutionary stage or have we, in fact, distinguished different types of tasks to which different types of tools have been precisely tailored to the exact nature of the task?
Michael Leventhal ================================================================== Michael Leventhal 1800 Lake Shore Ave, Ste 14 V (510) 444-2962 VP Technology Oakland, CA 94606 F (510) 444-1672 GRIF, SA michael@textscience.com http://www.grif.fr xml-dev: A list for W3C XML Developers Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To unsubscribe, send to majordomo@ic.ac.uk the following message; unsubscribe xml-dev List coordinator, Henry Rzepa (rzepa@ic.ac.uk) From Peter at ursus.demon.co.uk Thu May 29 10:45:44 1997 From: Peter at ursus.demon.co.uk (Peter Murray-Rust) Date: Mon Jun 7 16:57:52 2004 Subject: Comercial XML editor recommendations Message-ID: <7353@ursus.demon.co.uk> In message <199705290352.UAA08842@shell1.aimnet.com> Michael Leventhal writes: > Peter Murray-Rust wrote: [...] > > I'd like to, but I am very concerned about misusing this list for commercial > purposes, despite the invitation. I think I can mention that Grif did demo > _two_ XML editors at SGML '97 Europe and WWW6. I appreciate this restraint, thanks - but would like to suggest that we can relax it a bit *at this stage in XML development*. My reasoning is as follows. [BTW there is absolutely no *pressure* for any m'facturer to say anything here in advance of public release - and no inferences should be drawn from any apparent silences. So if there is no response - for good commercial reasons - fine. I take it as axiomatic that all major current SGML m'facturers are *interested* in XML, so silence carries little information :-)]. Implementations tend to define de facto procedures. For example when C++ came out it was an almighty mess. There were several different compilers, all from different manufacturers and working to different levels and by different mechanisms. Some used a preprocessor, some were native, some had templates and all on a varying timescale. 
You very soon got not only m'facturer lockin, but version lockin :-( The XML-spec is not yet frozen, but people are (rightly IMO) creating tools in advance of the final spec. Let's say those tools suddenly emerged on July 2 (spec is announced July 1. right?) and they take fundamentally different approaches to the language, that *may* have some bearing on language revisions. We are concerned that XML does not have multiple conformance levels, and a comparison of editor/parser features may help to approach that problem. Many *document* developers may be wishing to create trial XML documents or prototype legacy conversion. It would be reasonable for them to ask where they could find a (prototype) editor to start with. They might then discover that there were significant problems/advantages in XML. [Some of these problems may also be dealt with if people compile XML resource pages.] > > I also think I can pursue Peter's point about there being two types > of editors, A and B above, from a technological/philosphical/cultural > perspective. Grif also has an A and B which are not exactly what Peter > describes but sort of close. The origin was not intended to delineate a > philosophical distinction although the currents of history may have in fact > made it so. My motivation here is that I see editing as one of the key steps to getting XML universally accepted. Yes, the current text-oriented SGML tools will be modified/rewritten to give XML editors, but they won't address the applications that no-one has thought of. What does a CML editor want? > > Grif's XML editor A is a knock-off from its traditional SGML with > "WYSIWYG to the max" product. It requires a DTD, enforces structure, > and controls the presentation through a high-end style sheet mechanism. 
> XML editor B is a knock-off of Grif's HTML editor, Symposia, and does > not enforce structure, allows you to add tags at will, is CSS-based > and does the usual HTML-related stuff like allow you to create > (XML) links and image maps, add math, etc. This is very exciting news. I would be interested to know more. > > I initially found the idea of having two XML editors to be possibly > schizophrenic so I am intrigued by Peter being already in possession of > a two editor world-view, essentially the SGML and the HTML > approaches, DTD-required vs well-formed. I guess I always assumed > that you'd combine the two, change modes at the flick of a switch, > but somehow encourage more rather than less structure by always > having the capability of showing the user his or her structural > failings. Of course, the code bases have, by now, divurged greatly > though companies like Grif certainly leveraged their SGML experience > in entering the HTML fray. But I thought the perspectives were coalescing. > Is this two editor approach a transitional stage on the way to a more > glorious evolutionary stage or have we, in fact, distinguished different > types of tasks to which different types of tools have been precisely tailored > to exact nature of the task? What I want for an editor for the chemical community is, I think, generalisable to may other applications. (a) no discipline-specific tools, but good hooks to link them in (b) full support for XML-LINK (c) tree-based editing (d) attribute editing , controlled by DTD (e) import of legacy data and conversion on the fly by user-written add-ons (f) support for whatever solutions XML comes up with for XML-TYPE, XML-LINK XML-STYLE, XML-MONEY,... (g) WYSIWYG HTML editing with XML-LINKing to imported subdocuments. (h) Cunning chemical editing that I think of and develop. *I* can do the *chemical* bit. I'd prefer to do it once and not one for A's tool, one for B's tool, etc. 
My current preference for several reasons would be Java beans - e.g. there will be a HTML bean, Word bean, Molecule bean, etc. I have always felt that posters to comp.text.sgml have been very responsible in the use of commercial postings. I think that a listing of current capabilities of editors would be valuable to readers of this list. However if people don't general share this view, please post - either to the list or me personally - and I will then suggest revised etiquette. P. -- Peter Murray-Rust, domestic net connection Virtual School of Molecular Sciences http://www.vsms.nottingham.ac.uk/ xml-dev: A list for W3C XML Developers Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To unsubscribe, send to majordomo@ic.ac.uk the following message; unsubscribe xml-dev List coordinator, Henry Rzepa (rzepa@ic.ac.uk) From cbullard at hiwaay.net Fri May 30 03:43:14 1997 From: cbullard at hiwaay.net (len bullard) Date: Mon Jun 7 16:57:52 2004 Subject: Comercial XML editor recommendations References: <199705290352.UAA08842@shell1.aimnet.com> Message-ID: <338E3096.32F3@hiwaay.net> Michael Leventhal wrote: > > I thought the perspectives were coalescing. > Is this two editor approach a transitional stage on the way to a more > glorious evolutionary stage or have we, in fact, distinguished different > types of tasks to which different types of tools have been precisely tailored > to exact nature of the task? > > Michael Leventhal Possible. Even in the past, we have seen considerable differences between SGML-complete editors that were very powerful and came with attendant setup complexity, and editors that just let you point to a DTD and get a configured editing interface. Along the way, some systems whose design parameters did not include the complexities of *faithful to the pica* print requirements have been used successfully. At least two of these were based on laissez-faire (well-formed input/batch validation on request) systems. 
These fared well in production environments and are still deployed. Here is another perspective. What if DTDs came into being as a result of measurement of frequency and occurrence rather than from design and imposition? Note I am not talking about DTDs generated by inducing markup, but DTDs created as tags are generated by users in the course of natural tagging. Consider the habits borne of the HTML users who began to unwittingly use content tagging styles almost as jokes to delineate thoughts in emails, etc. It is interesting to speculate what the place of genetic DTDs such as could be created from these would have since in some ways the resemble a natural language emerging from an artificial language environment. Len Bullard Intergraph Corporation xml-dev: A list for W3C XML Developers Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To unsubscribe, send to majordomo@ic.ac.uk the following message; unsubscribe xml-dev List coordinator, Henry Rzepa (rzepa@ic.ac.uk) From fahrner at pobox.com Fri May 30 07:08:37 1997 From: fahrner at pobox.com (Todd Fahrner) Date: Mon Jun 7 16:57:52 2004 Subject: Comercial XML editor recommendations In-Reply-To: <338E3096.32F3@hiwaay.net> References: <199705290352.UAA08842@shell1.aimnet.com> Message-ID: At 20:42 -0500 5.29.97, len bullard wrote: > Here is another perspective. What if DTDs came into > being as a result of measurement of frequency and > occurrence rather than from design and imposition? > Note I am not talking about DTDs generated by inducing > markup, but DTDs created as tags are generated by > users in the course of natural tagging. Consider > the habits borne of the HTML users who began to > unwittingly use content tagging styles almost as > jokes to delineate thoughts in emails, etc. > It is interesting to speculate what the place of > genetic DTDs such as could be created from these > would have since in some ways the resemble a > natural language emerging from an artificial language > environment. 
Fascinating. A year from now, you could set a spider to identify patterns of class markup on HTML elements on the Web, and upon semantic analysis either fold these as elements into an XML version of HTML, or create new DTDs based on aggregations within narrow subject areas. Apple's proposal for an extensible metadata format, Meta Content Format (MCF), expressly passes up SGML with the following explanation: The main reason for introducing yet another file format is so that we have an interchange format that is not beholden to legacy applications that can['t] track the changes in the expressiveness of MCF. [1] If I'm reading correctly, this reasoning may have been valid before XML - and possibly to some extent even now - but if XML tools develop as a kind of nursery for "genetic" DTDs, then it will have been rebutted. [1] http://mcf.research.apple.com/mcf.html ________________________________________ Todd Fahrner mailto:fahrner@pobox.com http://www.verso.com/ The printed page transcends space and time. The printed page, the infinitude of books, must be transcended. THE ELECTRO-LIBRARY. --El Lissitzky, 1923 xml-dev: A list for W3C XML Developers Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To unsubscribe, send to majordomo@ic.ac.uk the following message; unsubscribe xml-dev List coordinator, Henry Rzepa (rzepa@ic.ac.uk) From Peter at ursus.demon.co.uk Sat May 31 01:12:08 1997 From: Peter at ursus.demon.co.uk (Peter Murray-Rust) Date: Mon Jun 7 16:57:52 2004 Subject: XML-LINK Message-ID: <7391@ursus.demon.co.uk> I am trying to understand how XML-LINK might be used and would be grateful for some gentle hints. The motivation is to develop a set of routines in JUMBO that are generic and will support a reasonable variety of ways in which links might be used. I am confident that there are readers of this list who have clear ideas of how links might be used and I hope they can spend a few minutes to give some *very* simple guidance. 
As we are all aware the XML-LINK spec is in early draft and is scheduled for revision before July 1. It is also widely agreed that some of the terminology needs tightening and that some details of the syntax and the semantics need addressing. So only a general approach is required.

I hope it will be seen as helpful if I put forward my current understanding of what XML-LINK is intended to do, and XML-DEVers can annotate my ramblings. [They can use XML-LINK to do it :-), accepting that we have no means of addressing into my content.] So here goes...

A link has ends which are called resources. My current understanding is that these can be thought of as points in the structure of a document, and will often coincide with Elements. I am as yet unclear about the total number of possible topologies of a link, and ask some questions here.

Structure and Behaviour.

My understanding is that a hyperdocument can have a link structure which is independent of behaviour - it simply represents the structure of the information. I'm happy with this - what I'm less clear about is whether there are *commonly agreed semantics* for this, or whether it's all application-dependent. [If the answer to all my concerns is 'application-dependent' then it will be a pity because everyone will write individual link processors and there will be no reusability.]

I'm aware that all these concerns are catered for by HyTime, but since I am ignorant of HyTime, answers which refer to that won't be much use to me - ideally they should be in the context of the current spec.

Thus I assume we can transmit structures like DAGs, linked lists, relational tables, etc. by the use of XML-LINK without being concerned how they are going to behave. At this stage I'd like simply to address structure.

SIMPLE

The simplest link is XML-LINK="SIMPLE" and is an analogue of HTML's or . My view of it is exemplified by this fictitious XML document:

This is resource A which points to the foo bird (see picture )

Here there are two links, both being unidirectional. I understand that the ends of the first link are the 'point' described by 'ID=A', and the point described by ID=foo (though this is still being discussed). If this is true, then in a **tree-based** tool like JUMBO the ends of the link correspond to nodes in the tree (labelled by ID=A and ID=foo). The second link is harder because the resource in foo.gif is not clear (perhaps it is the inode in the UNIX system?).

I have (I believe) implemented SIMPLE links in JUMBO. Each Node has a method isLink() which says whether it's the start of a SIMPLE link. (I may have to change this nomenclature when the other links become clearer.) So, for example, when process()ing a Node, JUMBO looks to see if it isLink() and if so what it points at (value of HREF). It seems to work.

Note that in this model, the resource which is pointed to (ID=foo, or foo.gif) is not required by XML-LINK to know anything about the link. I assume it could be argued both ways that the pointedAt should/should_not know what is pointing at it.

[SHOW and ACTUATE are deliberately not discussed, although I think they are straightforward (at least compared to EXTENDED).]

EXTENDED

EXTENDED is a container for an indefinite number of LOCATOR links. [LOCATOR has exactly the same syntax as SIMPLE but presumably has different semantics.] EXTENDED does not by itself define a resource and is normally remote from the resources. I can see how a bi-directional link might be constructed from EXTENDED. [It's other multiplicities I don't feel so happy with.] Does this example capture it?

Friends, Romans, Countrymen, lend me your ears

. ... ...

We therefore have a bidirectional link between the verb and the noun, so that each of them can locate the other. Therefore, in JUMBO, there has to be a pointer which is available to each Node. My temptation would be for each node to carry a hashtable of links to other nodes so that (say) when W1 was asked what it linked to it would come up with a list of the Nodes at the other end of its links. W2 would be such a node. On the other hand it might point to the LINK (i.e. link1), and it might be clear from the 'contents' of link1 what the other end was. Is this too restricted?

I am not clear how this extends to 'multidirectional links'. Here is a typical problem.

to bear the slings and arrows of ... ...

Here I want to indicate that the verb 'bear' links to two nouns at the same time and that each noun points to 'bear'. But it isn't obvious that this is the case (unless perhaps ROLE is used for that, and that doesn't seem general). The topology can be seen as a multidirectional link, with a single 'end' and a double 'end' (W3<-->(W4,W5)). Alternatively it can be seen as two bidirectional links grouped together ((W3<-->W4),(W3<-->W5)). In either case I don't think I have captured this sufficiently well that it is capable of being automatically or semi-automatically processed.

Guidance would be gratefully received, particularly if it makes it clear whether there is a generic way of supporting this in code.

P.
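The "hashtable of links" idea above can be sketched roughly as follows. This is not JUMBO's code (JUMBO is a Java application); it is an illustrative Python sketch in which the class LinkIndex and its methods are hypothetical names. The IDs W1 to W5 and link1 come from the message, while link2, link2a and link2b are assumed labels for the two ways of modelling the 'bear' example.

```python
from collections import defaultdict


class LinkIndex:
    """Hypothetical index from node IDs to the EXTENDED links that locate them."""

    def __init__(self):
        self.locators = {}                 # link ID -> list of located node IDs
        self.links_of = defaultdict(list)  # node ID -> IDs of links that mention it

    def add_extended_link(self, link_id, node_ids):
        """Register one EXTENDED link and the node IDs its LOCATORs point at."""
        self.locators[link_id] = list(node_ids)
        for node_id in node_ids:
            self.links_of[node_id].append(link_id)

    def other_ends(self, node_id):
        """Return the IDs at the other ends of every link that locates node_id."""
        found = []
        for link_id in self.links_of.get(node_id, []):
            for other in self.locators[link_id]:
                if other != node_id and other not in found:
                    found.append(other)
        return found


# Reading 1: one three-ended link for 'bear' (W3), 'slings' (W4), 'arrows' (W5).
one_link = LinkIndex()
one_link.add_extended_link("link1", ["W1", "W2"])
one_link.add_extended_link("link2", ["W3", "W4", "W5"])

# Reading 2: two two-ended links grouped together.
two_links = LinkIndex()
two_links.add_extended_link("link2a", ["W3", "W4"])
two_links.add_extended_link("link2b", ["W3", "W5"])

if __name__ == "__main__":
    print(one_link.other_ends("W1"))   # ['W2']
    print(one_link.other_ends("W4"))   # ['W3', 'W5']
    print(two_links.other_ends("W4"))  # ['W3']
```

The two readings give the same answer for W3 but different answers for W4, which is one way of seeing why the topology needs to be stated explicitly before links can be processed generically.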
-- Peter Murray-Rust, domestic net connection Virtual School of Molecular Sciences http://www.vsms.nottingham.ac.uk/

From davidsch at microsoft.com Sat May 31 03:16:57 1997
From: davidsch at microsoft.com (David Schach)
Date: Mon Jun 7 16:57:52 2004
Subject: XML Spec Questions
Message-ID: <011290D45A8ACF119B8B00805FD471D6032C449D@RED-24-MSG.dns.microsoft.com>

In section 4.5 of the latest XML spec it says that if the predefined entities are declared they must be declared as follows:

">

Is the definition for amp valid XML? The definition of EntityValue is given in section 1.5 as:

EntityValue := '"' ([^%&] | PEReference | Reference)* '"'
             | "'" ([^%&] | PEReference | Reference)* "'"

This indicates that "&" is not a valid entity value?

From tbray at textuality.com Sat May 31 05:31:57 1997
From: tbray at textuality.com (Tim Bray)
Date: Mon Jun 7 16:57:52 2004
Subject: XML Spec Questions
Message-ID: <3.0.32.19970530202559.00a3cdb0@pop.intergate.bc.ca>

At 06:16 PM 5/30/97 -0700, David Schach wrote:
>Is the definition for amp valid XML? The definition of EntityValue is
>given in section 1.5 as:

No. It's a bug in the spec. On the list to fix. -T.
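The mismatch raised in this thread can be checked mechanically. The following is a minimal sketch that turns the EntityValue production into a regular expression and tests a few literals against it. Several things here are assumptions rather than text from the draft: the class name EntityValueCheck, the ASCII-only stand-in for Name, the expansion of PEReference and Reference into '%' Name ';', '&' Name ';' and numeric character references, and the exclusion of the delimiting quote from each character class (which the production as transcribed in the message omits).

```java
import java.util.regex.Pattern;

class EntityValueCheck {
    // Simplified ASCII-only stand-in for the spec's Name production (an assumption).
    private static final String NAME = "[A-Za-z_:][A-Za-z0-9._:-]*";
    // Assumed expansions of PEReference and Reference.
    private static final String PE_REF = "%" + NAME + ";";
    private static final String ENTITY_REF = "&" + NAME + ";";
    private static final String CHAR_REF = "&#[0-9]+;|&#x[0-9a-fA-F]+;";
    private static final String REF =
        "(?:" + PE_REF + "|" + ENTITY_REF + "|" + CHAR_REF + ")";

    // '"' ([^%&"] | PEReference | Reference)* '"'
    //   | "'" ([^%&'] | PEReference | Reference)* "'"
    private static final Pattern ENTITY_VALUE = Pattern.compile(
        "\"(?:[^%&\"]|" + REF + ")*\"" + "|" + "'(?:[^%&']|" + REF + ")*'");

    /** True if the quoted literal, delimiters included, matches the production. */
    static boolean isEntityValue(String literal) {
        return ENTITY_VALUE.matcher(literal).matches();
    }

    public static void main(String[] args) {
        System.out.println(isEntityValue("\"&\""));     // false: a bare & is excluded
        System.out.println(isEntityValue("\"&#38;\"")); // true: a character reference
        System.out.println(isEntityValue("'&amp;'"));   // true: an entity reference
    }
}
```

Under these assumptions a literal consisting of a bare "&" is rejected, while a literal that spells the ampersand as a reference is accepted by the grammar.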
From jduggan at magma.ca Sat May 31 06:39:19 1997
From: jduggan at magma.ca (Josh Duggan)
Date: Mon Jun 7 16:57:53 2004
Subject: XML Spec Questions
In-Reply-To: <3.0.32.19970530202559.00a3cdb0@pop.intergate.bc.ca>
Message-ID: <3.0.1.32.19970531003840.00692dbc@mail.magma.ca>

Hi All,

As long as we're pondering the spec, why does ElementDecl's Seq have ", " as a separating string? Is this a typo in the spec, or do we need to enforce a space after ','?

Best regards.

Josh Duggan | Gralen Digitext Inc.
jduggan@magma.ca | josh@gralen.com
www.magma.ca/~jduggan | www.gralen.com
"Work damn you or I'll beat you with your own toner cartridge!" - High Commander

From phj at teleport.com Sat May 31 08:22:21 1997
From: phj at teleport.com (P. Ju)
Date: Mon Jun 7 16:57:53 2004
Subject: XML Spec - timeline?
In-Reply-To: <3.0.32.19970530202559.00a3cdb0@pop.intergate.bc.ca>
Message-ID:

Hi all. I am new to this list and am starting work on an XML book. I've checked out the XML pages at W3C but have so far been unable to find a definitive schedule for the first release (not draft) of the XML spec, the XML-link spec, and the stylesheet standards. Can you help with this? Thank you.

Patricia Ju
phj@teleport.com