From cbullard at hiwaay.net Sun Feb 1 00:34:02 1998
From: cbullard at hiwaay.net (len bullard)
Date: Mon Jun 7 17:00:03 2004
Subject: XSL/XML/XLL and VRML (was: Re: Conditional actions in XSL?)
References: <4955E202FE46D11195C500609712EB6B05C193@FLPS-NTSERVER1>
Message-ID: <34D3C277.5002@hiwaay.net>
Tony Stewart wrote:
>
> Len Bullard wrote:
>
> >>It can do what DTDs do well: provide a precise description of the
> presentation style of the interface as a set of routed behaviors.
>
> I would have thought that a good DTD doesn't do this at all. The DTD
> should define the information content, leaving both style and (IMO)
> behavior to be specified in a stylesheet that is tailored to this
> specific usage of the information.
> Thus, it is the style sheet that describes
> the presentation style, not the DTD. Otherwise, how are you going to
> reuse the information in other formats? You're not going to want to
> change the DTD. And you may not have permission to do so in any case.
>
> Since this is all pretty basic religious thinking, perhaps I
> misunderstood you.
One could say that it is a religious conviction in some cases and
be quite right, and in others, it is an engineering constraint and
be right. It is the *SGML Way*. In that sense, yes, it is a religion,
and for some years, I practiced it. "But what is the good, Phaedrus?"
Look at what you are saying:
1. Stylesheet properties are not "information"
2. Stylesheets express behaviors. So in fact, a stylesheet
language is a programming language, Turing complete if you will.
3. For some kinds and instances of information, there are lifecycle
requirements for reuse.
4. For some kinds and instances of information (DTDs in your example),
there are policies for the behaviors that can be applied to the
kinds and instances of information.
1. I don't think you intend point one. But it is often a hidden premise in
the debates about separating style from content (which is how you are
using "information"). That distinction proves to be thin. Perhaps
by stylesheet information you mean typographic properties.
2. Stylesheets that express behaviors are simply programming languages
with structures (data types) for typographic properties. In this
view, Java/AWT et al is a stylesheet language. After that, choosing one
comes down to practical engineering requirements of platforms, libraries,
interoperation with other engines, etc. Anyway, in this view, VRML
is a stylesheet language. Perhaps the best way for it to include
text support is to include it natively. This idea has come up and
there is a text node in VRML which browsers like WorldView can display
very well. (NOTE: The issue of reformulating VRML as XML is one
of framework efficiency, not descriptive power or lifecycle.)
3. This is true of course. But unless requirements are very carefully
examined, no size fits all.
4. True and it varies widely. One of the features of DTDs that makes them
very attractive for policy is the ease with which they can
be adjusted liberally on site of use. This one slips by most of the
SGML theorists who do not work in production sites where multiple
versions of DTDs are used at different points of a process or procedure.
In other words, they are an instrument of policy, not a policy.
Information is not static where a high rate of change prevails. A DTD is
more like a control knot in a NURB than a point in a B-spline.
My point is that for many information engineering problems, the approach
Pierre took with Prototype has been taken by others and successfully.
The arbiter of success is not the religion of the SGML Way, but the
ability to meet the requirements of the task. Bytes aren't holy.
As XSL/XML/XLL reach ever greater levels of design complexity in the
base standards, a question emerging in other design groups (one heard
before during the HyTime/DSSSL era) is: Are these really complicated
solutions looking for problems, not new and vital technologies? Is
their sudden rush of popularity based on the soundness of applicability,
or the product of software company juggling of public perceptions?
If simpler, more readily available, and more easily understood
technologies exist to solve a problem within an acceptable timeframe,
the experienced engineer and the practical customer adopt
them. If not, they try the next best thing. Is XML a *religion*
or just the next best thing?
Len Bullard
xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev@ic.ac.uk
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/
To (un)subscribe, mailto:majordomo@ic.ac.uk the following message;
(un)subscribe xml-dev
To subscribe to the digests, mailto:majordomo@ic.ac.uk the following message;
subscribe xml-dev-digest
List coordinator, Henry Rzepa (mailto:rzepa@ic.ac.uk)
From dima at paragraph.com Sun Feb 1 00:47:54 1998
From: dima at paragraph.com (Dmitri Kondratiev)
Date: Mon Jun 7 17:00:03 2004
Subject: SGML Architecture questions
Message-ID: <2.2.32.19980201004632.00719004@dream.paragraph.com>
I may be wrong, but from my understanding of SGML architectures, only the
bridging mechanism provides for type extension. Everything else in an
architecture seems to be element and attribute name remapping. A bridging
element serves as a target for mapping substructure to it. Still, the bridging
element is not defined in the DTD and as a result its content/attributes can't
be validated by the parser. Is that correct?
Taking bridging example from "A Tutorial Introduction to SGML Architectures"
by W. Eliot Kimber, with architectural DTD :
And mapping from elements in the document to elements in the architecture :
]>
KimberWilliam
1234 Maple St.
AustinTX78757
There is no DTD for element content so:
KimberWilliam
could be :
KimberEliotWilliam
So my question is:
how can validity constraints be enforced for the bridging element's substructure?
Thanks,
Dima
-----------------
Dmitri Kondratiev
dima@paragraph.com
102401.2457@compuserve.com
http://www.geocities.com/SiliconValley/Lakes/3767/
tel: 07-095-464-9241
From cbullard at hiwaay.net Sun Feb 1 01:01:24 1998
From: cbullard at hiwaay.net (len bullard)
Date: Mon Jun 7 17:00:03 2004
Subject: First experiences with XSL
References: <2.2.32.19980130155416.0085e27c@pop>
Message-ID: <34D3C820.2671@hiwaay.net>
Sharon Adler wrote:
>
> Michael,
>
> As I write this, the XSL WG is 2/3 through its first official meeting. The
> Microsoft code does not represent the "Final" XSL but the strawman of some of
> the facilities of XSL. The lack of diagnostics/limited functionality of a
> partial prototype implementation is not any indication of the functionality
> or capability of a style language, nor any final implementation. Of course
> you can accomplish what you wanted in Java. Any hacker can do anything they
> want in code, but what about the rest of the world's humans.
Can anyone show that XSL (if indeed, a Turing complete language) is any
easier than Java? XSL is a programming language and there are far more
mortals (programmers in some cases) who understand and can easily use Java
than XSL/DSSSL. Why? Object-oriented programming is the rule
not the exception in programming communities. JavaScript has a
tremendous advantage in that stepping up to Java from JavaScript incurs no
shocks of syntax. It is an easy transition.
Since at least C forward, it has been the support libraries
that made the difference in ease or utility because syntax aside,
and side effect issues, the same features are found in most programming
languages. So, one might retreat to the defense of "But it is a standard"
and there one would have a point. Unless and until Sun releases Java
as a true standard (a PAS won't cut it), implementors of systems
based on it create systems based on proprietary technology.
> Please don't use the XSL prototype if it is not suitable for you to play
> around with, but give us a chance to create a workable standard.
But of course.
len bullard
From jamsden at us.ibm.com Sun Feb 1 01:07:30 1998
From: jamsden at us.ibm.com (Jim Amsden)
Date: Mon Jun 7 17:00:03 2004
Subject: XSL/XML/XLL and VRML (was: Re: Conditional actions in XS
Message-ID: <5040100014394115000002L052*@MHS>
Tony Stewart wrote:
>>I would have thought that a good DTD doesn't do this at all. The DTD
>>should define the information content, leaving both style and (IMO)
>>behavior to be specified in a stylesheet that is tailored to this
>>specific usage of the information.
More religion:
Information content should be subordinate to behavior, not the other way
around. The DTD defines the information structure required to support
(unfortunately) implied behavior which establishes the meaning of that data in
the context in which it was defined. Attributes establish characteristics which
maintain state supporting variant behavior. Contents and links represent
associations supporting additional state, and enabling collaborations with
other elements required to support behavior, including behavior of the document
as a whole. Of course, none of this has anything to do with rendering unless
that's the subject of the DTD. Note that if a language is rich enough, it
doesn't have to change just because the subject area changes. This might be the
basis of the appeal of XSL and XML-Data which both use XML (more or less) to
describe their subject areas.
From dcarlson at ontogenics.com Sun Feb 1 02:16:05 1998
From: dcarlson at ontogenics.com (Dave Carlson)
Date: Mon Jun 7 17:00:03 2004
Subject: problems with emacs xml-mode
Message-ID: <2.2.32.19980201021007.00e40c30@pop.dimensional.com>
At 05:57 PM 1/31/98 -0500, David Megginson wrote:
> > 2. The DTD is parsed, but all element names are folded into all lower case.
> > Does the current version of xml-mode support mixed-case element names? If
> > so, what am I doing wrong?
>
>Are you certain that you're using the latest version of the patches
>(from Fall 1997) and that you're actually in XML rather than SGML
>mode? Does it read 'XML' or 'SGML' in the mode bar at the bottom?
I'm using the xml-mode that I downloaded from your site in December 1997.
And, yes, it does read 'XML' in the mode bar. I'll try some additional
testing to see if I can narrow down the problem. Is there some other test I
can run to be sure I've got the entire xml-mode installed properly? I had
to do some manual hacking to install on WinNT, maybe I messed up somewhere.
I've never gotten it to work correctly, but sometimes I get the top-level
element names in mixed case, and the content model all folded to lower case.
So, I can add mixed case elements at the top level, but there are no "valid"
sub-elements because the content model has all tags in lower case. In
another test, everything was lower case.
> > 4. Font highlighting has some problems. I've configured my _emacs file
> > according to earlier posts in this list, but the text highlighting only
> > appears after I've used the context menu to insert a new tag. Then, the
> > text is only highlighted from that point *backward* in the document. When I
> > first load a document, no text is highlighted.
>
>Again, this is not directly related to the XML patches. PSGML will
>highlight only the parts of the document that it has already parsed.
>In Unix, at least, it will eventually parse ahead and highlight the
>whole thing.
>
Yes, it will eventually highlight the entire document, once I've made an
addition to the end of the document.
Thanks for your help, and your contribution!
Dave
From eliot at isogen.com Sun Feb 1 13:57:12 1998
From: eliot at isogen.com (W. Eliot Kimber)
Date: Mon Jun 7 17:00:03 2004
Subject: SGML Architecture questions
Message-ID: <3.0.32.19980201074848.00c84c30@swbell.net>
At 03:46 AM 2/1/98 +0300, Dmitri Kondratiev wrote:
>I may be wrong, but from my understanding of SGML architectures, only the
>bridging mechanism provides for type extension. Everything else in an
>architecture seems to be element and attribute name remapping. A bridging
>element serves as a target for mapping substructure to it. Still, the bridging
>element is not defined in the DTD and as a result its content/attributes can't
>be validated by the parser. Is that correct?
The bridging element *is* defined in the DTD, so its use can be validated
by the parser, but your real question is:
>how can validity constraints be enforced for the bridging element's substructure?
You do it locally in the document's own DTD, or you do it by deriving the
bridging element from another architecture.
>There is no DTD for element content so:
Yes there is: (#PCDATA | archbridge)*
However, your point is that you might want to impose constraints on the
local (to this document) content of elements that map to archbridge. You
could define, locally, the content for the name element to match your
constraints:
]>
KimberWilliam
1234 Maple St.
AustinTX78757
You can also do it by deriving the bridging element from another architecture:
(This modifies the above declarations:)
This says that the cust.name element plays the role "name" within the
personarch architecture and the role "person-name" within the namearch
architecture. I can validate that cust.name satisfies the rules for "name"
as defined by the personarch and that its content satisfies the rules for
"person-name" in the namearch.
Notice how the cust.name element "bridges" from the personarch architecture
to the namearch architecture, or from the architecture to the local
(document-specific) rules.
Cheers,
Eliot
--
W. Eliot Kimber, Senior Consulting SGML Engineer
Highland Consulting, a division of ISOGEN International Corp.
2200 N. Lamar St., Suite 230, Dallas, TX 95202. 214.953.0004
www.isogen.com
From cbullard at hiwaay.net Sun Feb 1 17:16:07 1998
From: cbullard at hiwaay.net (len bullard)
Date: Mon Jun 7 17:00:03 2004
Subject: First experiences with XSL
References:
Message-ID: <34D4AD72.49CE@hiwaay.net>
Betty Harvey wrote:
>
> On Sat, 31 Jan 1998, len bullard wrote:
>
> >
> > Can anyone show that XSL (if indeed, a Turing complete language) is any
> > easier
> > than Java? XSL is a programming language and there are far more mortals
> > (programmers in some cases) who understand and can easily use Java than
> > XSL/DSSSL. Why? Object-oriented programming is the rule
> > not the exception in programming communities. JavaScript has a
> > tremendous
> > advantage in that stepping up to Java from JavaScript incurs no
> > shocks of syntax. It is an easy transition.
> >
>
> Len:
>
> My experience is that XSL is easier. I was able to
> take the XSL tutorial and create a simple example of an
> XSL stylesheet.
>
> If you have Microsoft Explorer 4.0 or higher you can test my first
> example at: http://www.eccnet.com/xmledi.
>
> My initial thoughts are that it doesn't do everything I
> want it to do - but I am going to hold judgement until the XSL
> standard becomes more stable. Initially - I am impressed and
> looking forward to what XSL will offer us - thank goodness
> someone is not only thinking about style and behavior but
> moving towards a standard implementation effort - what
> FOSI tried to do 8 years ago.
>
> Betty
That is good to hear. Yet, the XSL/XLL discussion to me
has the feel of attending a summer stock presentation of
Hamlet: famous lines all carefully memorized, spoken
thousands of times before, and Hamlet still dies in the
last scene. Don't take it as a "I don't like XSL" but
a cautionary, "we know our parts so well we can sleepwalk
through them." So yes, compelling examples are needed.
The FOSI perished in complexity, HyTime has almost met
the same fate, and DSSSL never got out of the gate before
events and technology overtook it.
We have to meet the criticism that XML technology is a
solution looking for a problem. We need something better
than the same defenses we presented for SGML/HyTime/DSSSL
to the same criticism.
I sense a deflation in the enamouring of the Web. Joe Q
Public has discovered the anemia of the infrastructure.
Still, experimental team efforts such as VRMLDream which
will demonstrate a puppeteering technology for virtual
theatre has promise. For these applications, it is 1945
and each TV network is a world unto itself. These groups
see the Internet as a broadcasting medium. Maybe Clinton
will survive his current problems and deliver on that
"1000X the bandwidth" promise. There is little doubt that
replacing the Internet infrastructure is needed ASAP.
Business interest is stable, yet the groups who control
the corporate standards are from printing backgrounds
and marketing. They see the Internet as a publishing
medium. They tend to be underwhelmingly technically
talented, averse to technology whose practitioners they
do not control, and able to restrict the application at
the heart of the matter: funding. While the true
practitioner seeks to expand capability, the purse stringers
seek to restrict it, and successfully. It is necessary to look
at the whole of the framework and how that can best meet
business needs, in content development, maintenance,
production, and distribution. The architectures must
be sold accordingly. (one rung up the CALS spiral).
Beware jargon; beware complex examples,
beware precise description that fails to engender
imaginative application. The hook is the imagination.
Sink the hook to reel in the fish. Overall efficiency
is becoming the primary issue given the size
and bugginess of the framework. Building
evermore compelling and sustainable content is still
the goal. Just remember that many many groups do not
believe that putting long lifecycle information assets on
the WWW is a good thing to do. Find out why.
best,
len
From peter at ursus.demon.co.uk Sun Feb 1 17:18:12 1998
From: peter at ursus.demon.co.uk (Peter Murray-Rust)
Date: Mon Jun 7 17:00:03 2004
Subject: JUMBO9801a1 release
Message-ID: <3.0.1.16.19980201170326.1a4f992a@pop3.demon.co.uk>
An updated version of the alpha JUMBO distribution (hopefully with the
earlier bugs removed) is available at:
http://www.vsms.nottingham.ac.uk/vsms/java/jumbo/jan9801/jumbo9801a1.zip
This should supersede the earlier version.
The JUMBO in this distribution now runs as an APPLET as well as the
application described previously and you are welcome to experiment. Since
applets require classes to be 'under' the codebase, I have not tested the
SAX-compliant parsers; experiments and feedback are welcome. Note that some
of the text fields are no longer included in the distribution and should be
downloaded from the appropriate sites.
As before I welcome gross errors (e.g. it doesn't run).
P.
Peter Murray-Rust, Director Virtual School of Molecular Sciences, domestic
net connection
VSMS http://www.nottingham.ac.uk/vsms, Virtual Hyperglossary
http://www.venus.co.uk/vhg
From ak117 at freenet.carleton.ca Sun Feb 1 20:43:19 1998
From: ak117 at freenet.carleton.ca (David Megginson)
Date: Mon Jun 7 17:00:03 2004
Subject: SAX: Parser Interface -- Summary of Change Requests
Message-ID: <199802012028.PAA00747@unready.microstar.com>
As promised, I will now begin to summarise the requested changes to
SAX before we put out a stable 1.0 version: over the next few days, I
will send out one message summarising the requested changes to each
interface or class. For more information on SAX, see
http://www.microstar.com/XML/SAX/
There have been only two changes proposed to the Parser interface,
both of which would be backwards-compatible with existing
implementations:
1) Allow SAX to work with an input stream as well as a URI.
2) Simplify handler chaining by adding get* methods for existing
handlers.
Here are the change requests in detail, with my initial response at
the end of each one:
1) Allow SAX to work with an input stream as well as a URI.
- Paul Pazandak
- Peter Murray-Rust
- Don Park
Currently, the Parser interface provides only the following method
to initiate a parse:
void parse (String publicId, String systemId)
throws java.lang.Exception;
Following this suggestion, there would be a new method
void parse (String publicId, String systemId, InputStream input)
throws java.lang.Exception;
(It is still necessary to provide a system identifier for resolving
relative URIs within the stream). Note that the stream would be a
byte stream, not a character stream -- characters might require
more than one octet, depending on the encoding in use.
I can see the convenience of this method, and I plan to add
something like this to AElfred when I have a chance. For SAX,
however -- which is meant to end up as a language- and
system-independent API -- I am reluctant to hardcode assumptions
about storage (and I don't know enough about IDL to know if there
is a general representation for streams). Paul Pazandak has also
suggested allowing strings and buffers -- in this case, they would
already be decoded into characters.
Personally, I'm undecided, and would be interested in hearing the
theoretical arguments for and against this suggestion.
2) Simplify handler chaining by adding get* methods for existing
handlers.
- Don Park
Currently the Parser interface provides only setters for the
various handlers:
public void setEntityHandler (EntityHandler handler);
public void setDocumentHandler (DocumentHandler handler);
public void setErrorHandler (ErrorHandler handler);
Following this suggestion, there would also be accessors:
public EntityHandler getEntityHandler ();
public DocumentHandler getDocumentHandler ();
public ErrorHandler getErrorHandler ();
An application could then retrieve the existing handler and
implement a new one which invokes the old one under certain
circumstances.
This seems like a generally good idea (as well as a simple and
backwards-compatible change), and I am willing to implement it.
The only complication is that we'll have to define the default
state -- is the parser always required to return a default handler
if the user has not explicitly set one, or should it return null?
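The chaining idea, and the open question about the default state, can be sketched roughly as follows. This is only an illustration under stated assumptions: TextHandler, MiniParser, and fire() are hypothetical stand-ins rather than the SAX interfaces, and the sketch assumes the getter returns null when no handler has been set, which is exactly the point still to be decided.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical one-method handler; the real SAX handlers have more callbacks.
interface TextHandler {
    void text(String chars);
}

// Hypothetical parser stub with a setter and the proposed getter.
class MiniParser {
    private TextHandler handler; // stays null until the application sets one

    void setTextHandler(TextHandler h) { handler = h; }

    // Assumption: returns null if nothing was set (the alternative is a default handler).
    TextHandler getTextHandler() { return handler; }

    // Stand-in for the parser reporting character data.
    void fire(String chars) {
        if (handler != null) {
            handler.text(chars);
        }
    }
}

class ChainDemo {
    static List<String> run() {
        List<String> seen = new ArrayList<>();
        MiniParser parser = new MiniParser();
        parser.setTextHandler(chars -> seen.add("old:" + chars));

        // Retrieve the existing handler and install a new one in front of it.
        final TextHandler previous = parser.getTextHandler();
        parser.setTextHandler(chars -> {
            // Invoke the old handler only under certain circumstances:
            // here, only for text that is not pure whitespace.
            if (!chars.trim().isEmpty() && previous != null) {
                previous.text(chars);
            }
        });

        parser.fire("   ");
        parser.fire("hello");
        return seen;
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```

With a null default, every chaining handler needs the null check shown above; if the parser always returned a default handler, that check could be dropped.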
I look forward to your comments and suggestions.
All the best,
David
--
David Megginson ak117@freenet.carleton.ca
Microstar Software Ltd. dmeggins@microstar.com
http://home.sprynet.com/sprynet/dmeggins/
From tyler at infinet.com Sun Feb 1 21:42:32 1998
From: tyler at infinet.com (Tyler Baker)
Date: Mon Jun 7 17:00:03 2004
Subject: SAX: Parser Interface -- Summary of Change Requests
References: <199802012028.PAA00747@unready.microstar.com>
Message-ID: <34D4EE00.A1FCECF5@infinet.com>
> Here are the change requests in detail, with my initial response at
> the end of each one:
>
> 1) Allow SAX to work with an input stream as well as a URI.
>
> - Paul Pazandak
> - Peter Murray-Rust
> - Don Park
>
> Currently, the Parser interface provides only the following method
> to initiate a parse:
>
> void parse (String publicId, String systemId)
> throws java.lang.Exception;
>
> Following this suggestion, there would be a new method
>
> void parse (String publicId, String systemId, InputStream input)
> throws java.lang.Exception;
>
> (It is still necessary to provide a system identifier for resolving
> relative URIs within the stream). Note that the stream would be a
> byte stream, not a character stream -- characters might require
> more than one octet, depending on the encoding in use.
Well, what if the XML data is streamed from a database, where a URL does not
matter so much? If you look at what Oracle, Sybase, and Microsoft among others
are planning on doing with XML, then supporting this with SAX in the most
ubiquitous way will be very much necessary. I think that if you want SAX to
have any CORBA support or other language support down the line, it would be best
to avoid any overloading in the API, because in CORBA, for example, you cannot
redefine operations in IDL (methods in Java).
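A minimal sketch of what an overload-free variant might look like, assuming one distinctly named operation per input source. The names SourceParser, parseUri, and parseStream are hypothetical and not part of SAX, and the toy implementation only counts octets rather than parsing XML. It also shows why the stream is a byte stream: a single character can occupy more than one octet.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.net.URI;
import java.nio.charset.StandardCharsets;

// Hypothetical interface: distinct operation names, so an IDL mapping
// would not need overloading.
interface SourceParser {
    void parseUri(String publicId, String systemId) throws IOException;

    void parseStream(String publicId, String systemId, InputStream input) throws IOException;
}

// Toy implementation: counts octets instead of parsing XML.
class OctetCountingParser implements SourceParser {
    long octets;

    public void parseUri(String publicId, String systemId) throws IOException {
        // Resolve the system identifier and hand the resulting stream on.
        try (InputStream in = URI.create(systemId).toURL().openStream()) {
            parseStream(publicId, systemId, in);
        }
    }

    public void parseStream(String publicId, String systemId, InputStream input) throws IOException {
        octets = 0;
        while (input.read() != -1) {
            octets++;
        }
    }
}

class StreamParseDemo {
    static long countOctets(String xml) {
        OctetCountingParser parser = new OctetCountingParser();
        byte[] bytes = xml.getBytes(StandardCharsets.UTF_8);
        try {
            // The system identifier is still supplied so that relative URIs
            // inside the stream could be resolved; it is not opened here.
            parser.parseStream(null, "http://www.foo.com/bar.xml", new ByteArrayInputStream(bytes));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return parser.octets;
    }

    public static void main(String[] args) {
        // Eight characters, but the accented letter takes two octets in UTF-8.
        System.out.println(countOctets("<a>\u00e9</a>"));
    }
}
```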
> I can see the convenience of this method, and I plan to add
> something like this to AElfred when I have a chance. For SAX,
> however -- which is meant to end up as a language- and
> system-independent API -- I am reluctant to hardcode assumptions
> about storage (and I don't know enough about IDL to know if there
> is a general representation for streams). Paul Pazandak has also
> suggested allowing strings and buffers -- in this case, they would
> already be decoded into characters.
Another idea (as far as implementation goes) is to have the parser simply be an
extension of java.io.FilterInputStream which takes one or more Handler
interfaces as arguments (to delegate to), so that you can handle very large
streams of data. In addition to overriding the necessary
java.io.FilterInputStream methods, you can also have methods like readDocument(),
readElement(), etc. This would give people a lot more control over reading in
XML. This approach of course is similar to how URL content in the java.net
package is handled. But where I see this approach being most useful is in
transactions where you might only want to read in a limited amount of data
anyway and process only that, or else in the case where XML content is always at
a fixed length (like in databases where you get null padding for string fields
which do not take up the assigned length). With the current SAX implementation,
you have no real control at the IO level, where it would help to skip content if
the application feels it is necessary.
> Personally, I'm undecided, and would be interested in hearing the
> theoretical arguments for and against this suggestion.
>
> 2) Simplify handler chaining by adding get* methods for existing
> handlers.
>
> - Don Park
>
> Currently the Parser interface provides only setters for the
> various handlers:
>
> public void setEntityHandler (EntityHandler handler);
> public void setDocumentHandler (DocumentHandler handler);
> public void setErrorHandler (ErrorHandler handler);
>
> Following this suggestion, there would also be accessors:
>
> public EntityHandler getEntityHandler ();
> public DocumentHandler getDocumentHandler ();
> public ErrorHandler getErrorHandler ();
>
> An application could then retrieve the existing handler and
> implement a new one which invokes the old one under certain
> circumstances.
I am not sure exactly what the use of these get methods is, because all the
handlers are useful for is delegation anyway. The only reason the get methods would be
useful is for casting the returned object to some other form. Why anyone would
need to do this is beyond me, as recasting this object back to something would be
sloppy implementation in the first place.
> This seems like a generally good idea (as well as a simple and
> backwards-compatible change), and I am willing to implement it.
> The only complication is that we'll have to define the default
> state -- is the parser always required to return a default handler
> if the user has not explicitly set one, or should it return null?
The default handler could just be something which spits stuff out to stdout or
some other OutputStream in a manner similar to how AElfred's EventDemo does.
Tyler
From tyler at infinet.com Sun Feb 1 22:36:19 1998
From: tyler at infinet.com (Tyler Baker)
Date: Mon Jun 7 17:00:03 2004
Subject: SAX: Parser Interface -- Summary of Change Requests
References: <199802012028.PAA00747@unready.microstar.com> <34D4EE00.A1FCECF5@infinet.com>
Message-ID: <34D4FA9E.DFB80BAA@infinet.com>
Tyler Baker wrote:
> > I can see the convenience of this method, and I plan to add
> > something like this to AElfred when I have a chance. For SAX,
> > however -- which is meant to end up as a language- and
> > system-independent API -- I am reluctant to hardcode assumptions
> > about storage (and I don't know enough about IDL to know if there
> > is a general representation for streams). Paul Pazandak has also
> > suggested allowing strings and buffers -- in this case, they would
> > already be decoded into characters.
>
> Another idea (as far as implementation goes) is to have the parser simply be an
> extension of java.io.FilterInputStream which takes one or more Handler
> interfaces as arguments (to delegate to), so that you can handle very large
> streams of data. In addition to overriding the necessary
> java.io.FilterInputStream methods, you can also have methods like readDocument(),
> readElement(), etc. This would give people a lot more control over reading in
> XML. This approach of course is similar to how URL Content in the java.net
> package handles content. But where I see this approach being most useful is in
> transactions where you might only want to read in a limited amount of data
> anyways and process only that or else in the case where XML content is always at
> a fixed length (like in databases where you get null padding for string fields
> which do not take up the assigned length). With the current SAX implementation,
> you have no real control at the IO level where it would help to skip content if
> the application feels it is necessary.
One last thing I wanted to add: if the Parser were an extension of
java.io.FilterInputStream or java.io.InputStream, you could simply take a
compressed XML file and unpack it all in one line of code.
For example, you could create it all like this:
XMLInputStream xis = new XMLInputStream(new CompressedInputStream(in), handler);
where in is any input stream (like file, URL, etc.) and handler is one or more
handlers.
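A minimal sketch of that one-line composition, assuming java.util.zip.GZIPInputStream stands in for the unspecified CompressedInputStream. XmlByteStream and CompressedXmlDemo are hypothetical names, not part of SAX or AElfred, and the wrapper only counts the bytes passing through where a real parser would fire handler events.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical stand-in for the proposed XMLInputStream.
class XmlByteStream extends FilterInputStream {
    private long bytesSeen = 0;

    XmlByteStream(InputStream in) {
        super(in);
    }

    @Override
    public int read() throws IOException {
        int b = super.read();
        if (b >= 0) bytesSeen++;
        return b;
    }

    @Override
    public int read(byte[] buf, int off, int len) throws IOException {
        int n = super.read(buf, off, len);
        if (n > 0) bytesSeen += n;
        return n;
    }

    long bytesSeen() {
        return bytesSeen;
    }
}

class CompressedXmlDemo {
    // Gzip a string in memory so the example needs no external file.
    static byte[] gzip(String text) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(out)) {
            gz.write(text.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    // The wrapping itself is the single nested constructor call below.
    static String readAll(byte[] compressed) {
        try (XmlByteStream xis = new XmlByteStream(
                new GZIPInputStream(new ByteArrayInputStream(compressed)))) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[256];
            int n;
            while ((n = xis.read(buf, 0, buf.length)) != -1) {
                out.write(buf, 0, n);
            }
            return new String(out.toByteArray(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Because the wrapper is itself an InputStream, any other filter (buffering, decryption) could be layered in the same way.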
This I feel is much more flexible, since SAX currently accepts only content which
comes from a resolved URL, and since if you are going to have an
InputStream argument, you will need control over how it is handled. In addition, you
might want to be able to register the handler right before actually handling the
content. For example, if you get a systemID or publicID of some type (this would
currently occur with a doctype event in SAX), you would then want to register a
particular document handler with that type (which could be done nicely with a dynamic
class loading mechanism). In this case, you might have a static method in the
XMLInputStream class which acts as a registry for handlers of various document types
that could be something no more complex than a hashtable of class names which are
indexed by systemID or publicID. You could have this registry just be for documents,
or else it could even be more complex with a federated namespace of handlers for
elements.
Personally I would much rather write code that looks like this:
// Done when I initialize the program
java.util.Properties handlers = new java.util.Properties();
try {
    handlers.load(new FileInputStream("foo.txt"));
} catch (IOException e) {
    e.printStackTrace();
}
XMLInputStream.registerHandlers(handlers);
// Then later do this
URL fooURL = new URL("http://www.foo.com/bar.xml");
XMLInputStream xis = new XMLInputStream(fooURL.openStream());
Or if you don't use any registry for document handlers, you could simply do something
like this
DocumentHandler bdh = new BarDocumentHandler();
// Assumes bar.xml is a document type "bdh" can handle
URL fooURL = new URL("http://www.foo.com/bar.xml");
XMLInputStream xis = new XMLInputStream(fooURL.openStream(), bdh);
Once you have the "xis" reference, then just call methods like "readDocument(Document
document)" which would read the document data into a Document object (Document would
be an interface).
Document document = new MSWord90Document();
try {
    xis.readDocument(document);
} catch (IOException e) {
    e.printStackTrace();
}
Personally I prefer the registry idea, so that the application would know ahead of time
what to do for any XML file (handle it or else do some default handling).
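A rough sketch of the registry idea, assuming a table of class names indexed by public or system identifier and a fallback to default handling. DocHandler, BarDocHandler, DefaultDocHandler and HandlerRegistry are hypothetical names for illustration; the real SAX handler interfaces look different.

```java
import java.util.Hashtable;
import java.util.Map;

// Hypothetical, trimmed-down handler interface.
interface DocHandler {
    String describe();
}

class BarDocHandler implements DocHandler {
    public String describe() {
        return "bar";
    }
}

class DefaultDocHandler implements DocHandler {
    public String describe() {
        return "default";
    }
}

class HandlerRegistry {
    // Maps a public or system identifier to a handler class name.
    private static final Map<String, String> CLASS_NAMES = new Hashtable<>();

    static void register(String id, String className) {
        CLASS_NAMES.put(id, className);
    }

    // Loads the registered class dynamically; unknown or unloadable
    // entries fall back to the default handler.
    static DocHandler lookup(String id) {
        String className = CLASS_NAMES.get(id);
        if (className == null) {
            return new DefaultDocHandler();
        }
        try {
            return (DocHandler) Class.forName(className)
                    .getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException | ClassCastException e) {
            return new DefaultDocHandler();
        }
    }
}
```

Storing class names rather than instances means the table could be filled from a properties file, as in the foo.txt example above, without loading any handler until its document type is actually seen.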
Just some ideas before v1.0 of SAX is set in stone...
Tyler
From mrc at allette.com.au Sun Feb 1 22:43:42 1998
From: mrc at allette.com.au (Marcus Carr)
Date: Mon Jun 7 17:00:03 2004
Subject: First experiences with XSL
References: <2.2.32.19980130155416.0085e27c@pop> <34D3C820.2671@hiwaay.net>
Message-ID: <34D4FA79.4BF7FA1F@allette.com.au>
len bullard wrote:
> Can anyone show that XSL (if indeed, a Turing complete language) is any easier
> than Java? XSL is a programming language and there are far more mortals
> (programmers in some cases) who understand and can easily use Java than
> XSL/DSSSL.
I live in hope of the day when I finally see a file come out of a word processor
as XML, preceded by a DTD and an XSL style sheet. Rather than just regard XSL as
a programming language, I would like to see it used as a common application
formatting syntax, as was tried with RTF. Assuming the users are going to do
pretty much whatever they want to as far as tagging is concerned (either for
legacy data or ongoing), conversion from one DTD to another will always be far
easier than conversion from an unstructured document to a structured one. This is
particularly true when you consider in current conversions how much structure is
implied from formatting characteristics (although this would presumably be
substantially diminished with more structured documents). From the perspective of
conversion of data (perhaps from a somewhat sloppy creation model to a more
concise storage model), a parseable, reasonably regular stylesheet would seem to
have advantages over Java.
Also, it may ultimately be desirable to produce an XSL document from some source,
interface or language that suits your individual needs better, thus XSL again
behaves as an interchange format. I think this fits well with the spirit of
XML/SGML.
> So, one might retreat to the defense of "But it is a standard" and there one
> would have a point.
There are other reasons, but the one you give above is also difficult to go past
:-)
--
Regards
Marcus Carr email: mrc@allette.com.au
_______________________________________________________________
Allette Systems (Australia) email: info@allette.com.au
Level 10, 91 York Street www: http://www.allette.com.au
Sydney 2000 NSW Australia phone: +61 2 9262 4777
fax: +61 2 9262 4774
_______________________________________________________________
From donpark at quake.net Sun Feb 1 22:55:04 1998
From: donpark at quake.net (Don Park)
Date: Mon Jun 7 17:00:03 2004
Subject: Parser Interface -- Summary of Change Requests
Message-ID: <003b01bd2f63$c5b6c800$2ee044c6@donpark>
David,
>1) Allow SAX to work with an input stream as well as a URI.
...
> void parse (String publicId, String systemId, InputStream input)
> throws java.lang.Exception;
...
My suggestion would be to add the following two methods to the EntityHandler
interface:
public InputStream
getEntityByteStream (String systemID)
throws Exception;
public InputStream
getEntityCharStream (String systemID)
throws Exception;
The parser implementation should invoke getEntityCharStream first to see if
there is decoded data available. If not, it should invoke
getEntityByteStream to get the raw data.
If both methods return null, then default URL based code is used.
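The lookup order described above could be sketched as follows. EntityStreamSource, FixedSource and EntityResolution are hypothetical names; only the two method signatures come from the proposal, and the sketch reports which source would be used instead of actually parsing anything.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;

// Hypothetical interface carrying the two proposed methods.
interface EntityStreamSource {
    InputStream getEntityCharStream(String systemID) throws Exception;
    InputStream getEntityByteStream(String systemID) throws Exception;
}

// Test double that offers either kind of stream on request.
class FixedSource implements EntityStreamSource {
    private final boolean hasChars;
    private final boolean hasBytes;

    FixedSource(boolean hasChars, boolean hasBytes) {
        this.hasChars = hasChars;
        this.hasBytes = hasBytes;
    }

    public InputStream getEntityCharStream(String systemID) {
        return hasChars ? new ByteArrayInputStream(new byte[0]) : null;
    }

    public InputStream getEntityByteStream(String systemID) {
        return hasBytes ? new ByteArrayInputStream(new byte[0]) : null;
    }
}

class EntityResolution {
    enum Source { CHAR_STREAM, BYTE_STREAM, DEFAULT_URL }

    // Decoded data first, then raw bytes, then the default URL-based code.
    static Source choose(EntityStreamSource handler, String systemID) {
        try {
            if (handler.getEntityCharStream(systemID) != null) {
                return Source.CHAR_STREAM;
            }
            if (handler.getEntityByteStream(systemID) != null) {
                return Source.BYTE_STREAM;
            }
            return Source.DEFAULT_URL;
        } catch (Exception e) {
            throw new IllegalStateException("entity lookup failed: " + systemID, e);
        }
    }
}
```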
>2) Simplify handler chaining by adding get* methods for existing
> handlers.
...
> This seems like a generally good idea (as well as a simple and
> backwards-compatible change), and I am willing to implement it.
> The only complication is that we'll have to define the default
> state -- is the parser always required to return a default handler
> if the user has not explicitly set one, or should it return null?
It would be up to the SAX implementation. It might provide a default
implementation depending on configuration. For example, FooSaxDriver might
have a setInputType() method which would install a default EntityHandler for
fetching XML documents from a database.
BTW, You left out my other suggestion which was
>>>>>>>>>>>>>>>>>>>>>>>>
In addition, I would like to have the following two methods added to the Parser
API for driver-specific operations:
public Object getDriverProperty(String name);
public Object setDriverProperty(String name, Object value);
Property names should be prefixed with some unique values to avoid confusing
other drivers. Note that above methods can be invoked without knowing which
driver is actually being used. For example:
parser.setDriverProperty("SuperDriver.lowercaseElements", Boolean.TRUE);
parser.setDriverProperty("HungryDriver.cacheSize", new Integer(100000));
<<<<<<<<<<<<<<<<<<<<<<<<
Above two methods allow driver-specific code without actually having to
import anything.
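A small sketch of how one driver might honour these two methods, assuming it stores only names carrying its own prefix and ignores the rest. The SuperDriver class is hypothetical, and returning the previous value from setDriverProperty is an assumption, since the proposal does not say what the return value means.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical driver; not part of SAX.
class SuperDriver {
    private static final String PREFIX = "SuperDriver.";
    private final Map<String, Object> properties = new HashMap<>();

    // Returns the stored value, or null for unset or foreign properties.
    public Object getDriverProperty(String name) {
        return properties.get(name);
    }

    // Stores properties addressed to this driver and returns the previous
    // value (assumed semantics); names meant for other drivers are ignored.
    public Object setDriverProperty(String name, Object value) {
        if (!name.startsWith(PREFIX)) {
            return null;
        }
        return properties.put(name, value);
    }
}
```

With this shape, the application can set properties for several drivers in a row and only the driver actually in use reacts to its own.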
Regards,
Don Park
From donpark at quake.net Sun Feb 1 23:11:06 1998
From: donpark at quake.net (Don Park)
Date: Mon Jun 7 17:00:03 2004
Subject: SAX: Parser Interface -- Summary of Change Requests
Message-ID: <006601bd2f66$08ce9d50$2ee044c6@donpark>
>Not sure exactly what the use of these get methods is for cause all the
>handlers are useful is delegation anyways. The only reason the get methods
>would be useful is for casting the returned object to some other form. Why
>anyone would need to do this is beyond me as recasting this object back to
>something would be sloppy implementation in the first place.
get methods make chaining delegations possible and allow the
drivers to provide more functional default handlers without worrying about
having them blasted out of the water just because the application wants to
override the handler. It is beyond me as to why anyone would cast the
returned object to some other form whether such practice is sloppy or not.
Please enlighten me.
>The default handler could just be something which spits stuff out to stdout
>or some other OutputStream in a manner similar to how Aelfred's EventDemo
>does.
I don't think customers will appreciate having stdout or whatever filling
screen or disk with SAX event messages. Internet Explorer with java logging
enabled would cause a hiccup.
Don Park
http://www.quake.net/~donpark/index.html
From ak117 at freenet.carleton.ca Mon Feb 2 20:55:07 1998
From: ak117 at freenet.carleton.ca (David Megginson)
Date: Mon Jun 7 17:00:03 2004
Subject: SAX: Parser Interface -- Summary of Change Requests
In-Reply-To: <34D4EE00.A1FCECF5@infinet.com>
References: <199802012028.PAA00747@unready.microstar.com>
<34D4EE00.A1FCECF5@infinet.com>
Message-ID: <199802022050.PAA01517@unready.microstar.com>
Tyler Baker writes:
[on reading XML from a stream rather than a URI]
> Well, what if the XML data is streamed from a database where a URL
> does not matter so much. If you look at what Oracle, Sybase, and
> Microsoft among others are planning on doing with XML, then
> supporting this with SAX in the most ubiquitous way will be very
> much necessary. I think that if you want to make SAX have any
> CORBA support or other language support down the line, it would be
> best to negate any polymorphism in the API cause in CORBA for
> example, you cannot redefine operations in IDL (methods in Java).
This is a good point, but there are complications. Do these vendors
plan to use character streams or byte streams?
> Another idea (as far as implementation goes) is to have the parser
> simply be an extension of java.io.FilterInputStream which takes an
> one or more Handler interfaces as arguments (to delegate to), so
> that you can handle very large streams of data.
This sounds like an interesting idea for a parser implementation, but
since SAX is meant to work with many parsers in many languages, it is
probably too constraining as a general common interface.
[on get* methods for handlers]
> Not sure exactly what the use of these get methods is for cause all
> the handlers are useful is delegation anyways. The only reason the
> get methods would be useful is for casting the returned object to
> some other form. Why anyone would need to do this is beyond me as
> recasting this object back to something would be sloppy
> implementation in the first place.
Delegation itself might be enough justification, though -- we'll have
to wait and see what others suggest.
> The default handler could just be something which spits stuff out
> to stdout or some other OutputStream in a manner similiar to how
> Aelfred's EventDemo does.
It would probably be best for the default handler to produce no output
at all, so that other handlers delegating to it would not end up
creating bloated log files.
All the best, and thanks for the feedback,
David
--
David Megginson ak117@freenet.carleton.ca
Microstar Software Ltd. dmeggins@microstar.com
http://home.sprynet.com/sprynet/dmeggins/
From ak117 at freenet.carleton.ca Mon Feb 2 21:04:04 1998
From: ak117 at freenet.carleton.ca (David Megginson)
Date: Mon Jun 7 17:00:03 2004
Subject: Parser Interface -- Summary of Change Requests
In-Reply-To: <003b01bd2f63$c5b6c800$2ee044c6@donpark>
References: <003b01bd2f63$c5b6c800$2ee044c6@donpark>
Message-ID: <199802022059.PAA01592@unready.microstar.com>
Don Park writes:
> public InputStream
> getEntityByteStream (String systemID)
> throws Exception;
>
> public InputStream
> getEntityCharStream (String systemID)
> throws Exception;
>
> The parser implementation should invoke getEntityCharStream first to see if
> the there is decoded data available. If not, it should invoke
> getEntityByteStream to get the raw data.
>
> If both methods return null, then default URL based code is used.
I like the general idea, though there are implementation problems.
Many languages (including Java 1.0.2) have no concept of a character
stream at all, and in Java 1.1, you would have to use
public Reader getEntityCharStream (String systemID)
throws Exception;
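As a sketch of the byte-versus-character distinction, the snippet below turns a byte stream into a Reader with an explicit encoding. CharStreamDemo is a hypothetical name, UTF-8 is assumed purely for the example, and StandardCharsets is a present-day convenience that did not exist in Java 1.1.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

class CharStreamDemo {
    // Decode a byte stream into characters using an explicit encoding.
    static Reader asCharStream(InputStream bytes) {
        return new InputStreamReader(bytes, StandardCharsets.UTF_8);
    }

    // Drain the character stream into a String.
    static String decode(byte[] raw) {
        try (Reader chars = asCharStream(new ByteArrayInputStream(raw))) {
            StringBuilder text = new StringBuilder();
            int c;
            while ((c = chars.read()) != -1) {
                text.append((char) c);
            }
            return text.toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Two bytes on the wire can be one character to the application, which is why the return type matters to whoever does the decoding.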
> > This seems like a generally good idea (as well as a simple and
> > backwards-compatible change), and I am willing to implement it.
> > The only complication is that we'll have to define the default
> > state -- is the parser always required to return a default handler
> > if the user has not explicitly set one, or should it return null?
>
> It would be up to the SAX implementation. It might provide default
> implementation depending on configuration. For example, FooSaxDriver might
> have setInputType() method which would install a default EntityHandler for
> fetching XML document from a database.
This might make life a little trickier for programmers using SAX --
what do others think?
> BTW, You left out my other suggestion which was
>
> >>>>>>>>>>>>>>>>>>>>>>>>
> In addition, I would like to have following two methods added to the Parser
> API for driver-specific operations:
>
> public Object getDriverProperty(String name);
> public Object setDriverProperty(String name, Object value);
>
> Property names should be prefixed with some unique values to avoid confusing
> other drivers. Note that above methods can be invoked without knowing which
> driver is actually being used. For example:
>
> parser.setDriverProperty("SuperDriver.lowercaseElements", Boolean.TRUE);
> parser.setDriverProperty("HungryDriver.cacheSize", new Integer(100000));
> <<<<<<<<<<<<<<<<<<<<<<<<
>
> Above two methods allow driver-specific code without actually having to
> import anything.
Sorry about the omission. I'd be interested in hearing other
reactions to this suggestion -- I'm worried that it would result in
SAX implementations that are non-conformant XML processors (as in your
first example), or that are incompatible with each other. Remember
that SAX defines only a minimum level of compatibility among XML
processors.
All the best,
David
--
David Megginson ak117@freenet.carleton.ca
Microstar Software Ltd. dmeggins@microstar.com
http://home.sprynet.com/sprynet/dmeggins/
From tyler at infinet.com Mon Feb 2 21:52:23 1998
From: tyler at infinet.com (Tyler Baker)
Date: Mon Jun 7 17:00:03 2004
Subject: SAX: Parser Interface -- Summary of Change Requests
References: <199802012028.PAA00747@unready.microstar.com>
<34D4EE00.A1FCECF5@infinet.com> <199802022050.PAA01517@unready.microstar.com>
Message-ID: <34D63FDE.2D234CFC@infinet.com>
David Megginson wrote:
> Tyler Baker writes:
>
> [on reading XML from a stream rather than a URI]
>
> > Well, what if the XML data is streamed from a database where a URL
> > does not matter so much. If you look at what Oracle, Sybase, and
> > Microsoft among others are planning on doing with XML, then
> > supporting this with SAX in the most ubiquitous way will be very
> > much necessary. I think that if you want to make SAX have any
> > CORBA support or other language support down the line, it would be
> > best to negate any polymorphism in the API cause in CORBA for
> > example, you cannot redefine operations in IDL (methods in Java).
>
> This is a good point, but there are complications. Do these vendors
> plan to use character streams or byte streams?
In CORBA IDL there is a string and a wstring type. The wstring type maps to
Unicode in the IDL -> Java mapping. You could define everything as wstring if
you wish as far as IDL is concerned.
> > Another idea (as far as implementation goes) is to have the parser
> > simply be an extension of java.io.FilterInputStream which takes an
> > one or more Handler interfaces as arguments (to delegate to), so
> > that you can handle very large streams of data.
>
> This sounds like an interesting idea for a parser implementation, but
> since SAX is meant to work with many parsers in many languages, it is
> probably too constraining as a general common interface.
Yah, I only meant it as far as the implementation goes, but on another note, I think that the
Handler interfaces are by far and away the most important ones. Really,
Aelfred could have an XMLInputStream derived out of Parser, either by
having the parser be an implementation of XMLInputStream itself, or else by
assigning a parser stub to XMLInputStream which could be retrieved by calling
Parser.getXMLInputStream(). Parser.parse() would just parse everything with no
control over IO, but with XMLInputStream you could have control at the IO level.
Furthermore, having a handler registry of SAX Handler interfaces (or just
pointers to where the class implementations live) would be invaluable to the
particular application I am working on now. I suggested having a static
registerHandler method in XMLInputStream, but you could add this to Parser
instead. This way you could simply pass in XML data and the parser would look up
the appropriate handler implementation for that doctype and load it dynamically.
Otherwise, this needs to be done manually and can really bloat your code at the
application level since you will have to essentially have a large number of
if/else statements and register the appropriate handlers manually. If this was
implemented in Aelfred or any other parser, you would already remove a huge
burden off of the application developers utilizing XML IMHO.
> [on get* methods for handlers]
>
> > Not sure exactly what the use of these get methods is for cause all
> > the handlers are useful is delegation anyways. The only reason the
> > get methods would be useful is for casting the returned object to
> > some other form. Why anyone would need to do this is beyond me as
> > recasting this object back to something would be sloppy
> > implementation in the first place.
>
> Delegation itself might be enough justification, though -- we'll have
> to wait and see what others suggest.
I think it would be better to have an addDocumentHandler() instead of
setDocumentHandler() if you wish to do delegation. This is an
Observer/Observable pattern that would work quite nicely. You could have
multiple objects register interest in the parsing of the XML data and have the
events delivered to them appropriately. You might even make all of this beans
compliant if you really want to.
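A minimal sketch of the add-style registration, assuming events are delivered to every listener in registration order. ElementListener and MulticastParser are hypothetical names, and the single startElement callback is a stand-in for the full set of SAX document events.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, trimmed-down handler interface for illustration only.
interface ElementListener {
    void startElement(String name);
}

class MulticastParser {
    private final List<ElementListener> listeners = new ArrayList<>();

    // Registers interest instead of replacing the previous handler.
    void addDocumentHandler(ElementListener listener) {
        listeners.add(listener);
    }

    // Stand-in for parsing: deliver one event to every registered listener.
    void fireStartElement(String name) {
        for (ElementListener listener : listeners) {
            listener.startElement(name);
        }
    }
}
```

With this shape no handler needs to know about the others, which is the main difference from chaining through get* methods.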
> > The default handler could just be something which spits stuff out
> > to stdout or some other OutputStream in a manner similiar to how
> > Aelfred's EventDemo does.
>
> It would probably be best for the default handler to produce no output
> at all, so that other handlers delegating to it would not end up
> creating bloated log files.
Yah, I kinda overlooked this. I just thought it would be nice for debugging. My
stupid (-:
Tyler
From papresco at technologist.com Mon Feb 2 22:23:17 1998
From: papresco at technologist.com (Paul Prescod)
Date: Mon Jun 7 17:00:03 2004
Subject: First experiences with XSL
In-Reply-To: <01bd2d90$7dc5d6a0$1e09e391@mhklaptop.bra01.icl.co.uk>
Message-ID:
On Fri, 30 Jan 1998, Michael Kay wrote:
> I've downloaded MSXSL and used it to generate HTML for a couple of document
> types, successfully but with a certain amount of frustration caused by (a)
> lack of diagnostics when I got things wrong, and (b) limited functionality.
>
> I've now implemented the same thing without XSL: I wrote an MSXML
> application in Java that does a recursive walk down the document tree and
> calls a registered "handler" class to process each element type.
Yes, you can implement something XSLish without XSL. The point of XSL is
that it is to be a standard: there will be multiple, interoperable
browser and word processor implementations as well as dedicated XSL
development tools and so forth.
Paul Prescod
From oshima at osa.sci.jri.co.jp Tue Feb 3 05:30:09 1998
From: oshima at osa.sci.jri.co.jp (Tetsuya OSHIMA)
Date: Mon Jun 7 17:00:03 2004
Subject: No subject
Message-ID: <9802030238.AA13691@t111ws06>
# bye
From M.H.Kay at eng.icl.co.uk Tue Feb 3 10:58:02 1998
From: M.H.Kay at eng.icl.co.uk (Michael Kay)
Date: Mon Jun 7 17:00:03 2004
Subject: SAX: Parser Interface -- Summary of Change Requests
Message-ID: <01bd3092$b63188e0$1e09e391@mhklaptop.bra01.icl.co.uk>
>Tyler Baker writes:
>
> [on reading XML from a stream rather than a URI]
>
> > Well, what if the XML data is streamed from a database where a URL
> > does not matter so much...
This suggests an analogy with CGI. A URL is not the name of a document, it
is a request for a stream of data, and what we need is a style of URL (or
extended URL) that allows the application to say "please send your requests
for data to me and I will supply a stream in response".
>This is a good point, but there are complications. Do these vendors
>plan to use character streams or byte streams?
>
I don't know the Java technicalities, but surely what we mean by a stream
here is something that supplies a sequence of Unicode characters. (Surely
it's not the parser's job to turn bytes into characters?)
We should also ensure that the design makes certain special cases easy for
the application writer, e.g.:
a) the primary input source is a file in filestore. (Translating the
filename to a URL is error-prone and it would be better for the parser to do
it)
b) there is only one input source (e.g. a record containing XML read from a
database, with no DTD or other external entities), probably available
already in the application as the contents of a String.
regards, Mike Kay
From ak117 at freenet.carleton.ca Tue Feb 3 12:00:17 1998
From: ak117 at freenet.carleton.ca (David Megginson)
Date: Mon Jun 7 17:00:03 2004
Subject: SAX: Parser Interface -- Summary of Change Requests
In-Reply-To: <01bd3092$b63188e0$1e09e391@mhklaptop.bra01.icl.co.uk>
References: <01bd3092$b63188e0$1e09e391@mhklaptop.bra01.icl.co.uk>
Message-ID: <199802031155.GAA00333@unready.microstar.com>
Michael Kay writes:
> I don't know the Java technicalities, but surely what we mean by a stream
> here is something that supplies a sequence of Unicode characters. (Surely
> it's not the parser's job to turn bytes into characters?)
That depends on the type of stream. I would not want to force the
client to do encoding conversion for a stream that happened to be open
to a local file or an HTTP connection.
> We should also ensure that the design makes certain special cases easy for
> the application writer, e.g.:
>
> a) the primary input source is a file in filestore. (Translating the
> filename to a URL is error-prone and it would be better for the parser to do
> it)
>
> b) there is only one input source (e.g. a record containing XML read from a
> database, with no DTD or other external entities), probably available
> already in the application as the contents of a String.
It should be possible to read from a string, but it would not be safe
to assume that the string contains no DTD or external entities -- it
would always be necessary to supply a base URI as well.
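To illustrate why the base URI matters, here is a small sketch that pairs an in-memory document with the URI used to resolve relative system identifiers. StringSource and the example.org addresses are hypothetical; java.net.URI is used only for convenience.

```java
import java.io.Reader;
import java.io.StringReader;
import java.net.URI;

// Hypothetical pairing of in-memory XML text with its base URI.
class StringSource {
    private final String text;
    private final URI base;

    StringSource(String text, String baseUri) {
        this.text = text;
        this.base = URI.create(baseUri);
    }

    // Character stream over the string; no byte decoding is needed.
    Reader open() {
        return new StringReader(text);
    }

    // Resolve a system identifier found in the text against the base URI.
    String resolve(String systemId) {
        return base.resolve(systemId).toString();
    }
}
```

Without the base, a relative identifier such as chapter1.xml inside the string would have nothing to be resolved against.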
All the best,
David
--
David Megginson ak117@freenet.carleton.ca
Microstar Software Ltd. dmeggins@microstar.com
http://home.sprynet.com/sprynet/dmeggins/
From thyde-smith at derwent.co.uk Tue Feb 3 12:19:04 1998
From: thyde-smith at derwent.co.uk (thyde-smith@derwent.co.uk)
Date: Mon Jun 7 17:00:03 2004
Subject: UNSUBSCRIBE
Message-ID: <00010D05.1271@derwent.co.uk>
unsubscribe
From nav at metratech.com Tue Feb 3 16:02:32 1998
From: nav at metratech.com (Navdip Bhachech)
Date: Mon Jun 7 17:00:03 2004
Subject: recommendations on currently available streaming XML toolkits?
Message-ID: <01BD3093.3C881940.nav@metratech.com>
There have been a few discussions on streaming issues in this list
lately, so I thought I'd ask:
What are the recommended toolkits (currently available) that allow
streaming XML, instead of a file based approach?
Nav
______________________________________________________________
Navdip Bhachech
MetraTech Corp
www.MetraTech.com
nav@metratech.com
From ht at cogsci.ed.ac.uk Tue Feb 3 16:51:48 1998
From: ht at cogsci.ed.ac.uk (Henry S. Thompson)
Date: Mon Jun 7 17:00:04 2004
Subject: recommendations on currently available streaming XML toolkits?
In-Reply-To: Navdip Bhachech's message of Tue, 3 Feb 1998 11:02:35 -0500
References: <01BD3093.3C881940.nav@metratech.com>
Message-ID:
Our XML tools are designed for streaming, and are happy with multi-10M
documents:
http://www.ltg.ed.ac.uk/software/xml/
ht
--
Henry S. Thompson, Human Communication Research Centre, University of Edinburgh
2 Buccleuch Place, Edinburgh EH8 9LW, SCOTLAND -- (44) 131 650-4440
Fax: (44) 131 650-4587, e-mail: ht@cogsci.ed.ac.uk
URL: http://www.cogsci.ed.ac.uk/~ht/
From ricko at allette.com.au Tue Feb 3 22:29:04 1998
From: ricko at allette.com.au (Rick Jelliffe)
Date: Mon Jun 7 17:00:04 2004
Subject: Ideas about Cutting and Pasting in XML
Message-ID: <199802032238.JAA18643@jawa.chilli.net.au>
Developers with an idle moment may be interested in a paper I've
just put up "A Cut and Paste Infrastructure for XML"
http://www.chilli.net.au/~ricko/XML-cut-n-paste.htm
It gives a direction I suggest XML needs to be developed towards,
in order to support arbitrary cutting and pasting between XML
documents.
This now has some comments about RDF (and XML-data) which may
be of interest too.
Comments welcome.
Rick Jelliffe
From bsteele at tdiinc.com Tue Feb 3 22:36:23 1998
From: bsteele at tdiinc.com (Bob Steele)
Date: Mon Jun 7 17:00:04 2004
Subject: XML-Data: A naive question
Message-ID: <34D79CFD.4F0469F@tdiinc.com>
RDF documentation (Resource Description Framework (RDF) Model and
Syntax) states:
"RDF uses the Extensible Markup Language (XML) encoding as its syntax.
However, RDF will not require (and conforming implementations must not
require) an XML Document Type Declaration for the contents of
assertions. In this respect RDF requires at most the XML well-formedness
constraints. RDF schemas may -- but are not required to -- be XML DTDs."
Isn't this true of XML-Data? I can't seem to find it expressly stated.
Thanks,
bob
--
From donpark at quake.net Tue Feb 3 23:07:45 1998
From: donpark at quake.net (Don Park)
Date: Mon Jun 7 17:00:04 2004
Subject: XML Conformance and DTD support in SAX
Message-ID: <003e01bd30f7$e0e6a830$2ee044c6@donpark>
1. XML Conformance
I am not sure if I am going off on a tangent but I think some form of markup
to indicate XML conformance would be really nice so that XML clients and
servers can decide whether to validate or not.
2. It would be nice to have SAX provide more DTD information.
We could either have a separate DocumentTypeHandler or fire XML parsing
events for DTD as if it was an XML document being parsed. Anyway, without
better support for DTD, DOM cannot be supported fully by SAX. Perhaps we need
SAXDTD API to augment SAX?
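To make the gap concrete, here is a rough sketch using Python's standard xml.sax module (a later binding, not the Java draft under discussion). Its DTDHandler reports only notation and unparsed-entity declarations; element and attribute declarations produce no events. The document and the RecordingDTDHandler class are hypothetical.

```python
import io
import xml.sax
from xml.sax.handler import DTDHandler

# Hypothetical document: the internal subset declares an element type,
# a notation, and an unparsed entity.
DOC = b"""<?xml version="1.0"?>
<!DOCTYPE doc [
  <!ELEMENT doc (#PCDATA)>
  <!NOTATION gif SYSTEM "image/gif">
  <!ENTITY logo SYSTEM "logo.gif" NDATA gif>
]>
<doc>hello</doc>
"""


class RecordingDTDHandler(DTDHandler):
    """Record every DTD event the parser chooses to report."""

    def __init__(self):
        self.events = []

    def notationDecl(self, name, publicId, systemId):
        self.events.append(("notation", name, systemId))

    def unparsedEntityDecl(self, name, publicId, systemId, ndata):
        self.events.append(("unparsed", name, systemId))


def collect_dtd_events(data):
    """Parse data and return the DTD events seen by the handler."""
    handler = RecordingDTDHandler()
    parser = xml.sax.make_parser()
    parser.setDTDHandler(handler)
    parser.parse(io.BytesIO(data))
    return handler.events


if __name__ == "__main__":
    for event in collect_dtd_events(DOC):
        print(event)
```

Nothing is reported for the ELEMENT declaration, which is the kind of information a tree builder would need from a fuller DTD interface.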
No lines drawn, just digging some sand with my toes,
Don Park
http://www.quake.net/~donpark/index.html
From ak117 at freenet.carleton.ca Wed Feb 4 00:35:18 1998
From: ak117 at freenet.carleton.ca (David Megginson)
Date: Mon Jun 7 17:00:04 2004
Subject: XML Conformance and DTD support in SAX
In-Reply-To: <003e01bd30f7$e0e6a830$2ee044c6@donpark>
References: <003e01bd30f7$e0e6a830$2ee044c6@donpark>
Message-ID: <199802040029.TAA00528@unready.microstar.com>
Don Park writes:
> 2. It would be nice to have SAX provide more DTD information.
>
> We could either have a separate DocumentTypeHandler or fire XML parsing
> events for DTD as if it was an XML document being parsed. Anyway, without
> better support for DTD, DOM cannot be supported fully by SAX. Perhaps we need
> SAXDTD API to augment SAX?
I think that it is very likely that we will make a SAX level two some
other day, which might include a DocumentHandler and/or a DTDHandler
interface. For now, however, we should probably try to stabilise what
we have -- the current SAX falls mostly within the range of features
already offered by existing parsers.
All the best,
David
--
David Megginson ak117@freenet.carleton.ca
Microstar Software Ltd. dmeggins@microstar.com
http://home.sprynet.com/sprynet/dmeggins/
From papresco at technologist.com Wed Feb 4 08:50:48 1998
From: papresco at technologist.com (Paul Prescod)
Date: Mon Jun 7 17:00:04 2004
Subject: Namespaces, modules and architectures paper available
Message-ID: <34D82C2E.6C6B3AE7@technologist.com>
http://itrc.uwaterloo.ca/~papresco/sgml/namespaces.html
Why We Need Namespaces (Modules)
An SGML/XML Feature Proposal
Abstract
The World Wide Web Consortium has recently published a note called
Namespaces in XML. Not everyone has access to it yet, but they will
soon. It proposes a simple convention for allowing instances to have
elements whose type names come from many different schemas. According to
that note:
"We envision applications of XML in which a document instance may
contain markup defined in multiple schemas. These schemas may have been
authored independently. One motivation for this is that writing good
schemas is hard, so it is beneficial to reuse parts from existing,
well-designed schemas. Another is the advantage of allowing search
engines or other tools to operate over a range of documents that vary in
many respects but use common names for common element types."
Advocates of ISO architectural forms ("archforms") have noticed that
these requirements are very similar to those for archforms and have
proposed archforms as a solution. They are correct that the basic
underlying problems are related, but the problems are not identical. We
need both archforms and namespaces. The two ideas are actually very
complementary. This note demonstrates why neither architectural forms
nor the current namespace proposal really solve the "namespace problem"
satisfactorily.
Background
I will use the document [1]'A Proposal to Introduce "Module" Structures
Into SGML' as an example of a modules proposal which includes not just a
convention for namespace combination, but a syntax for actually
combining SGML DTD fragments. These fragments are the only standardized
schema for either SGML or XML.
Architectural forms allow a "client document" to declare that certain
elements conform to an element type in a DTD other than the document's
DTD. For instance you could say that a particular element is both a LINK
element in the document's DTD and a HyTime CLINK element in the HyTime
architecture. It is essentially both things at once. You can either
declare a particular element as having an architectural element type (in
addition to its ordinary element type) or you can declare that all of
the elements of a particular type adhere to a particular architectural
element type. For instance you could say that a particular "human"
element conforms to the "animal" architectural element type (if the
human was, for example, a "party animal") or you could say that all
"dog" elements conform to the "animal" architectural element type.
The Rub
A particular element can also conform to multiple architectural element
types. For instance the aforementioned human could conform to both the
"programmer" and the "party animal" architectural element types (no,
those are not logically exclusive). My claim is that this increased
generality is a powerful feature in many contexts, but makes things way
too complex in the simple case for architectural forms to be the most
basic namespace management facility in XML. SGML and SGML tools are
organized around the idea that each element conforms to one and only one
element type. We have not yet re-thought the SGML processing idea in
terms of the concept of multiple element types.
For instance, the most common form of SGML processing is validation.
SGML uses DTDs to define constraints on SGML documents. According to the
Japanese proposal, validation could be accomplished more or less like this:
]>
Imagine that math.module.dtd and hyperlinks.module.dtd are hundreds of
lines long. Imagine also that they both had an element called "SET" (for
"mathematical set" and "link set"). As far as I know, there is no way to
accomplish this namespace merging operation with anything close to the
same ease with architectural forms. Yes, I can do it, by copying
math.module.dtd and hyperlinks.module.dtd into my document type. I can
then manually fix up the namespace clashes like my "SET" element. But it
is this sort of duplication of code that the modules proposal was
explicitly designed to avoid. In fact, that is its reason for existing.
We can see, then, that architectural forms do not solve the problem that
the modules proposal was meant to solve. They do not automatically merge
namespaces.
Let me define some terms to clarify. A namespace is a mapping from names
to objects, such as element type names to element types (explicitly or
implicitly declared). A namespace merge is the construction of a
namespace from two others that preserves all of the elements from the
originals. Architectural forms provide access to multiple namespaces,
but they do not merge namespaces.
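As a toy illustration of these definitions (not the syntax of any actual proposal), a namespace can be modelled as a dictionary and a merge as a function that qualifies each name with its module, so that a clash such as "SET" keeps both entries. The module names, the "::" separator and the element types below are placeholders.

```python
def merge_namespaces(modules):
    """Merge several name -> element-type mappings into one namespace.

    Each name is qualified with its module name ("MATH::SET"), so every
    entry of every original namespace survives the merge.
    """
    merged = {}
    for module_name, namespace in modules.items():
        for name, element_type in namespace.items():
            merged[f"{module_name}::{name}"] = element_type
    return merged


# Hypothetical element types standing in for the contents of
# math.module.dtd and hyperlinks.module.dtd.
MATH = {"SET": "mathematical set", "FORMULA": "formula"}
LINKS = {"SET": "link set", "LINK": "hyperlink"}

MERGED = merge_namespaces({"MATH": MATH, "LINKS": LINKS})

if __name__ == "__main__":
    for qualified_name in sorted(MERGED):
        print(qualified_name, "->", MERGED[qualified_name])
```

Both SET entries remain reachable after the merge, which is the property the manual copy-and-rename approach has to achieve by hand.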
I suspect that some with a long background in SGML will be a little
baffled trying to understand why someone would want to do this. After
all, combining document types is typically difficult work performed by
experts, tested on teams of users, tweaked to perfection with element
names remapped to fit the terminology of the user community. Mixing and
matching DTD fragments in an ad hoc manner might not seem like a good
idea. But the fact is that we live in a brave new world. End users want
to take control of their own document types in many cases. They want to
mix and match DTD fragments and they are not willing to spend the amount
of effort that we professionals are. Good for them! They will make all
of our lives easier. In fact, when authors say that they want to "get
rid of" DTDs, what they typically mean is that they don't want to be
constrained by someone else's DTD and making their own is too difficult!
If we can make DTD maintenance easier, more people will use them.
Perhaps it would be possible to update SGML so that validation does not depend
so deeply on each element having a single element type, so that content
models could be expressed that combined elements from different
architectures. If we did that, my complaint might go away. Architectures
might regain some of the validatory simplicity of the modules proposal.
But this would require a much more fundamental change to SGML than the
modules proposal would.
Stylesheets
I will use stylesheets as another example of processing. The three most
interesting stylesheet languages right now are DSSSL, XSL and CSS. Each
of those has as its central organizing construct a rule triggered on an
element type name in a context. DSSSL has a feature that would allow
querying on architecture, but the feature is optional and is not
supported, for instance, by James Clark's Jade. Even where the feature
is available, the architectural form-based version of a stylesheet is
much more complicated than the equivalent based on a "flat" namespace
(such as a stylesheet for traditional SGML or SGML augmented with the
modules proposal). I invite architectural forms advocates to prove me
wrong by providing their stylesheets.
Here is what a module-enhanced DSSSL might look like:
(element MATH.AND.HYPERLINKS (process-children))
As you can see, this has just enough lines to include the relevant
stylesheet modules and provide rules for the new elements. What would
the equivalent archform code look like? With DSSSL as it exists, it
would look quite ugly and convoluted. With some enhanced DSSSL it might
look reasonable (just as some enhanced SGML might be able to have
content models that span architectures), but nobody has yet proposed
what such a DSSSL would look like (just as nobody has proposed the
enhanced SGML). I am open to suggestions...
I do not believe that either the current XSL proposal or CSS would allow
architecture based processing at all. Once again, the idea that every
element has a single element type is a fundamental organizing principle
of these stylesheet languages. It is also an organizing principle of
most SGML editors, DTD editors and formatting and conversion tools I
have used. In fact, almost every SGML tool in the world operates under
that principle. The best tools will give you access to architectural
forms (through their architectural attributes), but they will typically
use the element type name as the major organizing feature of the
stylesheets. Archform centric processing is typically awkward if it is
possible at all.
The one element, one element type principle is also central to every
course in SGML I have ever taken and any book on it I have ever read.
Even the SGML Handbook says that every element has a particular element
type (a single, particular element type).
The Argument From Usability
Imagine that you are a typical end user and have used archforms instead
of a namespace merging mechanism to combine DTD fragments. Now imagine
that you know that a particular element type name appears in both DTD
fragments. I think that most people would be very surprised to learn
that the way to associate this element with one or the other DTD is to
add an attribute. Because the generic identifier (the name in the
start-tag) usually establishes the element type, you would probably
expect to change the generic identifier to change the association. But
using architectural forms, you would instead have to add an
attribute that would essentially disassociate the element from one of
the element types: "I may have the same name as that element type, but
it isn't actually one of my element types." I think that this is a nasty
case of making the common, simple case of merging DTD fragments more
complicated in order to make life easier for those of us who have to
solve problems that may actually require the full generality of
architectural forms. Once again, I invite advocates to send me code
samples that demonstrate that this is simpler than I think.
Who was it that said: "Make the easy things easy and the hard things
possible." Architectural forms make hard things possible, but when
misapplied to the namespace problem, they make easy things unnecessarily
hard. Let me be clear: architectural forms (or something like them) have
an important role to play in SGML systems. We absolutely need some form
of semantic inheritance mechanism. But they work best when they work in
the environment they were designed for: they are typically used as an
underlying basis of a DTD designed by a professional. The professional
DTD designer renames elements to avoid clashes. That individual is the
real solution to the "namespace problem" in most environments. In
environments where such a person exists, archforms are really, really
useful. They are not useful because they allow you to merge namespaces
(they don't). They are useful because they allow you to combine
semantics from different DTD fragments in powerful ways (but more or
less manually). I think that a modules/namespaces proposal would
acutally be very useful for building architectures from DTD fragments. I
also think that architectural forms would be very useful on the Web. Not
every use of XML on the web will be ad hoc. Some XML applications will
need the robust multi-level validation that architectural forms allow.
Think about e-commerce for example.
But many users will not need or want architectural forms. Most people
just need a simple way to combine fixed DTD fragments so that there are
no name clashes. The Japanese module proposal provides such a mechanism.
Presumably Web-centric DTD-replacement schema languages will provide
mechanisms like this also. If these sorts of things are made much easier
in these schema languages than they are in SGML DTD syntax, people will
just avoid SGML DTD syntax. This would be a big mistake for all
concerned. Let's please just fix SGML through a proposal like the one
submitted by the Japanese in 1996. Some modules proposal should be part
of the SGML revision. This would in no way preclude the wide deployment
of architectural forms as a solution to a different problem.
Paul Prescod
--
http://itrc.uwaterloo.ca/~papresco
From papresco at technologist.com Wed Feb 4 14:36:48 1998
From: papresco at technologist.com (Paul Prescod)
Date: Mon Jun 7 17:00:04 2004
Subject: Namespaces, modules and architectures paper available
References: <34D82C2E.6C6B3AE7@technologist.com> <34ee5fa9.103270755@mail.alink.net>
Message-ID: <34D87D02.14BA4B9C@technologist.com>
I appreciate the simplicity of this [1]proposal, but want to check that
it is not too simple to get the job done.
How would you pass information into a module with this proposal? For
instance, I might want to include a table model, but might need to
specify the contents of the table's cell elements from the containing
DTD.
Also, it feels "nicer" to me to have the instance structure control
namespace lookup so that when I am in a MATH::FORMULA element, I can use
elements from the MATH module without qualification. This convention
could remove most or all qualification from a document instance and thus
make things simpler for authors. For instance:
%math;
]>
...
I would like it if the containing element would control namescope
choice.
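A rough sketch of the lookup rule I have in mind, assuming a document tree given as nested (name, children) tuples and the "MODULE::NAME" qualification used above. The function names and the sample tree are illustrative only.

```python
def resolve(name, scope_stack):
    """Resolve an element name against the enclosing module scopes.

    A qualified name ("MATH::FORMULA") is used as written. An unqualified
    name takes the module of the nearest qualified ancestor, if any.
    """
    if "::" in name:
        return name
    if scope_stack:
        return f"{scope_stack[-1]}::{name}"
    return name  # no enclosing module: the name belongs to the base DTD


def walk(tree, scope_stack=()):
    """Yield the resolved name of every element, in document order."""
    name, children = tree
    qualified = resolve(name, scope_stack)
    if "::" in qualified:
        # This element establishes (or continues) a module scope.
        inner = scope_stack + (qualified.split("::")[0],)
    else:
        inner = scope_stack
    yield qualified
    for child in children:
        yield from walk(child, inner)


# Hypothetical instance: only FORMULA carries a qualifier.
DOCUMENT = ("ARTICLE", [
    ("P", []),
    ("MATH::FORMULA", [("SET", []), ("PLUS", [])]),
])

if __name__ == "__main__":
    print(list(walk(DOCUMENT)))
```

Only the element that enters the module needs a qualifier; everything beneath it resolves without one.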
Paul Prescod
--
http://itrc.uwaterloo.ca/~papresco
[1] It should appear here soon:
http://www.lists.ic.ac.uk/archives/xml-dev/9802/index.html
From ak117 at freenet.carleton.ca Wed Feb 4 15:07:03 1998
From: ak117 at freenet.carleton.ca (David Megginson)
Date: Mon Jun 7 17:00:04 2004
Subject: Namespaces, Architectural Forms, and Sub-Documents
In-Reply-To: <34D87D02.14BA4B9C@technologist.com>
References: <34D82C2E.6C6B3AE7@technologist.com>
<34ee5fa9.103270755@mail.alink.net>
<34D87D02.14BA4B9C@technologist.com>
Message-ID: <199802041506.KAA00956@unready.microstar.com>
It seems to me that when you want to embed large contiguous structures
from different document types in an XML document, each different
namespace should be its own sub-document, referenced as a binary
entity (or using whatever other mechanisms are available in XML-Link).
Good tools and protocols should make it possible to create, transmit,
and process compound documents as if they were single files. This
will be necessary anyway for supporting multimedia.
Here are some general guidelines:
* Architectural forms are most suitable for applications where
multiple inheritance is required, or where elements belonging to a
different document type are scattered throughout a document.
* Sub-documents are most suitable for applications where all of the
elements belonging to a different document type are rooted in a
single subtree.
"namespace:gi" element type names are unsuitable for several reasons:
1) The complexity of namespaces is exposed to the author rather than
hidden in the DTD (as it is, optionally, with architectural forms).
2) Multiple inheritance is not possible (X can be a kind of Y or a
kind of Z, but not both).
3) Standard DTD-based validation is not possible, and it is more
difficult to create DTD-driven authoring tools.
4) Both architectural forms and sub-documents can be fully supported
under the existing spec by _both_ validating and non-validating XML
parsers: no changes necessary. Furthermore, they will also remain
compatible with SGML tools.
Why are people worried about writing specs to solve a problem that
already has good, working, available solutions?
All the best,
David
--
David Megginson ak117@freenet.carleton.ca
Microstar Software Ltd. dmeggins@microstar.com
http://home.sprynet.com/sprynet/dmeggins/
From grk at arlut.utexas.edu Wed Feb 4 15:59:35 1998
From: grk at arlut.utexas.edu (Glenn R. Kronschnabl)
Date: Mon Jun 7 17:00:04 2004
Subject: FORTRAN namelist input - remember? Replace with XML!
Message-ID: <199802041559.JAA06936@mail-firewall.arlut.utexas.edu>
I want to use XML as a general input mechanism for scientific programs. In
the old days, say in FORTRAN, one used to use namelist input. In C/C++, one
usually wrote a custom driver. I want to use XML because it appears to make
sense. I have started using SP - and want to build a tree that I can query
(kind of like an xrdb interface) for my input parameters. But, before I
embark on this, I was wondering if 1) this makes sense, 2) someone surely has
a simple tree builder/query interface to SP already that I can use so I don't
have to write my own (none jumped out at me when I looked around).
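The kind of interface I am after might look roughly like the sketch below. It is written against Python's standard xml.etree.ElementTree rather than SP, purely to show the shape of the query calls; the input deck, the Params class and the path syntax ("a/b" for element text, "a/@attr" for an attribute) are all made up.

```python
import xml.etree.ElementTree as ET

# Hypothetical input deck; element and attribute names are illustrative.
INPUT = """
<run>
  <grid nx="64" ny="32"/>
  <physics>
    <viscosity>1.5e-3</viscosity>
    <steps>1000</steps>
  </physics>
</run>
"""


class Params:
    """Build a tree from XML text and answer simple parameter queries."""

    def __init__(self, text):
        self.root = ET.fromstring(text)

    def get(self, path, convert=str, default=None):
        """Look up 'a/b' (element text) or 'a/@attr' (attribute value)."""
        elem_path, _, attr = path.partition("/@")
        elem = self.root.find(elem_path)
        if elem is None:
            return default
        raw = elem.get(attr) if attr else elem.text
        if raw is None:
            return default
        return convert(raw.strip())


if __name__ == "__main__":
    params = Params(INPUT)
    print(params.get("grid/@nx", int))
    print(params.get("physics/viscosity", float))
    print(params.get("physics/tolerance", float, default=1e-6))
```

Missing parameters fall back to a default, which is roughly what namelist input gave for variables left out of the input file.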
Thanks.
Cheers,
Glenn
--------------------
Glenn R. Kronschnabl
Applied Research Laboratories | grk@arlut.utexas.edu (PGP/MIME ok)
The University of Texas at Austin | http://www.arlut.utexas.edu/~grk
PO Box 8029, Austin, TX 78713-8029 | (Ph) 512.835.3642 (FAX) 512.835.3808
10,000 Burnet Road, Austin, TX 78758 | ... but an Aggie at heart!
From crism at ora.com Wed Feb 4 16:29:11 1998
From: crism at ora.com (Chris Maden)
Date: Mon Jun 7 17:00:04 2004
Subject: Namespaces, Architectural Forms, and Sub-Documents
In-Reply-To: <199802041506.KAA00956@unready.microstar.com> (message from David
Megginson on Wed, 4 Feb 1998 10:06:58 -0500)
Message-ID: <199802041632.LAA14809@geode.ora.com>
[David Megginson]
> "namespace:gi" element type names are unsuitable for several reasons:
[...]
> Why are people worried about writing specs to solve a problem that
> already has good, working, available solutions?
The problem (as I see it) is not one of including pieces of existing
documents, nor of structural validation. The main reason for
namespaces is semantic inheritance. I want to write a scientific
research paper quickly. HTML has the overall document structure and
components that I need; MathML has equations; CML has chemical
formulæ. I should be able to say that I'm using those things,
associate stylesheets, and have my browser know that should
be styled with the "a" rule from the HTML stylesheet.
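The lookup I have in mind could be sketched as follows, assuming element names of the "namespace:gi" form and one rule table per vocabulary. The prefixes, rule names and style strings are invented for illustration and are not taken from any real stylesheet.

```python
# Hypothetical per-vocabulary stylesheets: generic identifier -> style.
STYLESHEETS = {
    "html": {"a": "underline, blue", "p": "block"},
    "mathml": {"mi": "italic"},
}


def find_rule(qualified_name, stylesheets):
    """Pick the style rule for a 'prefix:gi' element name.

    The prefix selects the stylesheet and the generic identifier selects
    the rule within it. Unqualified or unknown names return None.
    """
    prefix, separator, gi = qualified_name.partition(":")
    if not separator:
        return None
    return stylesheets.get(prefix, {}).get(gi)


if __name__ == "__main__":
    print(find_rule("html:a", STYLESHEETS))
    print(find_rule("mathml:mi", STYLESHEETS))
    print(find_rule("cml:atom", STYLESHEETS))
```

No DTD is involved at any point; the association between element and rule rests on the qualified name alone.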
It should be *possible* to create a DTD to which such a document
complies, but I am not as interested in automatic validation of a
namespace document. The interrelational issues are, I think, too
complex to solve; in the example above, I would need to change the
text-containing HTML elements' content models to include chemical and
mathematical markup, and maybe allow HTML markup in MathML theorems.
Pushing selected information into the content models is too ugly.
-Chris
--
http://www.oreilly.com/people/staff/crism/ +1.617.499.7487
90 Sherman Street, Cambridge, MA 02140 USA" NDATA SGML.Geek>
From ak117 at freenet.carleton.ca Wed Feb 4 17:34:27 1998
From: ak117 at freenet.carleton.ca (David Megginson)
Date: Mon Jun 7 17:00:04 2004
Subject: Namespaces, Architectural Forms, and Sub-Documents
In-Reply-To: <199802041632.LAA14809@geode.ora.com>
References: <199802041506.KAA00956@unready.microstar.com>
<199802041632.LAA14809@geode.ora.com>
Message-ID: <199802041733.MAA02120@unready.microstar.com>
Chris Maden writes:
> The problem (as I see it) is not one of including pieces of existing
> documents, nor of structural validation. The main reason for
> namespaces is semantic inheritance. I want to write a scientific
> research paper quickly. HTML has the overall document structure and
> components that I need; MathML has equations; CML has chemical
> formulæ. I should be able to say that I'm using those things,
> associate stylesheets, and have my browser know that should
> be styled with the "a" rule from the HTML stylesheet.
It seems to me simpler to create a compound document rather than to
try to force everything into a single XML document -- you can
reference another XML document the same way that you can include a
graphic or audio sequence. Managing a lot of small objects directly
on the file system can be tricky, but it's trivial with proper tool
support (think of OLE under Windows, despite its warts).
> It should be *possible* to create a DTD to which such a document
> complies, but I am not as interested in automatic validation of a
> namespace document. The interrelational issues are, I think, too
> complex to solve; in the example above, I would need to change the
> text-containing HTML elements' content models to include chemical and
> mathematical markup, and maybe allow HTML markup in MathML theorems.
> Pushing selected information into the content models is too ugly.
Not at all -- you just need a single element type to hold references
to other XML documents. You could even (though this is disgusting)
use
All the best,
David
--
David Megginson ak117@freenet.carleton.ca
Microstar Software Ltd. dmeggins@microstar.com
http://home.sprynet.com/sprynet/dmeggins/
From dima at paragraph.com Wed Feb 4 18:01:44 1998
From: dima at paragraph.com (Dmitri Kondratiev)
Date: Mon Jun 7 17:00:04 2004
Subject: [AElfred] Problem: '"' in CDATA attribute
Message-ID: <2.2.32.19980203180236.0095ec44@dream.paragraph.com>
AElfred distribution from 19980112.
Problem:
com.microstar.xml.XmlProcessor.error() reports error when parsing attribute
declared in DTD as CDATA and containing '"' in its value, such as "#text".
On the other hand com.microstar.sax.AElfredDriver from the same 19980112
distribution handles attribute definition correctly and doesn't report such an
error.
Dima
---------------------------
dima@paragraph.com
102401.2457@compuserve.com
http://www.geocities.com/SiliconValley/Lakes/3767/
tel: 07-095-464-9241
From papresco at technologist.com Wed Feb 4 18:05:50 1998
From: papresco at technologist.com (Paul Prescod)
Date: Mon Jun 7 17:00:05 2004
Subject: Namespaces, Architectural Forms, and Sub-Documents
References: <34D82C2E.6C6B3AE7@technologist.com>
<34ee5fa9.103270755@mail.alink.net>
<34D87D02.14BA4B9C@technologist.com> <199802041506.KAA00956@unready.microstar.com>
Message-ID: <34D8AE13.610ABC07@technologist.com>
David Megginson wrote:
>
> It seems to me that when you want to embed large contiguous structures
> from different document types in an XML document, each different
> namespace should be its own sub-document, referenced as a binary
> entity (or using whatever other mechanisms are available in XML-Link).
>
> Good tools and protocols should make it possible to create, transmit,
> and process compound documents as if they were single files. This
> will be necessary anyway for supporting multimedia.
*MAKE EASY THINGS EASY*
Making my five-line formula into a different document with a different
document type is *not easy*. It is a royal pain in the butt, which is
why almost nobody does it. I have seen the CALS table model merged with
dozens of DTDs and have never once seen someone take the opposite
approach of making CALS tables "subdocuments."
We can imagine a theoretical universe in which the tools are so good
that this is easy, but if we are imaginative in this way, we can paper
over any design flaw in SGML or XML with the claim that "the tools can
handle it." If XML or SGML were designed to be manipulated only through
tools, that would be acceptable. But they were not...they were designed
to be written in text editors and, surprisingly enough, a huge number of
people do that.
> Here are some general guidelines:
>
> * Architectural forms are most suitable for applications where
> multiple inheritance is required, or where elements belonging to a
> different document type are scattered throughout a document.
I agree with the former. I don't with the latter. A simple modules
proposal handles the latter nicely.
> * Sub-documents are most suitable for applications where all of the
> elements belonging to a different document type are rooted in a
> single subtree.
Subdocuments have many problems including
* typing convenience (separate files...yuck)
* element type constrainability (how do I specify a SUBDOC root element
type in a content model?)
* "content model communication" (how do I pass a %cell; content model
into my table subdoc)
* modularity (subdocs must be declared at the top of the document, an
annoying non-local maintenance issue)
* ID linkage (even for simple links I must use some more advanced
linking strategy)
* semantics (i.e. SUBDOC has none...you need VALUEREF or something else
on top of subdoc)
That does not mean that they are never useful. There are some hard
problems where they are very useful. But for the *simple problem* of
embedding MATH in HTML (for example) they are overkill, as are
architectural forms. *KEEP SIMPLE THINGS SIMPLE*
> "namespace:gi" element type names are unsuitable for several reasons:
>
> 1) The complexity of namespaces is exposed to the author rather than
> hidden in the DTD (as it is, optionally, with architectural forms).
As my paper pointed out, we now live in a universe where the person
creating the DTD is often the author. You live in a world where people
pay you to hide things in DTDs. Most of the people on the Web don't have
a David Megginson or a Paul Prescod to do that for them. Their problems
are still real.
> 2) Multiple inheritance is not possible (X can be a kind of Y or a
> kind of Z, but not both).
Many people do not want multiple inheritance and as my paper pointed
out, it makes some problems much more difficult to understand and solve.
> 3) Standard DTD-based validation is not possible, and it is more
> difficult to create DTD-driven authoring tools.
I think you are totally wrong here. As a programmer, I could implement
modules in an SGML editor in MUCH less time than it would take me to
implement architectural forms.
> 4) Both architectural forms and sub-documents can be fully supported
> under the existing spec by _both_ validating and non-validating XML
> parsers: no changes necessary. Furthermore, they will also remain
> compatible with SGML tools.
That's great for today. But for tomorrow, ISO has already undertaken to
change SGML. Do you propose that they should not add anything to SGML
that is not compatible with existing tools? My position is that the very
point of a revision is to make things easier and more powerful and that
this is thus the perfect opportunity to make this common problem easier
to solve, even if it breaks some old tools.
> Why are people worried about writing specs to solve a problem that
> already has good, working, available solutions?
Because the good, working solutions are solutions to much harder
problems and make simple jobs needlessly difficult.
Paul "SIMPLE THINGS SIMPLE" Prescod
--
http://itrc.uwaterloo.ca/~papresco
From papresco at technologist.com Wed Feb 4 18:17:33 1998
From: papresco at technologist.com (Paul Prescod)
Date: Mon Jun 7 17:00:05 2004
Subject: Namespaces, Architectural Forms, and Sub-Documents
References: <199802041632.LAA14809@geode.ora.com>
Message-ID: <34D8B09A.BBE21DA9@technologist.com>
Chris Maden wrote:
>
> [David Megginson]
> > "namespace:gi" element type names are unsuitable for several reasons:
>
> [...]
>
> > Why are people worried about writing specs to solve a problem that
> > already has good, working, available solutions?
>
> The problem (as I see it) is not one of including pieces of existing
> documents, nor of structural validation. The main reason for
> namespaces is semantic inheritance.
Architectural forms give you that.
> I want to write a scientific research paper quickly.
The key word here is *quickly*. Architectural forms don't give you that.
> It should be *possible* to create a DTD to which such a document
> complies, but I am not as interested in automatic validation of a
> namespace document. The interrelational issues are, I think, too
> complex to solve; in the example above, I would need to change the
> text-containing HTML elements' content models to include chemical and
> mathematical markup, and maybe allow HTML markup in MathML theorems.
> Pushing selected information into the content models is too ugly.
These issues are not complex at all.
They are all handled nicely by the Japanese proposal. In a "modular
world", HTML would become a module that takes parameters such as
"object-types", "character span types", "block types" and so forth. You
pass in "MathML::Formula" as an "object-type" and the HTML %figure-type;
entity gets updated to reflect it. The issue is only complex in the
example you cite because HTML was not designed to be modular because
SGML does not have a concept of DTD modules.
Even so, this is already dirt-common in SGML applications that don't
even *have* modules. You define a parameter entity and include the
entity.
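For readers unfamiliar with the idiom, a minimal Python sketch of what
parameter-entity customization amounts to: textual substitution, with the
first declaration of a name being the binding one. The entity name
`cell.content`, the element names, and the helper `expand` are hypothetical,
and the regular expressions are a toy model, not a DTD parser.

```python
import re

# Toy model of parameter entities: '<!ENTITY % name "value">' declares,
# '%name;' references. Illustrative only; this is not a real DTD parser.
DECL = re.compile(r'<!ENTITY\s+%\s+([\w.-]+)\s+"([^"]*)"\s*>')
REF = re.compile(r'%([\w.-]+);')

def expand(dtd_text):
    """Return dtd_text with entity declarations removed and references expanded."""
    entities = {}
    for name, value in DECL.findall(dtd_text):
        # The first declaration is binding; later ones are ignored.
        entities.setdefault(name, value)
    body = DECL.sub("", dtd_text)
    return REF.sub(lambda m: entities[m.group(1)], body).strip()

# Hypothetical table module: it declares a default for cell.content.
TABLE_MODULE = '''<!ENTITY % cell.content "#PCDATA">
<!ELEMENT cell (%cell.content;)*>'''

# A document type that wants formulas in cells declares the entity first.
OVERRIDE = '<!ENTITY % cell.content "#PCDATA | formula">\n'

print(expand(TABLE_MODULE))
print(expand(OVERRIDE + TABLE_MODULE))
```

The second call shows the customization: the including DTD's declaration
comes first, so the module's own default is ignored.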
"
>
>
> ]>
>
> As with parameter passing, scoping declarations, if desirable, will be desirable
> with or without modules.
After thinking this through, I am a little disturbed by the proposal
above. To me, it implies a deep-ish change to the SGML processing model
that a module/namespace proposal does not. Consider that in a
module/namespace proposal, every element type has a single, fully
qualified name. Unqualified references are merely "short form
references" (not to be confused with "short references") -- they are a
short form for the full thing. Going from an unqualified instance to a
fully-qualified one is a purely syntactic operation.
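As an illustration of "purely syntactic" expansion, here is a small Python
sketch. It assumes the `xmlns` attribute syntax of the later W3C Namespaces
recommendation as implemented by Python's `xml.etree.ElementTree`, which is
not the syntax under discussion in this thread, and the namespace URIs are
placeholders invented for the example.

```python
import xml.etree.ElementTree as ET

# Namespace URIs below are placeholders invented for this example.
DOC = """<paper xmlns="http://example.org/html"
       xmlns:m="http://example.org/mathml">
  <p>Area: <m:formula>pi r^2</m:formula></p>
</paper>"""

root = ET.fromstring(DOC)
# ElementTree stores every element under its expanded name, so the
# unqualified <p> and the prefixed <m:formula> both become {uri}local.
names = [elem.tag for elem in root.iter()]
print(names)
```

Every element ends up with exactly one fully qualified name, regardless of
whether the instance spelled it in short or long form.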
But I'm not sure how I would refer to elements in the scheme above.
Let's say I am writing a stylesheet. How do I differentiate between
[1]"FOO"s with "BAR" parentage and [2]elements conforming to the element
type "FOO" that can only exist in "BAR"?
[1]
...
Here all FOOs refer to the same element type.
[2]
]>
...
Here all FOOs refer to different element types.
To me, there is a subtle but important difference. A scoped namespaces
proposal makes SGML (more) context dependent at the *syntactic* level,
but a scoped declarations proposal makes it context dependent at the
*semantic* level. There exists no "context free" expansion. I don't yet
know if this will cause Bad Side Effects. But right now I can't yet
imagine many uses for this feature *other than* the kind of element type
namespace scoping that could be accomplished completely in a modules
proposal.
If there are no other important uses for this feature then I would
rather stick with the more strictly syntactic module structure and leave
this contextual declaration stuff out. But maybe there are important
uses for this that I have not considered.
Note that I can *totally* imagine why you would want to scope an entity
declaration or notation declaration to an element, but not to an element
type. I think that the former should be a high priority, but don't
really understand the need for the latter.
Paul Prescod
--
http://itrc.uwaterloo.ca/~papresco
From ak117 at freenet.carleton.ca Wed Feb 4 22:52:40 1998
From: ak117 at freenet.carleton.ca (David Megginson)
Date: Mon Jun 7 17:00:05 2004
Subject: Namespaces, Architectural Forms, and Sub-Documents
In-Reply-To: <34D8AE13.610ABC07@technologist.com>
References: <34D82C2E.6C6B3AE7@technologist.com>
<34ee5fa9.103270755@mail.alink.net>
<34D87D02.14BA4B9C@technologist.com>
<199802041506.KAA00956@unready.microstar.com>
<34D8AE13.610ABC07@technologist.com>
Message-ID: <199802042253.RAA00485@unready.microstar.com>
Paul Prescod writes:
> *MAKE EASY THINGS EASY*
>
> Making my five-line formula into a different document with a different
> document type is *not easy*. It is a royal pain in the butt, which is
> why almost nobody does it. I have seen the CALS table model merged with
> dozens of DTDs and have never once seen someone take the opposite
> approach of making CALS tables "subdocuments."
You have stated a good, general rule of thumb; in this case, however,
it is important to remember that a central component of simplicity is
consistency (by the way, I _have_ seen CALS tables as SGML
subdocuments, but one of my dreams in XML is never to hear the words
"CALS table model" again).
XML documents may (and perhaps, usually will) contain non-XML objects
such as wordprocessor documents, spreadsheets, MPEG clips, Java
applets, audio sequences, and many others -- to date, thankfully, no
one has proposed uuencoding any of these and dumping them inline between
a start and end tag.
Why should we treat an equation marked up in XML differently than an
equation marked up in Microsoft Word? It seems easier (from a user's
perspective) to treat everything as objects, rather than defining one
special case. Object-oriented programming has proven the value of
encapsulation, and the compound-document idiom is standard on millions
of desktops already, so we can hardly argue that subdocuments are an
unfamiliar approach.
I am a big fan of pragmatism on the implementation side, as people
might have noticed from my postings on the design of AElfred; on the
standards side, though, I wouldn't want to cripple a spec just to work
around a temporary problem that will have to be solved anyway for
non-XML objects. SGML people will remember unfortunate features like
SHORTREF, DATATAG, and OMITTAG -- included a little over a decade ago,
likewise, for the sake of making things easy and working around
temporary deficiencies in the available tools. XML is popular mainly
because it has finally banned all of these.
> Subdocuments have many problems including
> * typing convenience (separate files...yuck)
(See comments above).
> * element type constrainability (how do I specify a SUBDOC root element
> type in a content model?)
Use HyTime (just joking). Seriously, I cannot see that this is a
worse case than not being able to use a DTD at all. The general idea
of compound documents (Netscape with plug-ins, OLE documents, Andrew
documents, or otherwise) is that you can plug in any object -- I had
imagined that this was the goal of namespaces as well. In XML you can
constrain the placement of pointers to external objects, at least.
> * "content model communication" (how do I pass a %cell; content model
> into my table subdoc)
You're thinking of CALS here. I'd suggest that we move away from the
older SGML model of heavily parameterised DTDs (as from heavily
#IFDEF'ed C header files): remember that one of the arguments for the
namespace model is to reuse stylesheets and other processing
specifications -- if a table model can vary its content unpredictably,
then you will not be able to reuse stylesheets anyway. Again,
encapsulation is a big win, and it keeps things easy.
That said, if you _really_ need to pass a %cell; content model to a
subdocument, you can always include the same file of entity
declarations in both the parent and the child. I'd recommend against
it, but it's possible if you want to do it.
> * modularity (subdocs must be declared at the top of the document, an
> annoying non-local maintenance issue)
Only if you use an entity/notation mechanism. You could just as
easily use a URL/MIME approach:
The question of how to include external objects is a separate debate,
and subdocuments can swing easily from either vine.
> * ID linkage (even for simple links I must use some more advanced
> linking strategy)
HREFs would work fine -- HTML people are already used to
so we should have no confusion here. Furthermore, you have the
advantage that your document's validity does not depend on its child
objects (this is very important for document management in large,
multi-author systems -- if subdocuments are atomic, then a change by
one author to a table, for example, will not make the containing
chapter invalid). Again, as in programming, encapsulation will be a
big win in the medium term.
> * semantics (i.e. SUBDOC has none...you need VALUEREF or something else
> on top of subdoc)
I expect that XLL will provide mechanisms for expressing the 'embed'
semantic.
All the best,
David
--
David Megginson ak117@freenet.carleton.ca
Microstar Software Ltd. dmeggins@microstar.com
http://home.sprynet.com/sprynet/dmeggins/
From ak117 at freenet.carleton.ca Wed Feb 4 22:57:22 1998
From: ak117 at freenet.carleton.ca (David Megginson)
Date: Mon Jun 7 17:00:05 2004
Subject: Namespaces, Architectural Forms, and Sub-Documents
In-Reply-To: <01bd318f$59cf8ae0$LocalHost@sgml>
References: <01bd318f$59cf8ae0$LocalHost@sgml>
Message-ID: <199802042257.RAA00504@unready.microstar.com>
Martin Bryan writes:
> Unfortunately subdocs are not supported in XML, or in many SGML
> tools.
Sorry for any confusion here -- I'm talking about subdocuments in
general, not about the SGML SUBDOC feature. You can include a
subdocument using an NDATA entity, or simply by providing a URI in an
attribute value. I'm certain that XLL will have something useful to
say here.
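A minimal Python sketch of the second option, a URI in an attribute value.
The `object` element, the `href` attribute, the file names, and the in-memory
`STORE` standing in for a file system are all assumptions made for the
example; nothing here reflects what XLL will actually specify.

```python
import xml.etree.ElementTree as ET

# Stand-in for a file system or web server; names and content are invented.
STORE = {
    "chapter.xml": '<chapter><para>See the table.</para>'
                   '<object href="table1.xml"/></chapter>',
    "table1.xml": '<table><row><cell>42</cell></row></table>',
}

def load(uri):
    """Parse one document from the store on its own."""
    return ET.fromstring(STORE[uri])

def subdocuments(doc):
    """Yield (uri, parsed subdocument) for each <object href=...> pointer."""
    for obj in doc.iter("object"):
        uri = obj.get("href")
        yield uri, load(uri)

chapter = load("chapter.xml")   # parses without touching table1.xml
for uri, sub in subdocuments(chapter):
    print(uri, sub.tag)
```

Because the chapter only holds a pointer, it parses on its own; the
subdocument is fetched and parsed only when a processor asks for it.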
All the best,
David
--
David Megginson ak117@freenet.carleton.ca
Microstar Software Ltd. dmeggins@microstar.com
http://home.sprynet.com/sprynet/dmeggins/
From elm at arbortext.com Thu Feb 5 00:16:44 1998
From: elm at arbortext.com (Eve L. Maler)
Date: Mon Jun 7 17:00:05 2004
Subject: Namespaces, Architectural Forms, and Sub-Documents
In-Reply-To: <98Feb4.175315est.18819@thicket.arbortext.com>
References: <34D8AE13.610ABC07@technologist.com>
<34D82C2E.6C6B3AE7@technologist.com>
<34ee5fa9.103270755@mail.alink.net>
<34D87D02.14BA4B9C@technologist.com>
<199802041506.KAA00956@unready.microstar.com>
<34D8AE13.610ABC07@technologist.com>
Message-ID: <3.0.5.32.19980204191534.009a4bc0@village.doctools.com>
This exchange is fascinating. One comment:
At 05:53 PM 2/4/98 -0500, David Megginson wrote:
>Paul Prescod writes:
> > * "content model communication" (how do I pass a %cell; content model
> > into my table subdoc)
>
>You're thinking of CALS here. I'd suggest that we move away from the
>older SGML model of heavily parameterised DTDs (as from heavily
>#IFDEF'ed C header files): remember that one of the arguments for the
>namespace model is to reuse stylesheets and other processing
>specifications -- if a table model can vary its content unpredictably,
>then you will not be able to reuse stylesheets anyway. Again,
>encapsulation is a big win, and it keeps things easy.
I don't think the problem has anything to do with CALS. In fact, until
SGML Open came along, it was pretty hard to use the CALS table model as a
module -- it was not designed with this use in mind, and its inflexibility
resulted in dozens or hundreds of DTDs recoding the whole thing just to
change a few features.
Table models, even if they're not CALS, are going to vary their content
unpredictably, because cells typically need to contain markup *inside* them
that is specific to the information domain *outside* the table structure;
they're surrounded coming and going. (As an aside, I don't think this
means you can't reuse stylesheets; you just sequester the table geometry
stuff from the cell formatting and recode just a little bit of
element-in-context stylesheet code.)
Table cells are a common boundary case of namespace mixing from the text
world, and perhaps there are similar situations in the data world. I think
that a black-box approach (subdocuments) would require way more overhead
than a unified-model approach in doing "content model communication."
Eve
From papresco at technologist.com Thu Feb 5 00:18:04 1998
From: papresco at technologist.com (Paul Prescod)
Date: Mon Jun 7 17:00:05 2004
Subject: Namespaces, Architectural Forms, and Sub-Documents
References: <34D82C2E.6C6B3AE7@technologist.com>
<34ee5fa9.103270755@mail.alink.net>
<34D87D02.14BA4B9C@technologist.com>
<199802041506.KAA00956@unready.microstar.com>
<34D8AE13.610ABC07@technologist.com> <199802042253.RAA00485@unready.microstar.com>
Message-ID: <34D8FD5C.7445D1AE@technologist.com>
David Megginson wrote:
>
> XML documents may (and perhaps, usually will) contain non-XML objects
> such as wordprocessor documents, spreadsheets, MPEG clips, Java
> applets, audio sequences, and many others -- to date, thankfully, no
> one has proposed uuencoding any of these and dumping them inline between
> a start and end tag.
Maybe not on this mailing list, but come on over to "SGML-TOOLS"
(formerly LinuxDoc). :) :)
> Why should we treat an equation marked up in XML differently than an
> equation marked up in Microsoft Word? It seems easier (from a user's
> perspective) to treat everything as objects, rather than defining one
> special case.
We should treat them differently for three reasons:
#1. XML data is text, and thus makes a certain amount of "sense" inline.
If I embedded LaTeX in an XML document I would probably inline it,
rather than refer to it, for the same reason. Word formulae are binary.
#2. XML has concepts such as validation and id-reference that depend on
data being logically inline.
#3. If we do not do this, I do not think that people will use subdocs.
They will probably just abandon validation or use XML-Data.
> Object-oriented programming has proven the value of
> encapsulation, and the compound-document idiom is standard on millions
> of desktops already, so we can hardly argue that subdocuments are an
> unfamiliar approach.
Not so. Word does not use externally embedded data by default. If you
create a table, formula or a graphic, it is inlined by default.
Typically you only externally link to a file if it already exists (e.g.
it has some meaning independent of this document). I think Microsoft
made the right choice there.
> I am a big fan of pragmatism on the implementation side, as people
> might have noticed from my postings on the design of AElfred; on the
> standards side, though, I wouldn't want to cripple a spec just to work
> around a temporary problem that will have to be solved anyway for
> non-XML objects.
SGML is 12 years old. We are only marginally closer to having decent
tools that will manage this stuff for us. I personally have no faith
that they will arrive soon. I also think that we have 10 years of good
experience with what we need to guide our choices. Most major DTDs
incorporate ad hoc DTD modularity features. We know what they need to
make these features robust -- just namespace protection.
> SGML people will remember unfortunate features like
> SHORTREF, DATATAG, and OMITTAG -- included a little over a decade ago,
> likewise, for the sake of making things easy and working around
> temporary deficiencies in the available tools.
Well, I still use two of those three features, so obviously the problems
with the tools have not sufficiently cleared up yet. It also isn't clear
to me if those features have helped or hurt SGML's popularity. OMITTAG
in particular is very widely used. Even HTML uses it.
> > * element type constrainability (how do I specify a SUBDOC root element
> > type in a content model?)
>
> Use HyTime (just joking). Seriously, I cannot see that this is a
> worse case than not being able to use a DTD at all.
It isn't. But in XML we do have DTDs and we want to use them for these
heterogeneous (not "compound") documents.
> The general idea
> of compound documents (Netscape with plug-ins, OLE documents, Andrew
> documents, or otherwise) is that you can plug in any object -- I had
> imagined that this was the goal of namespaces as well.
I don't think so. In my paper I quoted from the XML Namespaces spec:
"We envision applications of XML in which a document instance may
contain markup defined in multiple schemas. These schemas may have been
authored independently. One motivation for this is that writing good
schemas is hard, so it is beneficial to reuse parts from existing,
well-designed schemas. Another is the advantage of allowing search
engines or other tools to operate over a range of documents that vary in
many respects but use common names for common element types. "
The goal of combining schemas is central to the concept.
> In XML you can
> constrain the placement of pointers to external objects, at least.
Cold comfort. :)
> > * "content model communication" (how do I pass a %cell; content model
> > into my table subdoc)
>
> You're thinking of CALS here. I'd suggest that we move away from the
> older SGML model of heavily parameterised DTDs (as from heavily
> #IFDEF'ed C header files): remember that one of the arguments for the
> namespace model is to reuse stylesheets and other processing
> specifications -- if a table model can vary its content unpredictably,
> then you will not be able to reuse stylesheets anyway.
The formatting for the contents of table cells and for the shape of the
table can be specified independently. In HTML, (for example) essentially
anything can go in a table cell. The table formatter just figures it
out. A good stylesheet language will provide quite a bit of independence
between construction rules. Yes, we may need some conventions for more
complex combinations (e.g. metadata formatting conventions), but most
things will "just work."
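The independence claimed here can be sketched in a few lines of Python.
This is a toy formatter, not any real stylesheet language: the rule tables,
the `compound` element, and the plain-text output format are all
hypothetical. The table rules know about rows and cells only, and whatever
sits inside a cell is dispatched to whichever rule is registered for it.

```python
import xml.etree.ElementTree as ET

def render(elem, rules):
    """Format elem using the rule registered for its element type."""
    return rules[elem.tag](elem, rules)

def inline(elem, rules):
    """Concatenate an element's text with its rendered children."""
    out = elem.text or ""
    for child in elem:
        out += render(child, rules) + (child.tail or "")
    return out

# Table geometry rules: they know rows and cells, nothing about cell content.
TABLE_RULES = {
    "table": lambda e, r: "\n".join(render(row, r) for row in e),
    "row": lambda e, r: " | ".join(render(cell, r) for cell in e),
    "cell": inline,
}

# Rules from the surrounding (hypothetical) chemistry vocabulary.
DOMAIN_RULES = {
    "compound": lambda e, r: "[" + inline(e, r) + "]",
}

rules = {**TABLE_RULES, **DOMAIN_RULES}
doc = ET.fromstring(
    "<table><row><cell>Water</cell><cell><compound>H2O</compound></cell></row>"
    "<row><cell>Salt</cell><cell><compound>NaCl</compound></cell></row></table>"
)
print(render(doc, rules))
```

Swapping in a different `DOMAIN_RULES` table changes how cell content is
formatted without touching the table geometry rules.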
> > * ID linkage (even for simple links I must use some more advanced
> > linking strategy)
>
> HREFs would work fine -- HTML people are already used to
>
>
>
> so we should have no confusion here.
> > * semantics (i.e. SUBDOC has none...you need VALUEREF or something else
> > on top of subdoc)
>
> I expect that XLL will provide mechanisms for expressing the 'embed'
> semantic.
Both of these proposals just add hassles to something that should be
simple.
> Furthermore, you have the
> advantage that your document's validity does not depend on its child
> objects (this is very important for document management in large,
> multi-author systems -- if subdocuments are atomic, then a change by
> one author to a table, for example, will not make the containing
> chapter invalid). Again, as in programming, encapsulation will be a
> big win in the medium term.
Yes, there are occasions where this encapsulation is important and
useful. There are also times where it is not.
Let me put it this way: do you feel that the creators of DocBook, TEI
and HTML were mistaken by including table models rather than forcing
their users to use subdocs? If yes, then you have a very different idea
of usable DTD design than I do. If no, then I cannot understand why you
are opposed to making this process of including table models easier so
that you do not need people with brains the size of planets and a
serious commitment to DTD use to accomplish it.
All I am asking is to make this common DTD fragment combination idiom
simpler, more standard and more robust so that casual (and expert!)
users can whip up their own DTDs by combining fragments instead of
manually merging fragments, disambiguating names, adding architectural
forms etc. etc.
Paul Prescod
--
http://itrc.uwaterloo.ca/~papresco
From papresco at technologist.com Thu Feb 5 00:33:35 1998
From: papresco at technologist.com (Paul Prescod)
Date: Mon Jun 7 17:00:05 2004
Subject: Namespaces, Architectural Forms, and Sub-Documents
References: <34D8AE13.610ABC07@technologist.com>
<34D82C2E.6C6B3AE7@technologist.com>
<34ee5fa9.103270755@mail.alink.net>
<34D87D02.14BA4B9C@technologist.com>
<199802041506.KAA00956@unready.microstar.com>
<34D8AE13.610ABC07@technologist.com> <3.0.5.32.19980204191534.009a4bc0@village.doctools.com>
Message-ID: <34D90941.4777966F@technologist.com>
Eve L. Maler wrote:
>
> Table models, even if they're not CALS, are going to vary their content
> unpredictably, because cells typically need to contain markup *inside* them
> that is specific to the information domain *outside* the table structure;
> they're surrounded coming and going.
There are many other situations where we have the same problem, but just
don't recognize it. Think about lists, bibliographies, cross references
and so forth. We shouldn't have to reinvent these for each DTD. There
is probably a short list of interesting parameterizations on them (for
most apps) and we should just include and use them (after specifying the
relevant parameterization options). Nobody has tried this (much) in the
past because module usage in SGML is just too painful. So only CALS
tables and a few other constructs are complex enough that the pain
involved in reinventing them outweighs the pain involved in using them
from a module. But if we massively reduce the pain in reusing element
declarations, we will probably see people reusing them a lot more.
That means that we need a convenient parameterization syntax and
namespace management. Actual DTD fragment management would also be very
useful. Perhaps the Web can start to serve that role (for those that
can't afford full databases).
Paul Prescod
--
http://itrc.uwaterloo.ca/~papresco
From ak117 at freenet.carleton.ca Thu Feb 5 02:32:56 1998
From: ak117 at freenet.carleton.ca (David Megginson)
Date: Mon Jun 7 17:00:05 2004
Subject: Namespaces, Architectural Forms, and Sub-Documents
In-Reply-To: <34D8FD5C.7445D1AE@technologist.com>
References: <34D82C2E.6C6B3AE7@technologist.com>
<34ee5fa9.103270755@mail.alink.net>
<34D87D02.14BA4B9C@technologist.com>
<199802041506.KAA00956@unready.microstar.com>
<34D8AE13.610ABC07@technologist.com>
<199802042253.RAA00485@unready.microstar.com>
<34D8FD5C.7445D1AE@technologist.com>
Message-ID: <199802050233.VAA00341@unready.microstar.com>
Paul Prescod writes:
> Not so. Word does not use externally embedded data by default. If
> you create a table, formula or a graphic, it is inlined by default.
> Typically you only externally link to a file if it already exists
> (e.g. it has some meaning independent of this document). I think
> Microsoft made the right choice there.
Here, perhaps, there is some miscommunication between us. As I
understand it (and I am by no means a Microsoft guru, or even a
regular user, so please read this with appropriate caution), all Word
documents are actually OLE compound objects -- in other words, they
consist of (possibly many) separate objects stored in the same
physical disk file; a simpler example of the same thing is Java's JAR
files.
For XML to work on the desktop rather than just on the server, it will
also need some kind of packaging standard -- a way for all of the
entities (XML and non-XML) that make up a document to be edited,
stored, and shipped together, but easily broken apart again when
necessary. I'm suggesting that once such a standard exists, and once
there are tools to use it, including subdocuments in XML will be as
easy as (and hopefully, much less buggy than) including Excel
spreadsheets in Word documents.
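To make the packaging idea concrete, a minimal Python sketch that uses a ZIP
archive as a stand-in for such a standard, in the spirit of the JAR
comparison above. No XML packaging standard is being described here; the
file names, the `object`/`href` markup, and the helpers `pack` and `unpack`
are invented for the example.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# Invented file names and content; a ZIP archive stands in for the
# packaging format, in the spirit of the JAR comparison.
PARTS = {
    "chapter.xml": b'<chapter><object href="sheet.xml"/>'
                   b'<object href="logo.bin"/></chapter>',
    "sheet.xml": b"<table><row><cell>42</cell></row></table>",
    "logo.bin": bytes([0, 255, 16, 32]),   # a non-XML entity
}

def pack(parts):
    """Bundle all entities of a document into one in-memory archive."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        for name, data in parts.items():
            zf.writestr(name, data)
    return buf.getvalue()

def unpack(package):
    """Break the archive apart again into a name -> bytes mapping."""
    with zipfile.ZipFile(io.BytesIO(package)) as zf:
        return {name: zf.read(name) for name in zf.namelist()}

package = pack(PARTS)
restored = unpack(package)
root = ET.fromstring(restored["chapter.xml"])
print([obj.get("href") for obj in root.iter("object")])
```

The document travels as one file, yet each entity, XML or not, comes back
out byte for byte.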
> Let me put it this way: do you feel that the creators of DocBook,
> TEI and HTML were mistaken by including table models rather than
> forcing their users to use subdocs?
Of course not. Different DTDs will include different levels of base
markup, depending on their areas of application -- we're dealing only
with the case when people want to use structures not defined in the
DTD itself.
All the best,
David
--
David Megginson ak117@freenet.carleton.ca
Microstar Software Ltd. dmeggins@microstar.com
http://home.sprynet.com/sprynet/dmeggins/
From papresco at technologist.com Thu Feb 5 03:49:30 1998
From: papresco at technologist.com (Paul Prescod)
Date: Mon Jun 7 17:00:05 2004
Subject: Namespaces, Architectural Forms, and Sub-Documents
References: <34D82C2E.6C6B3AE7@technologist.com>
<34ee5fa9.103270755@mail.alink.net>
<34D87D02.14BA4B9C@technologist.com>
<199802041506.KAA00956@unready.microstar.com>
<34D8AE13.610ABC07@technologist.com>
<199802042253.RAA00485@unready.microstar.com>
<34D8FD5C.7445D1AE@technologist.com> <199802050233.VAA00341@unready.microstar.com>
Message-ID: <34D934FA.742DD860@technologist.com>
David Megginson wrote:
>
> For XML to work on the desktop rather than just on the server, it will
> also need some kind of packaging standard -- a way for all of the
> entities (XML and non-XML) that make up a document to be edited,
> stored, and shipped together, but easily broken apart again when
> necessary. I'm suggesting that once such a standard exists, and once
> there are tools to use it, including subdocuments in XML will be as
> easy as (and hopefully, much less buggy than) including Excel
> spreadsheets in Word documents.
It is only easy to do this with Word because Word manages it for you. I
don't intend to change to a dedicated XML editor, do you?
> > Let me put it this way: do you feel that the creators of DocBook,
> > TEI and HTML were mistaken by including table models rather than
> > forcing their users to use subdocs?
>
> Of course not. Different DTDs will include different levels of base
> markup, depending on their areas of application -- we're dealing only
> with the case when people want to use structures not defined in the
> DTD itself.
No, the question is *how do we construct DTDs*? Let me try that quote
again:
"We envision applications of XML in which a document instance may
contain markup defined in multiple schemas. These schemas may have been
authored independently. One motivation for this is that writing good
schemas is hard, so it is beneficial to reuse parts from existing,
well-designed schemas. Another is the advantage of allowing search
engines or other tools to operate over a range of documents that vary in
many respects but use common names for common element types. "
Let me emphasize: "writing schemas is hard, so it is beneficial to reuse
parts from existing schemas." The goal is thus to construct DTDs from
smaller ones. (e.g. HTML + CALS + MATHML or TEILITE + JAVA + XLL or ...)
Paul Prescod
--
http://itrc.uwaterloo.ca/~papresco
From ht at cogsci.ed.ac.uk Thu Feb 5 09:34:37 1998
From: ht at cogsci.ed.ac.uk (Henry S. Thompson)
Date: Mon Jun 7 17:00:05 2004
Subject: FORTRAN namelist input - remember? Replace with XML!
In-Reply-To: "Glenn R. Kronschnabl"'s message of Wed, 04 Feb 1998 09:56:58 -0600
References: <199802041559.JAA06936@mail-firewall.arlut.utexas.edu>
Message-ID:
Our XML tool suite provides an API for this for XML directly, without using
SP. Our NSL tool suite does the same for full SGML, using SP.
http://www.ltg.ed.ac.uk/software/xml/ and .../nsl/
ht
--
Henry S. Thompson, Human Communication Research Centre, University of Edinburgh
2 Buccleuch Place, Edinburgh EH8 9LW, SCOTLAND -- (44) 131 650-4440
Fax: (44) 131 650-4587, e-mail: ht@cogsci.ed.ac.uk
URL: http://www.cogsci.ed.ac.uk/~ht/
From serres-doug at usa.net Thu Feb 5 11:56:51 1998
From: serres-doug at usa.net (Doug Serres)
Date: Mon Jun 7 17:00:05 2004
Subject: recommendations on currently available streaming XML
toolkits?
References: <3.0.32.19980204104017.00aad5dc@pop.intergate.bc.ca>
Message-ID: <34D9A901.A472FCD3@usa.net>
Tim Bray wrote:
> At 11:02 AM 03/02/98 -0500, Navdip Bhachech wrote:
> >there have been a few discussions on streaming issues in this list
> >lately, so I thought I'd ask:
> >What are the recommended toolkits (currently available) that allow
> >streaming XML, instead of a file based approach?
>
> Lark (http://www.textuality.com/Lark/) is happy to read a stream.
> But as others have pointed out, relative URLs can be a real
> problem. -Tim
>
I'm using MSXML (http://www.microsoft.com/xml/) for streaming too.
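For comparison, a minimal sketch of what stream-based reading looks like,
using Python's standard `xml.etree.ElementTree.XMLPullParser` rather than
Lark or MSXML (whose APIs are not shown in this thread). The `feed`/`item`
markup and the chunk boundaries are invented for the example.

```python
import xml.etree.ElementTree as ET

def stream_items(chunks):
    """Yield the text of each <item> while the input is still arriving."""
    parser = ET.XMLPullParser(events=("end",))

    def drain():
        # Hand over every element whose end tag has been parsed so far.
        for _event, elem in parser.read_events():
            if elem.tag == "item":
                yield elem.text

    for chunk in chunks:
        parser.feed(chunk)
        yield from drain()
    parser.close()          # end of stream: flush anything still buffered
    yield from drain()

# Chunk boundaries fall in the middle of tags and text on purpose.
chunks = ["<feed><item>fir", "st</item><it", "em>second</item></feed>"]
print(list(stream_items(chunks)))
```

The parser never needs the whole document in one piece, so the chunks could
just as well come from a socket as from a file.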
--
Doug Serres
Junior Developer - R&D
Andyne Computing Ltd.
e-mail: dserres@andyne.com
From gmckenzi at JetForm.com Thu Feb 5 13:47:06 1998
From: gmckenzi at JetForm.com (Gavin McKenzie)
Date: Mon Jun 7 17:00:05 2004
Subject: Foreign object inclusion WAS: Namespaces, Architectural Forms, and Sub-Documents
Message-ID:
David Megginson wrote:
> [snip]
> XML documents may (and perhaps, usually will) contain non-XML objects
> such as wordprocessor documents, spreadsheets, MPEG clips, Java
> applets, audio sequences, and many others -- to date, thankfully, no
> one has proposed uuencoding any of these and dumping them inline between
> a start and end tag.
> [snip]
Am I to understand from this paragraph that there would be something
wrong with uuencoded or base64'd resources, like audio clips or even a
Java class, between a start and end tag?
I thought this would be a given. Sure, using XLL or simple URL hrefs is
great, but many times the requirement is for a single file with all
resources literally included.
This is similar conceptually to the intent of MIME, and MHTML, and OLE
(at one time the E meant something -- embedding). Syntactically, MIME-derived
methods aren't nearly as nice as stuffing the resource between a
start and end tag.
Take a look at the Internet Open Trading Protocol
(http://www.otp.org:8080/). It does this all over the place.
A packaging standard to encapsulate all of the resources in the same
file is nice, but why isn't it legitimate to place them all inline?
Gavin.
========================================================
Gavin F. McKenzie Vox:+1(613)230-3676 ext 5277
JetForm Corporation Fax:+1(613)594-8886
http://www.jetform.com mailto:gmckenzi@jetform.com
========================================================
From peter at ursus.demon.co.uk Thu Feb 5 13:50:11 1998
From: peter at ursus.demon.co.uk (Peter Murray-Rust)
Date: Mon Jun 7 17:00:05 2004
Subject: Namespaces, etc.
In-Reply-To: <34D82C2E.6C6B3AE7@technologist.com>
Message-ID: <3.0.1.16.19980205102838.2e4712de@pop3.demon.co.uk>
At 03:51 04/02/98 -0500, [many people] wrote [about namespaces,
architectures, etc.]:
I don't want to stifle discussion on XML-DEV, but suggest some guidelines:
1. There is a public draft of the Namespaces paper now, I believe. [Could
someone please confirm this and give the location - I wouldn't like to
refer to a private document]. My understanding is that the W3C is actively
working on namespaces. For this reason I think it is appropriate that
proposals for other ways of developing namespaces (especially those which
require new syntax or semantics) be referred to the appropriate W3C body.
If you aren't a member, but have something to propose I would hope that
chairs will be sympathetic if you mail them.
A major problem with discussing current W3C activity on this list is that
most members/readers do not have up-to-date knowledge of the current W3C
discussions. This can make for confusion, and it would break
confidentiality for a W3C member to say "hang on, we are going down a
different line". The most reasonable thing to do is to discuss the last
public draft of a spec (especially its implementation or experience of
implementation :-) but NOT, IMO, to make suggestions for its revision.
2. I suggest that discussion is limited to *implementing* or *exploring*
the Namespace proposal. The XML spec refers (I think) to "namespace
experiments" and I think that this is the approach we should take - i.e.
discuss experiments with *this* namespace proposal.
My own approach has been:
- to create a private namespace experiment
- to approach WG members to see if it broke confidentiality
- to wait until the spec was public
- to distribute it, and a short explanatory note, with the current JUMBO
release. (9801a1)
So, rather than discuss my very simple namespace experiment on this list
(since it has many demerits and will almost certainly be broken by future
namespace developments) you can get it and read it with the distribution.
Its sole merits are that it is actually implemented, works and does
something useful for my applications. If others see it as a way forward I'd
be interested. I hope to release JUMBO-PLAY shortly and this will
optionally use the namespace proposal.
P.
Peter Murray-Rust, Director Virtual School of Molecular Sciences, domestic
net connection
VSMS http://www.nottingham.ac.uk/vsms, Virtual Hyperglossary
http://www.venus.co.uk/vhg
From peter at ursus.demon.co.uk Thu Feb 5 13:56:20 1998
From: peter at ursus.demon.co.uk (Peter Murray-Rust)
Date: Mon Jun 7 17:00:05 2004
Subject: FORTRAN namelist input - remember? Replace with XML!
In-Reply-To: <199802041559.JAA06936@mail-firewall.arlut.utexas.edu>
Message-ID: <3.0.1.16.19980205095841.2e471f34@pop3.demon.co.uk>
At 09:56 04/02/98 -0600, Glenn R. Kronschnabl wrote:
>I want to use XML as a general input mechanism for scientific programs. In
Great idea! XML revolutionises program input and output. FORTRAN
programmers spend half their life with:
Column 61 (I2) the number of optional cards describing the FOO.
This is an optional branch of a tree. With TEI processing it's marvellous.
I am trying to convert the molecular community to use XML as standard for
input and output to *existing* programs. If you can achieve it in your
community - great.
>the old days, say in FORTRAN, one used to use namelist input. In C/C++, one
>usually wrote a custom driver. I want to use XML because it appears to make
>sense. I have started using SP - and want to build a tree that I can query
>(kind of like an xrdb interface) for my input parameters. But, before I
>embark on this, I was wondering if 1) this makes sense, 2) someone surely has
>a simple tree builder/query interface to SP already that I can use so I don't
>have to write my own (none jumped out at me when I looked around).
I imagine the simplest way to do this is to write an XML2F77input
processor. This is really a stylesheet application. If you wait for XSL I
suspect it will solve many of your problems. If you can't wait, then there
may be facilities in JUMBO that could be useful.
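A minimal sketch of the idea, assuming a hypothetical input format: the
element names (input, group, param) and the sample values are invented for
illustration and come from no existing tool. It builds a small queryable
tree of parameters and renders it as Fortran 90 style namelist text.

```python
import xml.etree.ElementTree as ET

# Hypothetical input format: element and attribute names are illustrative only.
SAMPLE = """
<input>
  <group name="grid">
    <param name="nx">64</param>
    <param name="dx">0.5</param>
  </group>
  <group name="run">
    <param name="title">'test case'</param>
  </group>
</input>
"""

def load_params(xml_text):
    """Build a {group: {param: value}} dictionary from the XML tree."""
    root = ET.fromstring(xml_text)
    params = {}
    for group in root.findall("group"):
        params[group.get("name")] = {
            p.get("name"): (p.text or "").strip() for p in group.findall("param")
        }
    return params

def query(params, path, default=None):
    """Look up a value by a dotted 'group.param' path, loosely xrdb-style."""
    group, _, name = path.partition(".")
    return params.get(group, {}).get(name, default)

def to_namelist(params):
    """Render the parameters as namelist text, one '&group ... /' per group."""
    lines = []
    for group, values in params.items():
        lines.append(f"&{group}")
        for name, value in values.items():
            lines.append(f"  {name} = {value}")
        lines.append("/")
    return "\n".join(lines)

params = load_params(SAMPLE)
print(query(params, "grid.nx"))
print(to_namelist(params))
```

The existing program keeps reading its usual input; only the small converter
needs to know about the XML.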
P.
From peter at ursus.demon.co.uk Thu Feb 5 14:07:30 1998
From: peter at ursus.demon.co.uk (Peter Murray-Rust)
Date: Mon Jun 7 17:00:05 2004
Subject: LISTRIVIA: (was Re: Namespaces, modules and architectures
paper available)
In-Reply-To: <34ee5fa9.103270755@mail.alink.net>
References: <34D82C2E.6C6B3AE7@technologist.com>
<34D82C2E.6C6B3AE7@technologist.com>
Message-ID: <3.0.1.16.19980205134243.2e470aec@pop3.demon.co.uk>
At 12:46 04/02/98 GMT, Charles F. Goldfarb wrote:
>As several postings have referred to module proposals that are being considered
>for the SGML revision, I thought it might be helpful to post one here.
>--
>Charles F. Goldfarb * Information Management Consulting * +1(408)867-5553
> 13075 Paramount Court * Saratoga CA 95070 * USA
> International Standards Editor * ISO 8879 SGML * ISO/IEC 10744 HyTime
> Prentice-Hall Series Editor * CFG Series on Open Information Management
>--
>
>Attachment Converted: "c:\eudora\attach\module.htm"
Charles,
We try to dissuade people from attachments to XML-DEV postings because:
- some people cannot read them
- they do not appear in the hypermailed version
- there is no permanent record.
- long attachments cost people (including me) money
- they cannot be quoted easily
Could you please repost? If it's short, please include it; if not, please
give a URL.
TIA
P.
From ak117 at freenet.carleton.ca Thu Feb 5 14:09:49 1998
From: ak117 at freenet.carleton.ca (David Megginson)
Date: Mon Jun 7 17:00:05 2004
Subject: Foreign object inclusion WAS: Namespaces, Architectural Forms, and Sub-Documents
In-Reply-To:
References:
Message-ID: <199802051409.JAA00365@unready.microstar.com>
Gavin McKenzie writes:
>
> David Megginson wrote:
> > [snip]
> > XML documents may (and perhaps, usually will) contain non-XML objects
> > such as wordprocessor documents, spreadsheets, MPEG clips, Java
> > applets, audio sequences, and many others -- to date, thankfully, no
> > one has proposed uuencoding any of these and dumping them inline between
> > a start and end tag.
> > [snip]
>
> Am I to understand from this paragraph that there would be
> something wrong with uuencoded or base64'd resources, like audio
> clips or even a Java class, between a start and end tag?
You are quite right that this is legal XML or SGML -- that's one valid
use of NOTATION attributes. Here's this paragraph UUENCODED:
]]>
That said, CDATA marked sections won't always work for you -- BLOBs
are likely to contain non-SGML characters, and any arbitrary non-XML
markup containing ']]>' will kill the marked section. The best way to
include arbitrary non-XML information in a document is to include it
as an unparsed entity or an HREF link (just as you would include a GIF
in an HTML page).
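The ']]>' hazard can be illustrated with a short sketch. It uses Python's
standard ElementTree parser purely as a convenient well-formedness check;
the element name blob and both payloads are made up for the example.

```python
import xml.etree.ElementTree as ET

def wrap_in_cdata(payload):
    """Naively wrap a payload in a CDATA marked section, with no escaping."""
    return f"<blob><![CDATA[{payload}]]></blob>"

def is_well_formed(document):
    """Return True if the parser accepts the document, False otherwise."""
    try:
        ET.fromstring(document)
        return True
    except ET.ParseError:
        return False

# Markup characters such as '<' and '&' are harmless inside a marked section.
safe = "if (a < b && c > d) { run(); }"
# This payload happens to contain ']]>', which ends the section early and
# leaves the rest of the payload exposed to the parser as ordinary content.
unsafe = "if (m[i[0]]> n && x < y) { run(); }"

print(is_well_formed(wrap_in_cdata(safe)))
print(is_well_formed(wrap_in_cdata(unsafe)))
```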
All the best,
David
--
David Megginson ak117@freenet.carleton.ca
Microstar Software Ltd. dmeggins@microstar.com
http://home.sprynet.com/sprynet/dmeggins/
From ak117 at freenet.carleton.ca Fri Feb 20 12:29:54 1998
From: ak117 at freenet.carleton.ca (David Megginson)
Date: Mon Jun 7 17:00:10 2004
Subject: rec.xml
In-Reply-To: <3.0.1.16.19980220095415.2247b80e@pop3.demon.co.uk>
References: <3.0.1.16.19980220095415.2247b80e@pop3.demon.co.uk>
Message-ID: <199802201228.HAA00784@unready.microstar.com>
Peter Murray-Rust writes:
> Using SAX (alone) to parse the XML version of the XML
> recommendation (rec.xml), is it possible to create a well-formed
> version? The first time I tried this the result surprised me.
James Clark has created the Java application XMLTest to do exactly
this:
http://www.jclark.com/xml/XMLTest.java
I just normalised the REC with the following command line:
java XMLTest com.microstar.sax.AElfredDriver /tmp
REC-xml-19980210.xml
It seems to have come out fine (though without XML declaration,
comments, DOCTYPE, etc.). The purpose of James's application is to
allow easy comparisons of different SAX drivers and parsers.
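The same kind of normalisation can be sketched with Python's xml.sax module.
This is not James's XMLTest, only an illustration of why a SAX-only pass
loses the XML declaration and comments: the content handler never sees them.
The class and function names are invented for the example.

```python
import xml.sax
from xml.sax.saxutils import escape, quoteattr

class Normaliser(xml.sax.ContentHandler):
    """Re-emit a document using only the events a SAX content handler gets."""

    def __init__(self):
        super().__init__()
        self.out = []

    def startElement(self, name, attrs):
        # Attributes are written in sorted order so the output is stable.
        attr_text = "".join(
            f" {key}={quoteattr(value)}" for key, value in sorted(attrs.items())
        )
        self.out.append(f"<{name}{attr_text}>")

    def endElement(self, name):
        self.out.append(f"</{name}>")

    def characters(self, content):
        # Character data may arrive in several chunks; escape each one.
        self.out.append(escape(content))

def normalise(xml_bytes):
    handler = Normaliser()
    xml.sax.parseString(xml_bytes, handler)
    return "".join(handler.out)

SOURCE = b'<?xml version="1.0"?><!-- note --><doc id="1">a &amp; b<empty/></doc>'
print(normalise(SOURCE))
```

The empty-element tag comes back as a start-tag/end-tag pair, since SAX
reports both forms with the same two events.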
> BTW there may be problems parsing rec.xml as the official version
> contains a (single) character #160 ( ).
The problem has been fixed in the REC.
Parsing the REC no longer causes problems for AElfred because the
REC's XML declaration declares the encoding as "ISO-8859-1", where
#160 is a legal character. The problem is that not all XML parsers
allow the declared encoding ISO-8859-1 (though that's what most of
them really support).
> This has actually been 'commented out' but parsers such as AElfred
> don't accept it and throw an error. DavidM assures me that this is
> the correct thing to do - I take this on trust.
This is _a_ correct thing to do. This is an error but not a fatal
error, so it is up to the parser whether or not to report it. That
said, any parser with actual UTF-8 support will somehow choke on #160
if it thinks it's parsing UTF-8. Right now, most parsers claim to be
parsing UTF-8 when they're really parsing ISO-8859-1, hence they don't
choke on #160.
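The encoding point can be checked with a small sketch using Python's
standard library, whose expat-based parser does decode UTF-8 strictly. The
one-element document is invented for the example.

```python
import xml.etree.ElementTree as ET

RAW = b"<a>\xa0</a>"   # element content is the single byte 160

def parses(document_bytes):
    """Return True if the parser accepts the bytes, False otherwise."""
    try:
        ET.fromstring(document_bytes)
        return True
    except ET.ParseError:
        return False

def decodes_as_utf8(data):
    """Return True if the bytes are a valid UTF-8 sequence."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False

# With the encoding declared, byte 160 is the no-break space of ISO-8859-1.
declared_latin1 = b'<?xml version="1.0" encoding="ISO-8859-1"?>' + RAW
# With no declaration the parser assumes UTF-8, where a lone byte 160 is invalid.
no_declaration = RAW

print(parses(declared_latin1))
print(parses(no_declaration))
print(decodes_as_utf8(b"\xa0"))
```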
All the best,
David
--
David Megginson ak117@freenet.carleton.ca
Microstar Software Ltd. dmeggins@microstar.com
http://home.sprynet.com/sprynet/dmeggins/
From matthewg at poet.de Fri Feb 20 13:12:54 1998
From: matthewg at poet.de (Matthew Gertner)
Date: Mon Jun 7 17:00:10 2004
Subject: Automating Search Interfaces"
Message-ID: <01bd3e01$168b28b0$a00b0ac0@pharcyde.poetsoftware.xo.com>
Don,
> Also standard DTDs can not adapt to change. What do you do when the
> standard DTD for electronic devices must be changed to include performance
> data (i.e. WinMark for Intel machines)? The problems are simply
> mindboggling (well, my mind is easy to boggle).
One approach that really appeals to me is based on a two-pronged effort to
create standard tags *and* standard DTDs, and relies on the fact that there
is really a working mechanism for extending DTDs through inheritance (which
I guess is still not entirely the case).
Standard tags would be a bit of a hack, but probably very useful in a
pragmatic sense. For example, you might be able to say certain things about
a TITLE tag, or a PRICE tag, or whatever, just on the basis of the name,
regardless of the actual DTD being used. If these conventions were
well-known, this could be of great use when defining a new DTD (i.e. "Let's
call the tag PARAGRAPH and not PARA because this is what will be recognized
by search engines").
Inheritance is *not* a hack and really seems like the way to go for more
ambitious implementations. To take your example, the DTD for electronic
devices might contain tags for VENDOR, PRODUCTNAME, PRICE, CATEGORY, etc. If
I want to find all CD player devices from Sony that cost less than $99 then
I can query based on this standard DTD. Vendors who want to include more
information just derive a new DTD with all the standard tags, as well as
vendor-specific ones (for benchmark figures, for example). The non-standard
tags may not be available for querying, but the information in the
standardized base DTD would be.
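A rough sketch of such a query, assuming a made-up catalogue: the CATALOGUE
and DEVICE wrappers, the BENCHMARK tag, and all product data are invented
for illustration. The query touches only the standard tags, so the
vendor-specific extension is simply ignored.

```python
import xml.etree.ElementTree as ET

# Hypothetical catalogue; BENCHMARK stands in for a vendor-specific extension.
CATALOGUE = """
<CATALOGUE>
  <DEVICE>
    <VENDOR>Sony</VENDOR><PRODUCTNAME>Player A</PRODUCTNAME>
    <PRICE>89.00</PRICE><CATEGORY>CD player</CATEGORY>
    <BENCHMARK>42</BENCHMARK>
  </DEVICE>
  <DEVICE>
    <VENDOR>Sony</VENDOR><PRODUCTNAME>Player B</PRODUCTNAME>
    <PRICE>149.00</PRICE><CATEGORY>CD player</CATEGORY>
  </DEVICE>
  <DEVICE>
    <VENDOR>Other</VENDOR><PRODUCTNAME>Player C</PRODUCTNAME>
    <PRICE>59.00</PRICE><CATEGORY>CD player</CATEGORY>
  </DEVICE>
</CATALOGUE>
"""

def find_devices(xml_text, vendor, category, max_price):
    """Return product names matching a query over the standard tags only."""
    matches = []
    for device in ET.fromstring(xml_text).iter("DEVICE"):
        if (device.findtext("VENDOR") == vendor
                and device.findtext("CATEGORY") == category
                and float(device.findtext("PRICE")) < max_price):
            matches.append(device.findtext("PRODUCTNAME"))
    return matches

print(find_devices(CATALOGUE, "Sony", "CD player", 99))
```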
This becomes even more powerful with multiple inheritance. I can whip up a
DTD for my new portable XML viewer/espresso brewer, imported from
Kazakhstan, just by grabbing the standard DTDs for hand-held electronic
devices (derived from general electronic devices but adding tags for SIZE,
WEIGHT and BATTERYLIFE), for food processing equipment (also derived from
electronic devices but adding a tag for FOODTYPE) and for imported goods (with
tags for COUNTRYOFORIGIN, EXPORTTARIF, etc.). This would let users find my
product by querying for all portable devices weighing under 200 grams which
can process coffee and which are produced in Central Asia.
I really believe the world needs XML to get a grip on information explosion.
The approach suggested by the original poster is great, and with
plug-and-play DTDs I don't see any real technical reason why it shouldn't
work. As an initial implementation, the approach based on GI only would no
doubt be a good workaround.
Matthew
From M.H.Kay at eng.icl.co.uk Fri Feb 20 16:03:37 1998
From: M.H.Kay at eng.icl.co.uk (Michael Kay)
Date: Mon Jun 7 17:00:10 2004
Subject: Is anyone using CDATA?
Message-ID: <01bd3e19$4288ad80$1e09e391@mhklaptop.bra01.icl.co.uk>
>Anyone have experiences with CDATA ? We're interested in inserting
>non-XML markup and BLOBs into XML files, and the best way seems to be
>CDATA.
I don't think CDATA is useful for inserting binary data into XML files,
because there is no way of escaping the terminating "]]>". I think the best
way to do it, if you want to do it inline, is to use Base64 encoding, and then
you don't need CDATA.
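A minimal sketch of the Base64 approach, using Python's standard library.
The element name resource and the encoding attribute are assumptions made
for the example, not part of any specification. The Base64 alphabet contains
no '<', '&' or ']' characters, so the encoded text needs no escaping.

```python
import base64
import xml.etree.ElementTree as ET

def embed(data, tag="resource"):
    """Serialise binary data as Base64 text inside a single element."""
    element = ET.Element(tag, {"encoding": "base64"})
    element.text = base64.b64encode(data).decode("ascii")
    return ET.tostring(element, encoding="unicode")

def extract(document):
    """Recover the original bytes from a document produced by embed()."""
    return base64.b64decode(ET.fromstring(document).text)

# Every byte value, plus the sequences that cause trouble in raw content.
blob = bytes(range(256)) + b"]]><&"
document = embed(blob)
print(extract(document) == blob)
```

The cost is size: Base64 produces four output characters for every three
input bytes.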
Mike Kay
From ak117 at freenet.carleton.ca Fri Feb 20 19:44:44 1998
From: ak117 at freenet.carleton.ca (David Megginson)
Date: Mon Jun 7 17:00:10 2004
Subject: Automating Search Interfaces"
In-Reply-To: <01bd3e01$168b28b0$a00b0ac0@pharcyde.poetsoftware.xo.com>
References: <01bd3e01$168b28b0$a00b0ac0@pharcyde.poetsoftware.xo.com>
Message-ID: <199802201535.KAA00874@unready.microstar.com>
Matthew Gertner writes:
> One approach that really appeals to me is based on a two-pronged effort to
> create standard tags *and* standard DTDs, and relies on the fact that there
> is really a working mechanism for extending DTDs through inheritance (which
> I guess is still not entirely the case).
>
> Standard tags would be a bit of a hack, but probably very useful in a
> pragmatic sense. For example, you might be able to say certain things about
> a TITLE tag, or a PRICE tag, or whatever, just on the basis of the name,
> regardless of the actual DTD being used. If these conventions were
> well-known, this could be of great use when defining a new DTD (i.e. "Let's
> call the tag PARAGRAPH and not PARA because this is what will be recognized
> by search engines").
The idea is actually quite sound, but the implementation could be a
little cleaner. Instead of relying on the element type name (which
may vary for different domains of information), why not have a
standard attribute (such as 'standard-doc') that gives the equivalent
standard name in the architecture? That way, just as you write
  public class Cost implements Price {
  }
in Java, you can write
  <cost standard-doc="price"/>
in XML, or even
  <cost standard-doc="price">xxx</cost>
This makes multiple inheritance easy:
  <cost standard-doc="price" alt-doc="value">xxx</cost>
Now `cost' inherits from `price' in the standard-doc
architecture and from `value' in the alt-doc architecture.
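A minimal sketch of how an application might read such a document through
one architecture or another, assuming markup of the form
cost standard-doc="price" alt-doc="value" as described above. The invoice
wrapper, the note element, and the function name are invented for the
example.

```python
import xml.etree.ElementTree as ET

# Assumed markup: each architecture is named by an attribute whose value
# gives the element's equivalent name in that architecture.
DOC = (
    '<invoice>'
    '<cost standard-doc="price" alt-doc="value">12.50</cost>'
    '<note>n/a</note>'
    '</invoice>'
)

def architectural_view(xml_text, architecture):
    """List (architectural name, local name, content) for mapped elements."""
    root = ET.fromstring(xml_text)
    return [
        (element.get(architecture), element.tag, element.text)
        for element in root.iter()
        if element.get(architecture) is not None
    ]

print(architectural_view(DOC, "standard-doc"))
print(architectural_view(DOC, "alt-doc"))
```

Elements that carry no mapping for an architecture, such as note here,
simply drop out of that view.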
All the best,
David
--
David Megginson ak117@freenet.carleton.ca
Microstar Software Ltd. dmeggins@microstar.com
http://home.sprynet.com/sprynet/dmeggins/
From mwagner at ets.org Fri Feb 20 20:33:47 1998
From: mwagner at ets.org (Mike Wagner)
Date: Mon Jun 7 17:00:10 2004
Subject: MS XML Parser on the Server
Message-ID:
Has anybody managed to get the Microsoft Java XML Parser running as a
component accessible by ASP under IIS? I tried what seemed to me to be the
obvious approach and that didn't work. I copied the java classes to the
TrustLib directory, then registered them with javareg. (An excerpt of the
BAT file I used is at the end of this message.) However, when I try a
simple Server.CreateObject("com.ms.xml.om.Document") call in an ASP page,
it dies with the following error:
Microsoft JScript runtime error '800a01ad'
Automation server can't create object
/xmltest.asp, line 14
Any insights? Thanks.
Mike Wagner
Educational Testing Service
mwagner@ets.org
-----------------Javareg BAT file--------------------
cd \winnt\java\trustlib\com\ms\xml\dso
javareg /register /class:SchemaNode /progid:com.ms.xml.dso.SchemaNode
cd \winnt\java\trustlib\com\ms\xml\dso
javareg /register /class:XMLDSO /progid:com.ms.xml.dso.XMLDSO
cd \winnt\java\trustlib\com\ms\xml\dso
javareg /register /class:XMLParserThread /progid:com.ms.xml.dso.XMLParserThread
cd \winnt\java\trustlib\com\ms\xml\dso
javareg /register /class:XMLRowsetProvider /progid:com.ms.xml.dso.XMLRowsetProvider
cd \winnt\java\trustlib\com\ms\xml\om
From pierlou at CAM.ORG Fri Feb 20 21:04:15 1998
From: pierlou at CAM.ORG (Pierre Morel)
Date: Mon Jun 7 17:00:10 2004
Subject: Automating Search Interfaces
Message-ID: <01bd3e41$861d1090$02dcdcdc@pierre>
Hello,
I would like to talk about the location of the person making the search versus the location of the product or service provider. If I search for a product and I want it now, I only want a list of providers within a distance suited to my request. And if I go to Europe this summer and want to make reservations or search for activities occurring at that time, the 'where I am' specification changes. If I have a second home and make a request on the weekend, I want the restaurants in that region and not the ones near my primary home. An identity profile should be included in the query to give the search engine the chance to make a better choice with regard to my age, sex, etc.
Another part of the problem is unique number identification, and I am not sure whether EAN or SIC is good for that purpose. How can a search engine parse a site or make a request for a product or service without a unique product number? A hotel room is a 'chambre' in French. If I search for a hotel room in Italy, I don't know the word for room in Italian, but if a room is a number, I can search for a room everywhere in the world. The query interface will be in my language and the service provider will build his database in his own language. The query page should change for every product. I have worked on this idea for some time and came to the conclusion that lightweight page creation and manipulation is needed. The small tutorial that shows how the parts fit together relates to a very early search engine. The left pane shows the products in a store but could be a list of products at a search engine site.
What is XML-Data versus a DTD? Maybe the solution is there and I don't see it.
I would like to know whether every product on earth can have a number, the same way that every book can be codified.
Best regards to all
Pierre Morel
pierlou@cam.org
http://www.cam.org/~pierlou/prototype
From donpark at quake.net Fri Feb 20 22:39:36 1998
From: donpark at quake.net (Don Park)
Date: Mon Jun 7 17:00:10 2004
Subject: TagNet (was Automating Search Interfaces)
Message-ID: <009301bd3e4f$b4e336d0$2ee044c6@donpark>
Matthew,
>One approach that really appeals to me is based on a two-pronged effort to
>create standard tags *and* standard DTDs, and relies on the fact that there
>is really a working mechanism for extending DTDs through inheritance (which
>I guess is still not entirely the case).
I think the effort will be best spent building a sort of WordNet-like
service which allows automatic registration and association of tag and
attribute names. For example, a book vendor could register TITLE as a tag
name and associate it with NAME as a synonym constrained by the book
industry code (if there is such a thing). A search service can then see that
the contents offered by the book vendor can be searched by mapping its NAME
field to the TITLE tag. Inheritance relationships can also be registered and
taken advantage of by search services.
It probably won't have to be a full semantic network but it will require a
standard API. I wish it could capture whole/part relationships as well, like
(NAME == FIRST + MIDDLE + LAST), but I could be going overboard here. Some
of the entries can be marked as the 'norm' by some standardization
organizations. A DTD writer could just build what he wants and then pass it
through the service to change all names to the 'norm'.
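A toy sketch of what such a registry might look like. Everything here is
hypothetical: the class, its method names, and the domain labels "books" and
"people" are invented to illustrate the idea, and a real service would sit
behind a network API.

```python
class TagNet:
    """Toy in-memory registry mapping tag names to a per-domain 'norm'."""

    def __init__(self):
        self._norm = {}   # (domain, name) -> norm name

    def register(self, domain, name, norm):
        """Record that `name` is a synonym of `norm` within `domain`."""
        self._norm[(domain, name)] = norm

    def to_norm(self, domain, name):
        """Return the norm for a name, or the name itself if unregistered."""
        return self._norm.get((domain, name), name)

    def normalise_names(self, domain, names):
        """Rewrite a list of tag names, as a DTD writer might on submission."""
        return [self.to_norm(domain, name) for name in names]

tagnet = TagNet()
tagnet.register("books", "NAME", "TITLE")       # synonym within the book trade
tagnet.register("people", "NAME", "FULLNAME")   # same name, different domain

print(tagnet.to_norm("books", "NAME"))
print(tagnet.normalise_names("books", ["NAME", "PRICE"]))
```

Keying on the domain as well as the name is what lets NAME mean one thing
for books and another for people.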
For the benefit of those replying to this message, let me call the service
TagNet.
"What do you want to tag today?;-)"
Feeling great today,
Don Park
http://www.quake.net/~donpark/index.html
From donpark at quake.net Fri Feb 20 22:39:40 1998
From: donpark at quake.net (Don Park)
Date: Mon Jun 7 17:00:10 2004
Subject: Automating Search Interfaces
Message-ID: <009401bd3e4f$b5d7a8f0$2ee044c6@donpark>
Pierre,
>I would like to talk about the location of the person making the search versus the location of the product or service provider. If I search for a product and I want it now, I only want a list of providers within a distance suited to my request. And if I go to Europe this summer and want to make reservations or search for activities occurring at that time, the 'where I am' specification changes. If I have a second home and make a request on the weekend, I want the restaurants in that region and not the ones near my primary home. An identity profile should be included in the query to give the search engine the chance to make a better choice with regard to my age, sex, etc.
Interesting. Some of the issues with product location are:
1. How to indicate location?
Address or map coordinates? How does one find map coordinates? What happens when he moves?
2. How to associate location with products?
If a vendor has all inventory at a single location then the location can be #FIXED in his DTD. If inventory is distributed around the globe, each product or inventory group will have to be marked. The problem is that it then makes no sense to indicate a physical location. It will have to be a store code, which causes problems for search services since store codes will have to be converted into the location format used by the search service.
As far as time constraints go, each product will probably be marked with time. The problem is that some time constraints are relative in nature.
*Ouch* I just thought of another painful problem with prices. What happens when a store wants to put on a sale? His database of products will have to map to different pricing schemes constrained by time, location, or association.
All this hurts my head a bit but it is very interesting indeed...
Regards,
Don Park
http://www.quake.net/~donpark/index.html
From mike at jmaca.com Fri Feb 20 23:01:53 1998
From: mike at jmaca.com (Michael Emmel)
Date: Mon Jun 7 17:00:10 2004
Subject: Binary Data
Message-ID: <34EE0D80.BBA53DC9@jmaca.com>
Is it possible to include binary data in an XML document and still follow the
spec?
allows the inclusion of arbitrary ascii data except I do not think
uuencode or other binary -> ascii/UTF8
encoders will work without modification to eliminate the ]]> encoding.
Would this be possible.
where the parser would ignore
1024 bytes and expect
to see a ]]> at the end.
The spec seems to imply only character data but does not disallow
binary data.
I assume a character encoding that did not use the ]]> sequence is okay.
I think the tag is not.
You need to let the parser ignore and redirect x number of bytes from
the token stream. This would be equivalent to a "Java production" in
Javacc.
But I'm not sure whether it is legal.
So do I need to alter uuencode or some other encoding format to fit the
Message-ID: <3.0.1.16.19980220224802.35d7dc7c@pop3.demon.co.uk>
At 14:33 20/02/98 -0800, Don Park wrote:
[...]
>
>Attachment Converted: "c:\eudora\attach\ReAutoma.htm"
^^^^^^^^^^^^
This is the sort of problem with attachments...
P.
>
Peter Murray-Rust, Director Virtual School of Molecular Sciences, domestic
net connection
VSMS http://www.nottingham.ac.uk/vsms, Virtual Hyperglossary
http://www.venus.co.uk/vhg
From peter at ursus.demon.co.uk Fri Feb 20 23:27:09 1998
From: peter at ursus.demon.co.uk (Peter Murray-Rust)
Date: Mon Jun 7 17:00:10 2004
Subject: LISTRIVIA
In-Reply-To: <01bd3e41$861d1090$02dcdcdc@pierre>
Message-ID: <3.0.1.16.19980220224826.35d7f436@pop3.demon.co.uk>
Hi Pierre, thanks for the posting...
At 15:52 20/02/98 -0500, Pierre Morel wrote:
>
>Attachment Converted: "c:\eudora\attach\Automati.htm"
^^^^^^^^^^
We ask people not to post attachments to xml-dev, because they don't get
hypermailed and they take up space on readers' machines :-)
P.
Peter Murray-Rust, Director Virtual School of Molecular Sciences, domestic
net connection
VSMS http://www.nottingham.ac.uk/vsms, Virtual Hyperglossary
http://www.venus.co.uk/vhg
From peter at ursus.demon.co.uk Fri Feb 20 23:39:55 1998
From: peter at ursus.demon.co.uk (Peter Murray-Rust)
Date: Mon Jun 7 17:00:10 2004
Subject: xml:space
Message-ID: <3.0.1.16.19980220223432.3ea70dee@pop3.demon.co.uk>
I am considering how to treat xml:space in JUMBO and ask for help and
comments. I am NOT re-opening the whitespace debate; I am asking
those who understand xml:space if what I do/intend to do is reasonable.
xml:space is a formal part of the language and I feel I have to address
it.
1. Are there any documents which actually use xml:space? rec.xml does not
2. Is there anyone on this list intending to use it? If so, what do they
expect "applications' default white-space processing modes" to be?
[Quotations are from rec.xml]
>An XML processor must always pass all characters in a document that are not
>markup through to the application. A validating XML processor must also
>inform the application which of these characters constitute white space
>appearing in element content.
My philosophy in JUMBO (which is a generic application) is to accept all
whitespace from the parser/SAX, whether labelled 'ignorable' or not. All
PCDATA is stored in child nodes of elements. Those with ignorable
whitespace can be specially labelled. IOW I do not discard any character
data on input.
>
>A special attribute named xml:space may be attached to an element to
>signal an intention that in that element, white space should be
>preserved by applications. In valid documents, this attribute, like any
>other, must be declared if it is used. When declared, it must be
>given as an enumerated type whose only possible values are "default" and
>"preserve". For example:
>
>    <!ATTLIST poem   xml:space (default|preserve) 'preserve'>
>
OK. If xml:space="preserve" I have no problems.
If xml:space="default" I am asking for help. Note that xml:space="default"
could apply either to ignorable whitespace or non-ignorable w/s
If xml:space is absent, I suggest options below...
>
>The value "default" signals that applications' default white-space
>processing modes are acceptable for this element; the value
>"preserve" indicates the intent that applications preserve all the white
>space. This declared intent is considered to apply to all
>elements within the content of the element where it is specified, unless
This causes me slight concern. It means I have to write code that
automatically tracks what elements have an xml:space attribute. This is
possible, but yet another thing that has to be done. I might be motivated
to do it if I am shown some use for it...
>overriden with another instance of the xml:space attribute.
This means effectively that every node in a document has to have an
xml:space flag. [Unless this is dynamically worked out every time the
document is to be rendered.]
--------
Without xml:space, and without a DTD, I can see the following *generic*
possibilities:
- element is empty. [BTW the spec (and SAX) discards all knowledge of
whether this was created by an empty-element tag or by a start-tag/end-tag
pair. I approve of this.]
Children are not displayed because there aren't any
- element contains non-w/s characters. This is displayed either as a
string or as a title-value pair (at user option). The title is determined
by simple heuristics.
- element contains element content. This is displayed as a tree. I am
considering also allowing the user to display this as a tagged/untagged
event stream, but the tree is the default.
- element contains element content and (some) non-w/s PCDATA children.
This is displayed as an untagged (or, at the user's choice, tagged) event stream.
Unless the semantics of the tags are known or a stylesheet is provided, no
other rendering is possible.
Now the two w/s options...
- element contains element content and (only) w/s children. This is
displayed by default as ignoring the w/s. Note that this is *display*, not
processing. Since the default is a tree, the w/s nodes aren't much use.
- element contains a single w/s child. This does not display anything by
default.
The user can switch to display/hide PCDATA children in the tree display.
For *outputting* it is possible to delete the w/s nodes if required. Once
deleted they are gone ...
I would be interested in comments as to whether this is reasonable default
behaviour or whether there are other things that should be considered.
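[Editorial sketch] The generic cases listed above could be distinguished roughly as follows. This is only an illustration, not JUMBO's code: it uses the DOM and JAXP classes that ship with current JDKs rather than the 1998 SAX drafts, and the class name ContentKind and the category labels are invented for the example.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class ContentKind {
    /** True if s consists only of XML white space (space, tab, CR, LF). */
    static boolean isXmlWhitespace(String s) {
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c != ' ' && c != '\t' && c != '\r' && c != '\n') return false;
        }
        return true;
    }

    /** Classifies an element by what its direct children contain. */
    static String classify(Element e) {
        boolean elems = false, text = false, ws = false;
        NodeList kids = e.getChildNodes();
        for (int i = 0; i < kids.getLength(); i++) {
            Node n = kids.item(i);
            short type = n.getNodeType();
            if (type == Node.ELEMENT_NODE) {
                elems = true;
            } else if (type == Node.TEXT_NODE || type == Node.CDATA_SECTION_NODE) {
                if (isXmlWhitespace(n.getNodeValue())) ws = true; else text = true;
            }
        }
        if (elems && text) return "mixed";
        if (elems) return ws ? "elements+whitespace" : "elements";
        if (text) return "text";
        return ws ? "whitespace-only" : "empty";
    }

    /** Convenience overload: parse a string and classify its root element. */
    static String classify(String xml) throws Exception {
        return classify(DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml))).getDocumentElement());
    }

    public static void main(String[] args) throws Exception {
        String[] samples = { "<a/>", "<a></a>", "<a>hi</a>", "<a><b/></a>",
                             "<a>\n<b/>\n</a>", "<a>x<b/></a>", "<a> </a>" };
        for (String s : samples) {
            System.out.println(classify(s) + "  <-  " + s.replace("\n", "\\n"));
        }
    }
}
```

Note that the two spellings of an empty element both come out as "empty", which matches the observation that the distinction is not preserved. A display layer could then choose a tree for the "elements" cases and an event stream for "mixed".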
P.
Peter Murray-Rust, Director Virtual School of Molecular Sciences, domestic
net connection
VSMS http://www.nottingham.ac.uk/vsms, Virtual Hyperglossary
http://www.venus.co.uk/vhg
From tbray at textuality.com Sat Feb 21 00:24:39 1998
From: tbray at textuality.com (Tim Bray)
Date: Mon Jun 7 17:00:10 2004
Subject: xml:space
Message-ID: <3.0.32.19980220162318.00acf700@pop.intergate.bc.ca>
At 10:34 PM 2/20/98, Peter Murray-Rust wrote:
A short answer: yes, if you want to respect xml:space, you have really
no choice but to keep a stack or suchlike to see if it's been overridden
in a child element. JUMBO, since it's an application, has no obligation
to respect xml:space, it's just a request, after all. If you are
respecting xml:space, whenever you are in an element for which
xml:space='preserve' does not apply, you should do whatever best suits the
needs of your application and its users. I very much doubt there is a
universal answer for all classes of application. I think HTML gets it
pretty much right for display type applications.
As for your question "will it be used?": yes, of course. -Tim
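[Editorial sketch] The "stack or suchlike" could look roughly like this. It is a minimal illustration under stated assumptions, not anything from JUMBO or from the SAX drafts discussed on this list: it uses the SAX2 DefaultHandler and JAXP classes bundled with current JDKs, and the class name XmlSpaceTracker and its log format are invented for the example. Each start tag pushes the effective setting (its own xml:space value if present, otherwise the inherited one) and each end tag pops it.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class XmlSpaceTracker extends DefaultHandler {
    // Top of the stack: does "preserve" apply to the element currently open?
    private final Deque<Boolean> preserve = new ArrayDeque<>();
    final List<String> log = new ArrayList<>();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        boolean inherited = !preserve.isEmpty() && preserve.peek();
        String declared = atts.getValue("xml:space");
        // An explicit attribute overrides the inherited setting; absence inherits it.
        boolean effective = (declared == null) ? inherited : declared.equals("preserve");
        preserve.push(effective);
        log.add(qName + "=" + (effective ? "preserve" : "default"));
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        preserve.pop();
    }

    /** Parses the document and returns the effective setting for each element, in order. */
    static List<String> track(String xml) throws Exception {
        XmlSpaceTracker handler = new XmlSpaceTracker();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return handler.log;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(track(
                "<a><b xml:space='preserve'><c/><d xml:space='default'/></b><e/></a>"));
        // prints [a=default, b=preserve, c=preserve, d=default, e=default]
    }
}
```

Because the setting is recomputed as the document streams past, no per-node flag has to be stored unless the application wants one.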
From ricko at allette.com.au Sat Feb 21 02:44:14 1998
From: ricko at allette.com.au (Rick Jelliffe)
Date: Mon Jun 7 17:00:11 2004
Subject: Binary Data
Message-ID: <002201bd3e72$c6880a00$9d0b4ccb@NT.JELLIFFE.COM.AU>
From: Michael Emmel
>Is it possible to include binary data in a XML document and follow the
>spec.
It is possible to have binary data in an XML *document* but it is not
possible to have (unencoded) binary data in an XML text *entity*. A
document is constructed from entities. An entity is usually a file. An
entity is either text or binary (NDATA) but not both.
You can use Base64 encoding to stick non-text data inside elements:
...
...
...
CDATA marked sections are only a shorthand mechanism for data which has a
lot of "&" or "<" characters which you might find tedious to delimit into
entity references. It is not a mechanism for embedding raw binary, per se.
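[Editorial sketch] As an illustration of the Base64 approach, the following round-trips a few raw bytes through element content. The element name blob and its encoding attribute are invented for the example (they are not the markup from the original message), and java.util.Base64 and JAXP are present-day JDK classes assumed here for convenience.

```java
import java.io.StringReader;
import java.util.Arrays;
import java.util.Base64;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class Base64InXml {
    /** Wraps raw bytes in a hypothetical <blob> element as Base64 text. */
    static String wrap(byte[] raw) {
        return "<blob encoding=\"base64\">"
                + Base64.getEncoder().encodeToString(raw) + "</blob>";
    }

    /** Parses the element back and decodes its character content. */
    static byte[] unwrap(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        return Base64.getDecoder().decode(doc.getDocumentElement().getTextContent());
    }

    public static void main(String[] args) throws Exception {
        // Includes the byte values of '<' (0x3C) and '&' (0x26), which could not
        // appear unescaped in a text entity.
        byte[] raw = { 0x00, 0x3C, 0x26, (byte) 0xFF };
        String xml = wrap(raw);
        System.out.println(xml);
        System.out.println(Arrays.equals(raw, unwrap(xml)));  // true if the round trip is lossless
    }
}
```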
Rick Jelliffe
From JimL at Alphag.net Sat Feb 21 23:39:03 1998
From: JimL at Alphag.net (Jim Lears)
Date: Mon Jun 7 17:00:11 2004
Subject: MS XML Parser on the Server
Message-ID:
Server.CreateObject in VBScript is used for creating instances of COM
objects. The Java XML Parser doesn't expose any COM interfaces...notably
IClassFactory which is used to instantiate COM objects. The C++ version
is what you need...it's an ActiveX control. The source code for both
parsers is available. If you insist on using the Java version, you could
mod it up to sport a COM interface.
Helping To Destroy The English Language
-----Original Message-----
From: Mike Wagner [SMTP:mwagner@ets.org]
Sent: Friday, February 20, 1998 3:33 PM
To: xml-dev@ic.ac.uk
Subject: MS XML Parser on the Server
Has anybody managed to get the Microsoft Java XML Parser running as a
component accessible by ASP under IIS? I tried what seemed to me to be the
obvious approach and that didn't work. I copied the java classes to the
TrustLib directory, then registered them with javareg. (An excerpt of the
BAT file I used is at the end of this message). However, when I try a
simple Server.CreateObject("com.ms.xml.om.Document") call in an ASP page,
it dies with the following error:
Microsoft JScript runtime error '800a01ad'
Automation server can't create object
/xmltest.asp, line 14
Any insights? Thanks.
Mike Wagner
Educational Testing Service
mwagner@ets.org
-----------------Javareg BAT file--------------------
cd \winnt\java\trustlib\com\ms\xml\dso
javareg /register /class:SchemaNode /progid:com.ms.xml.dso.SchemaNode
cd \winnt\java\trustlib\com\ms\xml\dso
javareg /register /class:XMLDSO /progid:com.ms.xml.dso.XMLDSO
cd \winnt\java\trustlib\com\ms\xml\dso
javareg /register /class:XMLParserThread /progid:com.ms.xml.dso.XMLParserThread
cd \winnt\java\trustlib\com\ms\xml\dso
javareg /register /class:XMLRowsetProvider /progid:com.ms.xml.dso.XMLRowsetProvider
cd \winnt\java\trustlib\com\ms\xml\om
From mike at datachannel.com Sun Feb 22 00:05:46 1998
From: mike at datachannel.com (Mike Dierken)
Date: Mon Jun 7 17:00:11 2004
Subject: MS XML Parser on the Server
Message-ID: <01BD3EE2.0FB98770@NEMO>
On the MS platform, you can expose all your Java classes and interfaces as COM interfaces if you use the ActiveX Wizard for Java (JAVAIDL.EXE). It'll create an .IDL file (and .C and .H files if you want to call the interfaces from C/C++).
All Java classes are exposed as dual interfaces, derived from IDispatch, which allows them to be called from all COM aware scripting languages (JavaScript, VB for Automation, etc).
If the Java classes are registered with Javareg (using the CLSIDs from the generated .IDL file) on the server, you can use the package name rather than a CLSID.
To create a Java object, you might try prepending 'java:' on the package name.
Server.CreateObject("java:com.ms.xml.om.Document")
Hope this helps...
Mike D
DataChannel
From k_coffin at conknet.com Sun Feb 22 02:58:22 1998
From: k_coffin at conknet.com (Kerry Coffin)
Date: Mon Jun 7 17:00:11 2004
Subject: Binary Data
Message-ID: <01bd3f3d$977564d0$ed0620ce@lbynum.esri.com>
What is Base64?
Regards,
Kerry Coffin
Environmental Systems Research Institute (ESRI)
From peter at ursus.demon.co.uk Sun Feb 22 10:57:37 1998
From: peter at ursus.demon.co.uk (Peter Murray-Rust)
Date: Mon Jun 7 17:00:11 2004
Subject: LISTRIVIA
In-Reply-To: <01BD3EE2.0FB98770@NEMO>
Message-ID: <3.0.1.16.19980222103123.1c3f3e98@pop3.demon.co.uk>
At 16:02 21/02/98 -0800, [a number of posters in combination] wrote:
[A message]
>
>-----Original Message-----
[which quoted another message in full]
>
> -----Original Message-----
[which itself quoted another message in full]
[and finished with cascading xml-dev backmatter].
and in another message a simple question was asked followed by cascading
quoted messages which added no value.
----------------------------------------------------------------------
Since new members are continually joining the list - and we welcome them
:-) - , I'll reiterate our policy for minimising the amount of material
posted. Remember that:
- many people pay personal money for mail (including me)
- duplicated material is excessively tedious on the hypermail list and
takes up valuable space
- duplication takes up space on readers' local storage.
- automatic quoting is not a good approach towards managing information.
XML encourages people to normalise material as much as possible.
Please therefore excise all material that you don't directly refer to in
your message. Most people prefer to see the quoted material followed by the
annotation rather than the annotation followed by the original message.
Remember that the material is all hypermailed and publicly visible and
(optionally) available as a digest. Both of these should be attractive to
read. :-)
For more details and suggestions of other styles to adopt/avoid, you may
wish to follow the various LISTRIVIA threads. These also comment on
multiple copies of postings :-)
P.
Peter Murray-Rust, Director Virtual School of Molecular Sciences, domestic
net connection
VSMS http://www.nottingham.ac.uk/vsms, Virtual Hyperglossary
http://www.venus.co.uk/vhg
From donpark at quake.net Sun Feb 22 21:05:58 1998
From: donpark at quake.net (Don Park)
Date: Mon Jun 7 17:00:11 2004
Subject: Binary Data
Message-ID: <003101bd3fd4$f5c83fc0$2ee044c6@donpark>
BASE64 is a MIME content transfer encoding algorithm defined in RFC 2045. It
is used to map binary data into a restricted range of printable characters.
Don Park
http://www.quake.net/~donpark/index.html
From tbray at textuality.com Mon Feb 23 00:50:34 1998
From: tbray at textuality.com (Tim Bray)
Date: Mon Jun 7 17:00:11 2004
Subject: Binary Data
Message-ID: <3.0.32.19980222164918.00b68370@pop.intergate.bc.ca>
At 12:59 PM 2/22/98 -0800, Don Park wrote:
>BASE64 is MIME content tranfer encoding algorithm defined in RFC 2045. It
>is used to map binary data into a range of characters.
What's real important from the XML point of view is that (unless my
memory fails me) base64 has the nice property that it uses a very
restricted range of characters, which happens not to include < or &,
and thus can be tossed into an XML doc just about anywhere without
breaking anything. I think a predefined base64 notation attribute
is a no-brainer good idea, so obvious that it can't be new. -Tim
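[Editorial sketch] The property described above can be checked mechanically. The snippet below is only an illustration and assumes the java.util.Base64 class of current JDKs; the class and method names are invented for the example. It recovers the 64-character alphabet from the encoder (plus the '=' padding character) and confirms that neither '<' nor '&' is in it.

```java
import java.util.Base64;

class Base64Alphabet {
    /** Recovers the characters Base64 output can contain, in alphabet order, plus '='. */
    static String alphabet() {
        StringBuilder sb = new StringBuilder();
        Base64.Encoder enc = Base64.getEncoder();
        for (int v = 0; v < 64; v++) {
            // The top six bits of the first byte select the first output character.
            byte[] group = { (byte) (v << 2), 0, 0 };
            sb.append(enc.encodeToString(group).charAt(0));
        }
        return sb.append('=').toString();  // '=' is the padding character
    }

    /** True if none of the characters would be mistaken for markup in element content. */
    static boolean safeInXmlContent(String chars) {
        return chars.indexOf('<') < 0 && chars.indexOf('&') < 0;
    }

    public static void main(String[] args) {
        String chars = alphabet();
        System.out.println(chars);
        System.out.println(safeInXmlContent(chars));  // true
    }
}
```

The alphabet also excludes '>' and both quote characters, so the encoded text should be equally safe inside an attribute value or a CDATA section.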
From b.laforge at opengroup.org Mon Feb 23 00:59:49 1998
From: b.laforge at opengroup.org (Bill la Forge)
Date: Mon Jun 7 17:00:11 2004
Subject: xml-based protocol
Message-ID: <3.0.32.19980222200447.00a05330@postman.osf.org>
Finally, AXTP is using xml for the wire protocol.
(I've also created some documentation.)
AXTP: Application eXtensible Transactional Protocol (UDP based)
http://www.camb.opengroup.org/~laforge/axtp/
From ak117 at freenet.carleton.ca Mon Feb 23 03:14:26 1998
From: ak117 at freenet.carleton.ca (David Megginson)
Date: Mon Jun 7 17:00:11 2004
Subject: SAX: finalising org.sax.xml.Parser
Message-ID: <199802230313.WAA00386@unready.microstar.com>
It's time to finalise SAX before there is such a big code base that we
can no longer make changes. (Thanks, by the way, to James Clark,
DataChannel, and IBM for including native SAX support in their XML
parsers). During this phase, I'd like to make the _minimum_ changes
necessary to SAX to define a consistent and simple common functionality
for XML parsers.
Let's start with the Parser interface. I'll use Java syntax because,
while I can read IDL, I don't trust myself to write it:
[current interface]
------------------------------------------------------------------------
package org.xml.sax;
public interface Parser {
  public void setEntityHandler (EntityHandler handler);
  public void setDocumentHandler (DocumentHandler handler);
  public void setErrorHandler (ErrorHandler handler);
  public void parse (String publicID, String systemID)
    throws java.lang.Exception;
}
------------------------------------------------------------------------
After considering the various discussions over the past few weeks, I
propose that we make the following changes:
1) Add a parse() method that accepts a stream.
2) Add a parse() method that accepts a character buffer.
3) Remove public ID from the current parse() method (I don't think
public IDs are going anywhere fast in XML).
With these changes, the interface would look like this in Java:
[proposed changes]
------------------------------------------------------------------------
package org.xml.sax;
import java.io.InputStream;
public interface Parser {
  public void setEntityHandler (EntityHandler handler);
  public void setDocumentHandler (DocumentHandler handler);
  public void setErrorHandler (ErrorHandler handler);
  public void parse (String uri)
    throws java.lang.Exception;
  public void parse (InputStream is, String baseURI)
    throws java.lang.Exception;
  public void parse (char ch[], int start, int length, String baseURI)
    throws java.lang.Exception;
}
------------------------------------------------------------------------
NOTES:
a. The baseURI argument is necessary for streams and character buffers
in case either contains a relative URI. You can supply a null
value if the document entity will not contain relative URIs.
b. All programming languages initially targeted by SAX (Java, C++, C,
Perl) have some concept of input streams; if we come up against one
that doesn't, it can simply omit the relevant method.
c. The start and length arguments are necessary with the character
buffer in case the XML document is part of a larger array.
Does this give reasonable functionality without limiting the
architectural approaches of parser writers? Remember that individual
implementations can extend this interface, but the interface
represents the minimum common functionality that every SAX-conformant
parser (eventually) provides.
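[Editorial sketch] To make note (c) concrete, here is how a caller might hand over a document that sits in the middle of a larger character array. Everything in it is hypothetical: MiniParser is a cut-down stand-in for the proposed interface (handler setters omitted), and RecordingParser is a toy that merely records the slice it was given instead of parsing it.

```java
import java.util.ArrayList;
import java.util.List;

class BufferParseSketch {
    /** Hypothetical cut-down version of the proposed interface. */
    interface MiniParser {
        void parse(char[] ch, int start, int length, String baseURI) throws Exception;
    }

    /** Toy implementation: records the slice it was asked to parse. */
    static class RecordingParser implements MiniParser {
        final List<String> seen = new ArrayList<>();

        @Override
        public void parse(char[] ch, int start, int length, String baseURI) {
            // A real parser would tokenize ch[start .. start+length-1] here.
            seen.add(new String(ch, start, length));
        }
    }

    /** Embeds doc between a prefix and suffix, then passes only the doc's range. */
    static String parseEmbedded(String prefix, String doc, String suffix) throws Exception {
        char[] buffer = (prefix + doc + suffix).toCharArray();
        RecordingParser parser = new RecordingParser();
        // baseURI is null because this document contains no relative URIs (note a).
        parser.parse(buffer, prefix.length(), doc.length(), null);
        return parser.seen.get(0);
    }

    public static void main(String[] args) throws Exception {
        System.out.println(parseEmbedded("HEADER\n", "<doc>hi</doc>", "\nTRAILER"));
        // prints <doc>hi</doc>
    }
}
```

Without the start and length arguments the caller would have to copy the document into a fresh array first.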
Thanks, and all the best,
David
--
David Megginson ak117@freenet.carleton.ca
Microstar Software Ltd. dmeggins@microstar.com
http://home.sprynet.com/sprynet/dmeggins/
From zmin at iti.gov.sg Mon Feb 23 05:08:40 1998
From: zmin at iti.gov.sg (Dr. Zheng Min)
Date: Mon Jun 7 17:00:11 2004
Subject: Making COM componts from java MSXML (Was: MS XML Parser on the Server)
Message-ID: <01bd4019$733f24c0$96897ac0@zhengmin.iti.gov.sg>
A few questions about making java MSXML COM aware:
1. Mike suggested using the ActiveX Wizard for Java to create an .IDL file. Has
anyone done it successfully? I tried it just now but a lot of methods were
skipped because of non-translatable types (why is that? Does it mean those
methods can't be used in a COM interface?).
2. Even worse, I can't re-compile MSXML in J++. I got stuck on the first file --
com.ms.xml.dso.XMLDSO.java. The error messages are all of the same type:
Value for argument 'parent' cannot be converted from 'int' in call
to 'Element ElementFactory.createElement(Element parent, int type, Name tag,
String text)'
The statement in XMLDSO.java is:
e = factory.createElement(Element.ELEMENT,
XMLRowsetProvider.nameROWSET);
It doesn't look right, but I don't know how MS managed to produce the *.class
files from it (or have I missed something?).
Has anyone tried to recompile it and succeeded?
Thanks,
Min
From M.H.Kay at eng.icl.co.uk Mon Feb 23 11:25:19 1998
From: M.H.Kay at eng.icl.co.uk (Michael Kay)
Date: Mon Jun 7 17:00:11 2004
Subject: MS XML Parser on the Server
Message-ID: <01bd404d$ad63cbe0$1e09e391@mhklaptop.bra01.icl.co.uk>
>Has anybody managed to get the Microsoft Java XML Parser running as a
>component accessible by ASP under IIS?
I tried and failed, probably because I was doing it wrong; then I rewrote
my app using SAX (over AElfred) and have this working under ASP fine.
I tried first using Javasoft's ActiveX Bridge which I couldn't get to work
except for the most trivial single-class javabeans; then I tried using
javareg and got it working - at least once I had worked out how to ensure
that the class path setting for the Microsoft Java VM was right. I found it
useful to test the thing with a little VB app as the environment is more
controllable.
I found it necessary to pay some attention to exception handling: if you
don't catch the things, they have a habit of crashing the ActiveX container,
i.e. the web server.
To keep things simple, I wrote a simple wrapper class for my application
which exposed all the interfaces I needed in the ASP script and nothing
else, and it was this wrapper class that I registered using javareg. The
underlying Java classes, so long as they are on the classpath, do not need
to be registered.
My javareg call was
javareg /register /class:com.icl.saxon.showXML /progid:ShowXML.Java
and the CreateObject call (in VBScript) was:
Set app = CreateObject("ShowXML.Java")
I haven't tried calling back from the Java code to ActiveX objects (e.g.
calling Response.Write) but it should work in theory. Instead I put the
output in String variables which the ASP page retrieves explicitly using
methods on ShowXML. Not elegant, but I was deliberately minimising the
number of things that might go wrong. I also haven't tried anything
complicated with collections or enumerations.
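[Editorial sketch] A wrapper along the lines described above might look roughly like this. It is not the actual com.icl.saxon.showXML class: the name ShowXmlFacade, its methods, and the trivial "report the root element" job are all invented for illustration, and it uses JAXP from current JDKs rather than SAX over AElfred. The two points it tries to show are that no exception escapes to the hosting container, and that results are handed back as plain Strings for the script to fetch.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;

class ShowXmlFacade {
    private String output = "";
    private String error = "";

    /** Runs the job; catches everything so the host (e.g. a web server) is not crashed. */
    public boolean run(String xml) {
        try {
            // Stand-in for the real application logic: report the root element name.
            output = "root=" + DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement().getTagName();
            error = "";
            return true;
        } catch (Exception e) {
            output = "";
            error = e.getClass().getSimpleName() + ": " + e.getMessage();
            return false;
        }
    }

    /** Result of the last successful run, retrieved explicitly by the caller. */
    public String getOutput() { return output; }

    /** Description of the last failure, or "" if the last run succeeded. */
    public String getError() { return error; }

    public static void main(String[] args) {
        ShowXmlFacade app = new ShowXmlFacade();
        System.out.println(app.run("<doc/>") + " " + app.getOutput());
        System.out.println(app.run("<doc>") + " " + app.getError());
    }
}
```

Only this one class would need registering; the classes it calls just have to be on the class path.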
Hope that helps,
Mike Kay, ICL
From M.H.Kay at eng.icl.co.uk Mon Feb 23 11:48:07 1998
From: M.H.Kay at eng.icl.co.uk (Michael Kay)
Date: Mon Jun 7 17:00:11 2004
Subject: Automating Search Interfaces
Message-ID: <01bd4050$b9604f60$1e09e391@mhklaptop.bra01.icl.co.uk>
>I would like to talk about the location of the person making the search versus >the location of the product or service provider
Geographic/Spatial queries are a well-researched topic in the database literature. Free text retrieval is definitely a weak approach, though people attempt it by using thesaurus facilities to represent the structure of a gazetteer. In most of the practical systems I have seen, spatial query is done using postal codes: the system needs knowledge of which postal districts are near each other. (We also use such techniques for scheduling the itinerary of service engineers).
>A hotel room is a 'chambre' in French. If I search for a hotel room in Italy, I
>don't know the word for room in Italian...
Multilingual search is well researched and seems to work reasonably well. The more difficult problem is to distinguish agencies that can book you a hotel room from newsletter articles by people enthusing about what a wonderful hotel room they were staying in: I think this is why there will always be added value in manual categorization and indexing services.
Mike Kay
From hb at ix.heise.de Mon Feb 23 13:01:46 1998
From: hb at ix.heise.de (Henning Behme)
Date: Mon Jun 7 17:00:11 2004
Subject: Ad: small app + article (XML/DSSSL).
References: <3.0.1.16.19980214135027.63dfb77c@pop3.demon.co.uk>
Message-ID: <34F172D9.15712E4A@ix.heise.de>
Hi,
we (iX Magazine in Germany) have put an article online (in German,
though - I'll try to provide an English version asap) that introduces a
small XML application and shows how its data is being converted into
HTML using James Clark's Jade. The app is a tiny attempt to display
literary history in terms of authors (when born &c.) and explains two
DSSSL style sheets which a) show the toc and b) list details of a
(chosen) author. Those of you who read German may try (if interested :-)
http://www.heise.de/ix/artikel/1998/03/156/
The app itself is online, too (toc and single author by now; I am
working on century-oriented lists and the like)
http://www.heise.de/ix/raven/Web/xml/lit
toc is static, author is done on the fly using Jade. I thought it would
be better this way than to generate all the files for the authors,
although this, of course, means waiting for a short while :-)
Best regards,
hb
--
Henning Behme
iX - Magazin fuer professionelle Informationstechnik
Helstorfer Str. 7 * 30625 Hannover * Germany
http://www.heise.de/ix/ * +49 511 5352-374 * -361 (Fax)
------ White, adj. and n. Black (Ambrose Bierce) ------
From gmckenzi at JetForm.com Mon Feb 23 14:48:18 1998
From: gmckenzi at JetForm.com (Gavin McKenzie)
Date: Mon Jun 7 17:00:11 2004
Subject: finalising org.sax.xml.Parser
Message-ID:
David,
While PUBLIC may not be going anywhere fast, I'd prefer that the parse()
call-level support for it be left in SAX. I intend to make ad-hoc use
of it internally (rolling my own catalogs and such).
I support your other proposed additions to the interface.
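To make "rolling my own catalogs" a little more concrete, here is a minimal sketch of the kind of lookup I have in mind. The class name `AdHocCatalog`, the sample public ID, and the file URL are all hypothetical; the only part taken from the XML spec is the rule that white space in a public identifier is normalized before matching.

```java
import java.util.HashMap;
import java.util.Map;

class AdHocCatalog {
    private final Map<String, String> entries = new HashMap<>();

    // Collapse runs of white space to a single space and trim both ends,
    // as the XML spec requires before public identifiers are matched.
    static String normalize(String publicId) {
        return publicId.trim().replaceAll("[ \\t\\r\\n]+", " ");
    }

    void add(String publicId, String systemId) {
        entries.put(normalize(publicId), systemId);
    }

    // Prefer a catalog match on the public ID; otherwise fall back to the
    // system ID that the caller supplied.
    String resolve(String publicId, String systemId) {
        if (publicId != null) {
            String mapped = entries.get(normalize(publicId));
            if (mapped != null) {
                return mapped;
            }
        }
        return systemId;
    }

    public static void main(String[] args) {
        AdHocCatalog catalog = new AdHocCatalog();
        catalog.add("-//Example//DTD Memo//EN", "file:///dtds/memo.dtd");
        System.out.println(catalog.resolve("-//Example//DTD   Memo//EN", "memo.dtd"));
    }
}
```

An application would do this lookup before handing both identifiers to parse(), which is why I would like the public ID to stay in the call.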
Gavin.
>-----Original Message-----
>From: David Megginson [SMTP:ak117@freenet.carleton.ca]
>Sent: Sunday, February 22, 1998 10:13 PM
>To: xml-dev Mailing List
>Subject: SAX: finalising org.sax.xml.Parser
>
>It's time to finalise SAX before there is such a big code base that we
>can no longer make changes. (Thanks, by the way, to James Clark,
>DataChannel, and IBM for including native SAX support in their XML
>parsers). During this phase, I'd like to make the _minimum_ changes
>necessary to SAX to define a consistent and simple common functionality
>for XML parsers.
>
>Let's start with the Parser interface. I'll use Java syntax because,
>while I can read IDL, I don't trust myself to write it:
>
>
>[current interface]
>------------------------------------------------------------------------
>  package org.xml.sax;
>
>  public interface Parser {
>
>    public void setEntityHandler (EntityHandler handler);
>    public void setDocumentHandler (DocumentHandler handler);
>    public void setErrorHandler (ErrorHandler handler);
>
>    public void parse (String publicID, String systemID)
>      throws java.lang.Exception;
>
>  }
>------------------------------------------------------------------------
>
>
>After considering the various discussions over the past few weeks, I
>propose that we make the following changes:
>
>1) Add a parse() method that accepts a stream.
>
>2) Add a parse() method that accepts a character buffer.
>
>3) Remove public ID from the current parse() method (I don't think
> public IDs are going anywhere fast in XML).
>
>With these changes, the interface would look like this in Java:
>
>
>[proposed changes]
>------------------------------------------------------------------------
>  package org.xml.sax;
>  import java.io.InputStream;
>
>  public interface Parser {
>
>    public void setEntityHandler (EntityHandler handler);
>    public void setDocumentHandler (DocumentHandler handler);
>    public void setErrorHandler (ErrorHandler handler);
>
>    public void parse (String uri)
>      throws java.lang.Exception;
>    public void parse (InputStream is, String baseURI)
>      throws java.lang.Exception;
>    public void parse (char ch[], int start, int length, String baseURI)
>      throws java.lang.Exception;
>
>  }
>------------------------------------------------------------------------
>
>
>NOTES:
>
>a. The baseURI argument is necessary for streams and character buffers
> in case either contains a relative URI. You can supply a null
> value if the document entity will not contain relative URIs.
>
>b. All programming languages initially targeted by SAX (Java, C++, C,
> Perl) have some concept of input streams; if we come up against one
> that doesn't, it can simply omit the relevant method.
>
>c. The start and length arguments are necessary with the character
> buffer in case the XML document is part of a larger array.
>
>
>Does this give reasonable functionality without limiting the
>architectural approaches of parser writers? Remember that individual
>implementations can extend this interface, but the interface
>represents the minimum common functionality that every SAX-conformant
>parser (eventually) provides.
>
>
>Thanks, and all the best,
>
>
>David
>
>--
>David Megginson ak117@freenet.carleton.ca
>Microstar Software Ltd. dmeggins@microstar.com
> http://home.sprynet.com/sprynet/dmeggins/
>
From mecom-gmbh at mixx.de Mon Feb 23 14:57:29 1998
From: mecom-gmbh at mixx.de (james anderson)
Date: Mon Jun 7 17:00:11 2004
Subject: xml-based protocol (axtp)
References: <3.0.32.19980222200447.00a05330@postman.osf.org>
Message-ID: <34F18E5E.AB73276E@mixx.de>
this (and the object stream <-> xml conversion) looks interesting. is there a
tar/zipped/...'d version anywhere?
Bill la Forge wrote:
> AXTP: Application eXtensible Transactional Protocol (UDP based)
> http://www.camb.opengroup.org/~laforge/axtp/
From tyler at infinet.com Mon Feb 23 15:28:08 1998
From: tyler at infinet.com (Tyler Baker)
Date: Mon Jun 7 17:00:11 2004
Subject: xml-based protocol
References: <3.0.32.19980222200447.00a05330@postman.osf.org>
Message-ID: <34F19695.17905F99@infinet.com>
Bill la Forge wrote:
> Finally, AXTP is using xml for the wire protocol.
> (I've also created some documentation.)
>
> AXTP: Application eXtensible Transactional Protocol (UDP based)
> http://www.camb.opengroup.org/~laforge/axtp/
This looks interesting except that the TransactionFactory interface has some
ridiculous names for the methods like createA(), createN(), etc. etc. For one
simple interface, I think that worrying about class file size is a waste of time
when compared to having methods and constants which are readable and
understandable.
Tyler
From donpark at quake.net Mon Feb 23 15:29:47 1998
From: donpark at quake.net (Don Park)
Date: Mon Jun 7 17:00:11 2004
Subject: finalising org.sax.xml.Parser
Message-ID: <001801bd406f$18cbd140$2ee044c6@donpark>
David,
I agree with most of the changes, especially the KISS solution to the
multiple input type problem.
I have just two recommendations:
1. Keep Public ID.
2. Use System ID instead of Public ID.
End result is that we just have two new methods in Parser and no change to
existing methods.
My reasons are:
1. Who knows where that rubber chicken will come in handy?
2. It is trivial for a SAX parser implementor to extract baseURI from URI.
3. It is not trivial and rather confusing for a SAX user to figure out what
the base URI is.
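Reason 2 can be illustrated with a small sketch. It uses java.net.URI from present-day Java (not something SAX itself provides), and the class name `BaseUriSketch` and the example.org addresses are made up. The point is only that, given the system ID of the entity, resolving a relative reference is a one-liner for the implementor.

```java
import java.net.URI;

class BaseUriSketch {
    // Resolve a possibly relative reference against the system ID of the
    // entity in which the reference appears.
    static String resolve(String systemId, String reference) {
        return URI.create(systemId).resolve(reference).toString();
    }

    public static void main(String[] args) {
        String sysID = "http://example.org/docs/book.xml";
        System.out.println(resolve(sysID, "chapters/ch1.xml"));
        System.out.println(resolve(sysID, "../dtd/book.dtd"));
    }
}
```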
So the method signatures would be:
  public void
  parse (String pubID, String sysID)
    throws java.lang.Exception;
  public void
  parse (String pubID, String sysID, InputStream is)
    throws java.lang.Exception;
  public void
  parse (String pubID, String sysID, char ch[], int offset, int length)
    throws java.lang.Exception;
PS: Parameter orders were changed because I prefer appending new arguments
to prepending them.
For the new methods, pubID and sysID are used to tell the parser that "data
from the given stream or character array should be treated as if it came
from given pubID and sysID".
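A toy sketch of how the three signatures might hang together, under my own assumptions: `ParserSketch` and `RecordingParser` are hypothetical names, nothing is actually parsed, and the stream is assumed to be UTF-8 (a real parser would detect the encoding). The stream form simply reads everything and delegates to the character-array form, carrying pubID and sysID along unchanged.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;

// Toy implementation: it only records what it was asked to parse.
class RecordingParser implements ParserSketch {
    String lastSysID;
    String lastText;

    public void parse(String pubID, String sysID) throws Exception {
        throw new UnsupportedOperationException("fetching " + sysID + " is out of scope here");
    }

    public void parse(String pubID, String sysID, InputStream is) throws IOException {
        // Assumption: UTF-8 input.
        Reader in = new InputStreamReader(is, StandardCharsets.UTF_8);
        StringBuilder sb = new StringBuilder();
        char[] buf = new char[1024];
        for (int n = in.read(buf); n != -1; n = in.read(buf)) {
            sb.append(buf, 0, n);
        }
        char[] all = sb.toString().toCharArray();
        parse(pubID, sysID, all, 0, all.length);
    }

    public void parse(String pubID, String sysID, char[] ch, int offset, int length) {
        lastSysID = sysID; // treated as where the data "came from"
        lastText = new String(ch, offset, length);
    }

    public static void main(String[] args) throws Exception {
        RecordingParser p = new RecordingParser();
        byte[] doc = "<a/>".getBytes(StandardCharsets.UTF_8);
        p.parse(null, "http://example.org/a.xml", new ByteArrayInputStream(doc));
        System.out.println(p.lastSysID + " -> " + p.lastText);
    }
}

// Hypothetical interface mirroring the three proposed signatures.
interface ParserSketch {
    void parse(String pubID, String sysID) throws Exception;
    void parse(String pubID, String sysID, InputStream is) throws Exception;
    void parse(String pubID, String sysID, char[] ch, int offset, int length) throws Exception;
}
```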
Regards,
Don
From jmj at thomtech.com Mon Feb 23 16:02:42 1998
From: jmj at thomtech.com (jmj@thomtech.com)
Date: Mon Jun 7 17:00:11 2004
Subject: MS XML Parser on the Server
Message-ID: <9802238882.AA888249744@ccgate.thomtech.com>
Greetings!
So where would I find the source code for the C++ version? I haven't
been able to find it at the microsoft site.
Thanks!
--Jim Jordan
jmj@thomtech.com -- Thomson Technologies Lab Group
From the sublime to the ridiculous is but a step.
Napoleon Bonaparte - on the retreat from Moscow
______________________________ Reply Separator _________________________________
Subject: RE: MS XML Parser on the Server
Author: Jim Lears at internet
Date: 2/21/98 6:35 PM
Server.CreateObject in VBScript is used for creating instances of COM
objects. The Java XML Parser doesn't expose any COM interfaces, notably
IClassFactory, which is used to instantiate COM objects. The C++ version
is what you need: it's an ActiveX control. The source code for both
parsers is available. If you insist on using the Java version, you could
mod it up to sport a COM interface.
Helping To Destroy The English Language
-----Original Message-----
From: Mike Wagner [SMTP:mwagner@ets.org]
Sent: Friday, February 20, 1998 3:33 PM
To: xml-dev@ic.ac.uk
Subject: MS XML Parser on the Server
Has anybody managed to get the Microsoft Java XML Parser running as a
component accessible by ASP under IIS? I tried what seemed to me to be the
obvious approach and that didn't work. I copied the java classes to the
TrustLib directory, then registered them with javareg. (An excerpt of the
BAT file I used is at the end of this message). However, when I try a
simple Server.CreateObject("com.ms.xml.om.Document") call in an ASP page,
it dies with the following error:
Microsoft JScript runtime error '800a01ad'
Automation server can't create object
/xmltest.asp, line 14
Any insights? Thanks.
Mike Wagner
Educational Testing Service
mwagner@ets.org
-----------------Javareg BAT file--------------------
cd \winnt\java\trustlib\com\ms\xml\dso
javareg /register /class:SchemaNode /progid:com.ms.xml.dso.SchemaNode
cd \winnt\java\trustlib\com\ms\xml\dso
javareg /register /class:XMLDSO /progid:com.ms.xml.dso.XMLDSO
cd \winnt\java\trustlib\com\ms\xml\dso
javareg /register /class:XMLParserThread /progid:com.ms.xml.dso.XMLParserThread
cd \winnt\java\trustlib\com\ms\xml\dso
javareg /register /class:XMLRowsetProvider /progid:com.ms.xml.dso.XMLRowsetProvider
cd \winnt\java\trustlib\com\ms\xml\om
From ak117 at freenet.carleton.ca Mon Feb 23 16:10:40 1998
From: ak117 at freenet.carleton.ca (David Megginson)
Date: Mon Jun 7 17:00:11 2004
Subject: finalising org.sax.xml.Parser
In-Reply-To: <001801bd406f$18cbd140$2ee044c6@donpark>
References: <001801bd406f$18cbd140$2ee044c6@donpark>
Message-ID: <199802231609.LAA01939@unready.microstar.com>
Don Park writes:
> I agree with most of the changes especially the KISS solution to multiple
> input type problem.
>
> I have just two recommendations:
>
> 1. Keep Public ID.
> 2. Use System ID instead of Public ID.
That's two votes for keeping Public ID (and one for sticking with the
standard terminology for system IDs, instead of using the
Web-hacker-friendly "URI"). I would have no problem going with Don's
proposal, especially since it is identical to my discarded first
draft -- would anyone prefer _not_ to see public IDs, then?
All the best,
David
--
David Megginson ak117@freenet.carleton.ca
Microstar Software Ltd. dmeggins@microstar.com
http://home.sprynet.com/sprynet/dmeggins/
From mike at jmaca.com Mon Feb 23 16:16:21 1998
From: mike at jmaca.com (Michael Emmel)
Date: Mon Jun 7 17:00:11 2004
Subject: Binary Data
References: <01bd3f3d$977564d0$ed0620ce@lbynum.esri.com>
Message-ID: <34F1A319.49A499DE@jmaca.com>
Okay, I read the spec more carefully now that someone mentioned NDATA, and I
understand how the unparsed entity works.
What I still do not understand (and it seems to be undefined, at least to me)
is how the parser is restarted once an application consumes an unparsed
entity.
ExternalID ::= 'SYSTEM' S SystemLiteral
             | 'PUBLIC' S PubidLiteral S SystemLiteral
NDataDecl  ::= S 'NDATA' S Name [ VC: Notation Declared ]
Here's the description of the VC:
Validity Constraint: Notation Declared
The Name must match the declared name of a notation.
The SystemLiteral is called the entity's system identifier. It is a URI,
which may be used to retrieve the entity. Note that the hash mark (#) and
fragment identifier frequently used with URIs are not, formally, part of the
URI itself; an XML processor may signal an error if a fragment identifier is
given as part of a system identifier. Unless otherwise provided by
information outside the scope of this specification (e.g. a special XML
element type defined by a particular DTD, or a processing instruction defined
by a particular application specification), relative URIs are relative to the
location of the resource within which the entity declaration occurs. A URI
might thus be relative to the document entity, to the entity containing the
external DTD subset, or to some other external parameter entity.
An XML processor should handle a non-ASCII character in a URI by representing
the character in UTF-8 as one or more bytes, and then escaping these bytes
with the URI escaping mechanism (i.e., by converting each byte to %HH, where
HH is the hexadecimal notation of the byte value).
In addition to a system identifier, an external identifier may include a
public identifier. An XML processor attempting to retrieve the entity's
content may use the public identifier to try to generate an alternative URI.
If the processor is unable to do so, it must use the URI specified in the
system literal. Before a match is attempted, all strings of white space in
the public identifier must be normalized to single space characters (#x20),
and leading and trailing white space must be removed.
Examples of external entity declarations:
and here are some examples
This says to me that binary data must either be encoded to ASCII to be
included in the document, or be set off by MIME-type boundaries around the
XML tags (with the binary data not containing those boundaries), or be
obtained from an ASCII-normalized external URI link.
There is no way to tell an XML parser to skip x number of arbitrary bytes of
embedded unparsed entity data, which is consumed by the "application", and
then restart the parser at the next valid section.
Am I wrong?
Mike
From M.H.Kay at eng.icl.co.uk Mon Feb 23 16:48:47 1998
From: M.H.Kay at eng.icl.co.uk (Michael Kay)
Date: Mon Jun 7 17:00:11 2004
Subject: The XML spec in XML: missing tags
Message-ID: <01bd407b$115e7500$1e09e391@mhklaptop.bra01.icl.co.uk>
I have been playing with the BNF rules in the XML spec as an exercise in XML
tagging.
I noticed that in the XML version of the XML spec, the non-terminal symbol
"S" is incorrectly tagged in rules 60, 62, and 63, and in consequence it is
not hyperlinked in the HTML version.
Some comments on the XML tagging in the BNF rules:
- it is useful to have the non-terminals tagged, though the way in which it
is done is a little clumsy, since the internal identifier and the visible name
of the non-terminal are necessarily in a one-to-one correspondence. The way
it is done seems designed primarily to enable a particular translation to
HTML.
- it is a shame that there is no tagging to distinguish terminal symbols
from metasymbols, since this would enable nicer renditions of the rules,
e.g. exploiting colour, without having to parse the BNF
- it would seem more logical for each rule to have a single , with any
and constraints being embedded within the , rather than
these being separate elements interspersed among multiple elements.
Two comments on the definition of notation in section 6:
- the distinction between non-terminals with an initial upper case and those
with an initial lower case is not at all clear (to me).
- the precedence of the metalanguage operators (e.g. that "A B | C" means
"(A B) | C") is not stated.
Thanks to Peter M-R for prompting me to look at this XML exemplar, it has
been very stimulating!
Mike Kay
From msuzio at ford.com Mon Feb 23 16:50:27 1998
From: msuzio at ford.com (Michael J. Suzio)
Date: Mon Jun 7 17:00:11 2004
Subject: xml:space
References: <3.0.1.16.19980220223432.3ea70dee@pop3.demon.co.uk>
Message-ID: <199802231650.AA06071@mailfw1.ford.com>
What I wonder is, how does SAX decide what is ignorable
whitespace and what is significant? I'm not clear on how that
works, and the role xml:space plays in defining that.
Ignoring whitespace is one of the most tedious things I keep doing
in my XML parsing apps, I'd prefer to have to explicitly *work* to
keep whitespace.
What I don't understand is, given something like this in a DTD:
Why wouldn't *any* character data located within
(and not inside one of its child
elements) be ignorable? I'd expect a parser seeing this:
to ignore those carriage returns and extraneous spaces within the
QUOTE element, and just give me the SOURCE and LINE elements and
their content.
Sorry if this is a stupid question, but it has been bugging me the
last couple weeks.
--
Michael J. Suzio
Web Technical Standards, WWW & Internet Applications
(313) 24-88120
msuzio@eccms1.dearborn.ford.com / msuzio@ford.com
From msuzio at ford.com Mon Feb 23 16:59:20 1998
From: msuzio at ford.com (Michael J. Suzio)
Date: Mon Jun 7 17:00:11 2004
Subject: finalising org.sax.xml.Parser
References: <001801bd406f$18cbd140$2ee044c6@donpark> <199802231609.LAA01939@unready.microstar.com>
Message-ID: <199802231658.AA08077@mailfw1.ford.com>
I think keeping the method with Public ID is fine, but if in
many cases we're just passing NULL as the first arg, why don't
we have a method which just accepts the system ID/URI? I
myself have no use for Public ID, so I essentially always
just pass in NULL, which to me makes the code look confusing...
(I hate NULL/ignored parameters, especially as the first arg, I
usually rank args in order of "importance" to the method/procedure).
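Something like the following is what I mean, sketched in present-day Java (default methods need Java 8 or later). The names `TwoArgParser` and `EchoParser` are hypothetical, and nothing is really parsed; the one-argument form just forwards to the two-argument form with a null public ID, so callers never have to write the null themselves.

```java
// Toy implementation that only records the arguments it received.
class EchoParser implements TwoArgParser {
    String seen;

    public void parse(String pubID, String sysID) {
        seen = pubID + " / " + sysID;
    }

    public static void main(String[] args) throws Exception {
        EchoParser p = new EchoParser();
        p.parse("http://example.org/doc.xml"); // no explicit null at the call site
        System.out.println(p.seen);
    }
}

// Hypothetical interface: the existing two-argument method plus a shortcut.
interface TwoArgParser {
    void parse(String pubID, String sysID) throws Exception;

    // Convenience form for callers that have no public ID.
    default void parse(String sysID) throws Exception {
        parse(null, sysID);
    }
}
```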
--
Michael J. Suzio
Web Technical Standards, WWW & Internet Applications
(313) 24-88120
msuzio@eccms1.dearborn.ford.com / msuzio@ford.com
From donpark at quake.net Mon Feb 23 17:05:27 1998
From: donpark at quake.net (Don Park)
Date: Mon Jun 7 17:00:11 2004
Subject: Binary Data
Message-ID: <000f01bd407c$3d9e9f90$2ee044c6@donpark>
Michael,
Check out the XML-Binary demo at
http://www.quake.net/~donpark/SaxDomDemo/SaxDomDemo.html
The Binary.xml file contains an element with embedded binary data.
I do not like the notation-based solution to binary data because it requires DTD
processing. IMHO, high-performance XML applications will opt to ignore the DTD
because it requires additional resources and causes processing
hiccups. XML-Binary is being designed around a set of reserved attributes
which tell you how the data was encoded (base64) and what the data is
(image/gif). All this can be done easily by checking for the attributes in
a single-pass processing system. It also allows specification of
multi-layer encoding of binary data so that your application can easily tell
that an XML element contains a PostScript image which was compressed using ZIP
and then encoded using BASE64.
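A rough Java sketch of the multi-layer idea, under stated assumptions: it uses raw zlib/deflate compression from java.util.zip rather than a ZIP archive, the class name `LayeredEncoding` is made up, and the attribute names in main() ("enc", "type") are placeholders, not the reserved attributes XML-Binary actually defines. Encoding applies the layers inside-out (compress, then base64); decoding undoes them in reverse.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import java.util.zip.DeflaterOutputStream;
import java.util.zip.InflaterInputStream;

class LayeredEncoding {
    // Innermost layer first: compress, then make the result text-safe.
    static String encode(byte[] raw) throws IOException {
        ByteArrayOutputStream packed = new ByteArrayOutputStream();
        try (OutputStream out = new DeflaterOutputStream(packed)) {
            out.write(raw);
        }
        return Base64.getEncoder().encodeToString(packed.toByteArray());
    }

    // Undo the layers in reverse order: base64 first, then inflate.
    static byte[] decode(String text) throws IOException {
        byte[] packed = Base64.getDecoder().decode(text);
        ByteArrayOutputStream raw = new ByteArrayOutputStream();
        try (InputStream in = new InflaterInputStream(new ByteArrayInputStream(packed))) {
            byte[] buf = new byte[4096];
            for (int n = in.read(buf); n != -1; n = in.read(buf)) {
                raw.write(buf, 0, n);
            }
        }
        return raw.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] data = "%!PS-Adobe-3.0 <binary & such>".getBytes(StandardCharsets.ISO_8859_1);
        // Hypothetical attribute names, for illustration only.
        String element = "<image enc=\"deflate base64\" type=\"application/postscript\">"
            + encode(data) + "</image>";
        System.out.println(element);
    }
}
```

Because the base64 alphabet contains neither '<' nor '&', the encoded content can sit in an element without any escaping, and a single-pass reader only needs the attributes to know which layers to peel off.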
Don Park
http://www.quake.net/~donpark/index.html
From tyler at infinet.com Mon Feb 23 17:15:43 1998
From: tyler at infinet.com (Tyler Baker)
Date: Mon Jun 7 17:00:11 2004
Subject: xml:space
References: <3.0.1.16.19980220223432.3ea70dee@pop3.demon.co.uk> <199802231650.AA06071@mailfw1.ford.com>
Message-ID: <34F1AF9F.77CD5499@infinet.com>
Michael J. Suzio wrote:
> What I wonder is, how does SAX decide what is ignorable
> whitespace and what is significant? I'm not clear on how that
> works, and the role xml:space plays in defining that.
> Ignoring whitespace is one of the most tedious things I keep doing
> in my XML parsing apps, I'd prefer to have to explicitly *work* to
> keep whitespace.
> What I don't understand is, given something like this in a DTD:
I think for problems like this, the application should just filter it all out
itself which is very simple.
Here is an inefficient implementation that will do just that for you in Java for
instance:
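import java.util.StringTokenizer;
String data = "Fee Fi Fo\n\n\n Fum\t\t\t ";
// Default delimiters: space, tab, newline, carriage return, form feed.
StringTokenizer st = new StringTokenizer(data);
StringBuffer buffer = new StringBuffer();
while (st.hasMoreTokens()) {
    if (buffer.length() > 0) {
        buffer.append(' ');  // one space between tokens, none at the end
    }
    buffer.append(st.nextToken());
}
// Safe for empty or all-whitespace input (no setLength(-1)).
String result = buffer.toString();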
Result should be "Fee Fi Fo Fum"
From ak117 at freenet.carleton.ca Mon Feb 23 17:15:51 1998
From: ak117 at freenet.carleton.ca (David Megginson)
Date: Mon Jun 7 17:00:11 2004
Subject: xml:space
In-Reply-To: <199802231650.AA06071@mailfw1.ford.com>
References: <3.0.1.16.19980220223432.3ea70dee@pop3.demon.co.uk>
<199802231650.AA06071@mailfw1.ford.com>
Message-ID: <199802231713.MAA02467@unready.microstar.com>
Michael J. Suzio writes:
> What I wonder is, how does SAX decide what is ignorable
> whitespace and what is significant? I'm not clear on how that
> works, and the role xml:space plays in defining that.
> Ignoring whitespace is one of the most tedious things I keep doing
> in my XML parsing apps, I'd prefer to have to explicitly *work* to
> keep whitespace.
SAX itself is not a program, but its interface allows DTD-driven
parsers to make the distinction described in clause 2.10 (AElfred
takes advantage of the distinction):
2.10 White Space Handling
In editing XML documents, it is often convenient to use "white space"
(spaces, tabs, and blank lines, denoted by the nonterminal S in this
specification) to set apart the markup for greater readability. Such
white space is typically not intended for inclusion in the delivered
version of the document. On the other hand, "significant" white space
that should be preserved in the delivered version is common, for
example in poetry and source code.
An XML processor must always pass all characters in a document that
are not markup through to the application. A validating XML processor
must also inform the application which of these characters constitute
white space appearing in element content.
Note that this has nothing to do with the `xml:space' attribute -- it
is your application, rather than the XML parser, that is allowed to
act on that one.
> What I don't understand is, given something like this in a DTD:
>
>
>
> Why wouldn't *any* character data located within
> (and not inside one of its child
> elements) be ignorable? I'd expect a parser seeing this:
>
>
>
> To ignore those carriage returns and extraneous spaces within the
> QUOTE element, and just give me the SOURCE and LINE elements and
> their content.
Absolutely correct. If your XML parser is DTD-driven (as AElfred is),
it should somehow flag the carriage returns and leading spaces in your
example as ignorable. It is a major pain having to deal with this
kind of thing yourself, if your parser is not DTD-aware.
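As a sketch of what this looks like from the application side, here is a small example written against the SAX and JAXP classes that ship with present-day Java, which postdate the interface under discussion. The DTD and document are my own guess at the kind of QUOTE/SOURCE/LINE example in question, and the class name `WhitespaceDemo` is made up. A validating parser is requested so that the DTD is certainly consulted.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

class WhitespaceDemo extends DefaultHandler {
    // Assumed example: QUOTE has element content, so white space directly
    // inside it is not character data as far as the DTD is concerned.
    static final String SAMPLE =
        "<?xml version=\"1.0\"?>\n"
        + "<!DOCTYPE QUOTE [\n"
        + "  <!ELEMENT QUOTE (SOURCE, LINE+)>\n"
        + "  <!ELEMENT SOURCE (#PCDATA)>\n"
        + "  <!ELEMENT LINE (#PCDATA)>\n"
        + "]>\n"
        + "<QUOTE>\n"
        + "  <SOURCE>Ambrose Bierce</SOURCE>\n"
        + "  <LINE>White, adj. and n. Black</LINE>\n"
        + "</QUOTE>\n";

    final StringBuilder text = new StringBuilder();
    int ignorableChars = 0;

    @Override
    public void characters(char[] ch, int start, int length) {
        text.append(ch, start, length); // real character data only
    }

    @Override
    public void ignorableWhitespace(char[] ch, int start, int length) {
        ignorableChars += length; // white space in element content
    }

    @Override
    public void error(SAXParseException e) throws SAXException {
        throw e; // surface validity errors instead of ignoring them
    }

    static WhitespaceDemo run(String xml) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setValidating(true);
        SAXParser parser = factory.newSAXParser();
        WhitespaceDemo handler = new WhitespaceDemo();
        parser.parse(new InputSource(new StringReader(xml)), handler);
        return handler;
    }

    public static void main(String[] args) throws Exception {
        WhitespaceDemo h = run(SAMPLE);
        System.out.println("text: " + h.text);
        System.out.println("ignorable characters: " + h.ignorableChars);
    }
}
```

The application never sees the line breaks and indentation inside QUOTE through characters(); they arrive separately and can simply be dropped.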
All the best,
David
--
David Megginson ak117@freenet.carleton.ca
Microstar Software Ltd. dmeggins@microstar.com
http://home.sprynet.com/sprynet/dmeggins/
From mike at jmaca.com Mon Feb 23 17:22:33 1998
From: mike at jmaca.com (Michael Emmel)
Date: Mon Jun 7 17:00:11 2004
Subject: Binary Data Resolved
References: <000f01bd407c$3d9e9f90$2ee044c6@donpark>
Message-ID: <34F1B1ED.FA5F78B4@jmaca.com>
Don Park wrote:
> Michael,
>
> Check out the XML-Binary demo at
> http://www.quake.net/~donpark/SaxDomDemo/SaxDomDemo.html
>
> Binary.xml file contains an element with embedded binary data.
Thanks!!
Another poster also suggested that the packaging of the various entities that
make up an XML document is outside of the XML spec.
I agree, so I think I'll work on my idea of a jar-like file with an XML header
(very cool IMHO) and save the Base64 encoding for special circumstances.
There does need to be a standard way to transmit all the "static"
data that makes up a complete XML document and other complex data sources.
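Roughly what I have in mind, as a minimal sketch using java.util.zip. The class name `XmlPackage`, the entry name "document.xml", and the sample part are placeholders, not a proposed format. The XML document goes in first as the "header" entry, and each binary part follows as raw bytes under its own name, so nothing needs Base64.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

class XmlPackage {
    // Write the XML document first, then each binary part under its own name.
    static byte[] pack(String xml, Map<String, byte[]> parts) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(bytes)) {
            zip.putNextEntry(new ZipEntry("document.xml"));
            zip.write(xml.getBytes(StandardCharsets.UTF_8));
            zip.closeEntry();
            for (Map.Entry<String, byte[]> part : parts.entrySet()) {
                zip.putNextEntry(new ZipEntry(part.getKey()));
                zip.write(part.getValue());
                zip.closeEntry();
            }
        }
        return bytes.toByteArray();
    }

    // Read every entry back, in archive order.
    static Map<String, byte[]> unpack(byte[] archive) throws IOException {
        Map<String, byte[]> entries = new LinkedHashMap<>();
        try (ZipInputStream zip = new ZipInputStream(new ByteArrayInputStream(archive))) {
            byte[] buf = new byte[4096];
            for (ZipEntry e = zip.getNextEntry(); e != null; e = zip.getNextEntry()) {
                ByteArrayOutputStream data = new ByteArrayOutputStream();
                for (int n = zip.read(buf); n != -1; n = zip.read(buf)) {
                    data.write(buf, 0, n);
                }
                entries.put(e.getName(), data.toByteArray());
            }
        }
        return entries;
    }

    public static void main(String[] args) throws IOException {
        Map<String, byte[]> parts = new LinkedHashMap<>();
        parts.put("logo.gif", new byte[] {0x47, 0x49, 0x46, 0x38});
        byte[] archive = pack("<doc><img src=\"logo.gif\"/></doc>", parts);
        System.out.println(unpack(archive).keySet());
    }
}
```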
And thanks to all who helped me resolve this it was very important to me.
Mike
mike@jmaca.com
Private post:
Subject: Re: Binary Data
Date: Mon, 23 Feb 1998 12:04:04 -0500
From: David Megginson
To: mike@jmaca.com
Michael Emmel writes:
> Failing that you're left with coming up with a standard way to
> "package" all internal links.
I think that that is by far a better solution -- kludges (like
embedding all objects in a single XML file) are sometimes necessary to
get something working, but we don't want to codify them in a spec if
we can avoid doing so. A good, general Internet packaging protocol
would solve many problems both inside and outside XML.
In the mean time, you can use base64 if you really need to.
All the best,
From ricko at allette.com.au Mon Feb 23 17:22:52 1998
From: ricko at allette.com.au (Rick Jelliffe)
Date: Mon Jun 7 17:00:12 2004
Subject: Binary Data
Message-ID: <003701bd407f$afc15830$7b0b4ccb@NT.JELLIFFE.COM.AU>
From: Michael Emmel
>This says to me that binary data is required to either be encoded to ASCII to
>be included, or have MIME type boundaries for XML tags with binary data
>not containing the MIME boundaries included in the document, or be obtained
>from an ASCII-normalized external URI link.
Binary data can only be included in a parseable entity if it is first encoded
in some way which
1) does not contain delimiters which may cause false triggering
2) does not contain any characters which the XML "SGML declaration"
says are unused (or shunned).
Base64 is one such encoding. Other encodings may be more efficient
if you have a 16-bit data stream.
The way to signal you are using an encoding is to use an element
with a notation attribute.
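As a rough illustration of the Base64 route, here is a minimal sketch. It assumes a JDK with java.util.Base64 (Java 8 or later, so well after this thread); the class name Base64Embed, the element name "blob" and the plain "notation" attribute are made up for the example, and a real document would declare the notation in its DTD.

```java
import java.util.Base64;

class Base64Embed {
    // Wrap arbitrary bytes in an element. The element and attribute
    // names are hypothetical; only the encoding step matters here.
    static String embed(byte[] data) {
        String encoded = Base64.getEncoder().encodeToString(data);
        return "<blob notation=\"base64\">" + encoded + "</blob>";
    }

    // Recover the original bytes from the text between the tags.
    static byte[] extract(String element) {
        int start = element.indexOf('>') + 1;
        int end = element.lastIndexOf("</");
        return Base64.getDecoder().decode(element.substring(start, end));
    }
}
```

Because the Base64 alphabet contains only letters, digits, "+", "/" and "=", the encoded text cannot contain "<" or "&", which is what keeps the entity parseable.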
If you embed binary data with MIME type boundaries, you no longer
have a parseable XML entity, you have a MIME multipart file which
can be processed to generate an XML entity.
>There is no way to tell an XML parser to skip x number of arbitrary bytes of
>embedded unparsed entity data which is consumed by the "application" and then
>restart the parser at the next valid section.
An XML parser is not interested in the contents of a non-XML-parseable
entity. Indexing into binary data is either done before the parser (i.e. by
embedding the appropriate instructions in the system identifier of the
entity) or by the application after the parser.
>Am I wrong ???
What do you mean "restart the parser"? Parsing continues after an entity
reference.
Rick Jelliffe
From msuzio at ford.com Mon Feb 23 17:30:52 1998
From: msuzio at ford.com (Michael J. Suzio)
Date: Mon Jun 7 17:00:12 2004
Subject: xml:space
References: <3.0.1.16.19980220223432.3ea70dee@pop3.demon.co.uk>
<199802231650.AA06071@mailfw1.ford.com> <199802231713.MAA02467@unready.microstar.com>
Message-ID: <199802231730.AA15010@mailfw1.ford.com>
OK, to be more precise, the problem I think I'm seeing is that,
using an XML example, like this:
I would expect (using SAX) to receive an ignorable() event when
the end of the opening QUOTE tag is reached, and the "\n " string
found. I'm not seeing that, using the DXP implementation. Should
I? I'm not sure if I see what circumstances actually alert
a parser that, yes, this whitespace is *not* significant. I
know it is supposed to pass the data to the application, but the
data is also supposed to be flagged, correct?
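For comparison, a minimal sketch of the one case where the XML spec does require flagging: a validating parser that has seen a DTD declaring element content. It uses the JAXP and SAX classes bundled with current JDKs, where the callback is named ignorableWhitespace(). The QUOTE element name comes from the description above; the LINE element, the DTD and the class name WhitespaceDemo are assumptions for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class WhitespaceDemo {
    // Hypothetical document: QUOTE is declared to hold only LINE elements,
    // so whitespace directly inside QUOTE is element-content whitespace.
    static final String SAMPLE =
        "<!DOCTYPE QUOTE [\n"
        + "<!ELEMENT QUOTE (LINE+)>\n"
        + "<!ELEMENT LINE (#PCDATA)>\n"
        + "]>\n"
        + "<QUOTE>\n  <LINE>To be</LINE>\n</QUOTE>";

    // Counts the whitespace characters reported through ignorableWhitespace().
    static int countIgnorable(String xml) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setValidating(true); // validating parsers must flag this whitespace
        SAXParser parser = factory.newSAXParser();
        final int[] count = {0};
        parser.parse(new InputSource(new StringReader(xml)), new DefaultHandler() {
            @Override
            public void ignorableWhitespace(char[] ch, int start, int length) {
                count[0] += length;
            }
        });
        return count[0];
    }
}
```

Without a declaration for QUOTE, a parser has no way to know the whitespace is insignificant and has to deliver it as ordinary character data.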
--
Michael J. Suzio
Web Technical Standards, WWW & Internet Applications
(313) 24-88120
msuzio@eccms1.dearborn.ford.com / msuzio@ford.com
From jjc at jclark.com Tue Feb 24 04:02:06 1998
From: jjc at jclark.com (James Clark)
Date: Mon Jun 7 17:00:12 2004
Subject: SAX: finalising org.sax.xml.Parser
References: <199802230313.WAA00386@unready.microstar.com>
Message-ID: <34F23D1B.E6172400@jclark.com>
> public void parse (InputStream is, String baseURI)
> throws java.lang.Exception;
> public void parse (char ch[], int start, int length, String baseURI)
> throws java.lang.Exception;
I don't think this last one is a good idea. If you want something that
operates on a stream of characters as opposed to bytes, it should be
    void parse(Reader r, String baseURI)
Using an array of chars is as bad an idea as it would be to replace the
InputStream method with a method that operates on an array of bytes.
I am not convinced this really buys you anything. It's easy enough to
write an InputStream that takes an array of chars and presents them as a
sequence of UTF-16 encoded bytes. It also raises some problems since the
XML spec doesn't define the operation of a processor on a sequence of
chars. For example, what if anything should the processor do with an
encoding declaration in this case?
If you don't want to put Reader in to avoid dependency on JDK 1.1, I
would suggest simply leaving this out for now.
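A minimal sketch of that char-array-to-InputStream conversion, assuming a current JDK. For brevity it copies the slice into a byte array instead of implementing a streaming InputStream subclass; the class and method names are made up.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

class CharsAsStream {
    // Encode a slice of a char array as UTF-16. Java's "UTF-16" encoder
    // writes a big-endian byte-order mark (0xFE 0xFF) first.
    static byte[] toUtf16Bytes(char[] ch, int start, int length) {
        return new String(ch, start, length).getBytes(StandardCharsets.UTF_16);
    }

    // Present the same slice as an InputStream that a byte-oriented
    // parse() method could consume.
    static InputStream asUtf16Stream(char[] ch, int start, int length) {
        return new ByteArrayInputStream(toUtf16Bytes(ch, start, length));
    }
}
```

The byte-order mark lets a processor detect UTF-16 on its own, though an encoding declaration inside the characters that names a different encoding would still conflict with it, which is the problem raised above.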
James
From Jon.Bosak at Eng.Sun.COM Tue Feb 24 05:13:13 1998
From: Jon.Bosak at Eng.Sun.COM (Jon Bosak)
Date: Mon Jun 7 17:00:12 2004
Subject: Last call for submissions: XML Developers' Day
Message-ID: <199802240511.VAA29721@boethius.eng.sun.com>
Reminder: the deadline for submissions is this Friday, February 27.
See the original notice below for details.
Jon
========================================================================
CALL FOR PRESENTATIONS: XML DEVELOPERS' DAY 1998.03.27
A one-day technical conference for XML developers will be held Friday,
March 27, in Seattle, Washington. The event constitutes the last day
of the GCA XML Conference (http://www.gca.org/conf/xmlcon98/).
XML Developers' Day is a single-track event devoted entirely to
technical reports on the latest developments in XML implementation.
If you are engaged in the construction of any software that works with
XML -- converters, parsers, servers, browsers, editors, or XML-based
vertical applications -- here is your chance to share your work with
an audience that can understand and appreciate it.
Since stylesheet-based rendering is part of XML publishing, developers
of tools that support XSL or DSSSL are invited to show their latest
offerings as well. We're also open to presentations on XML-based
languages (CML, OFX, etc.) and related efforts that might have a
significant impact on the future of XML (RDF, XML-Data, etc.) if they
are of particular interest to XML developers.
Vendors of commercial tools can participate, but they must confine
their presentations to the technical aspects of current XML products
in development. Table space will be made available for the
distribution of product announcements and commercial literature.
REGISTRATION
The registration fee for XML Developers' Day is $275 for GCA members
and $390 for non-GCA members (see the registration page below for
conference and tutorial rates). This is mighty inexpensive for an
inside update on the very latest activity in this field. You can
register at
http://www.gca.org/conf/xmlcon98/registra.htm
N.B.: Presenters get in free.
CALL FOR PRESENTATIONS
If you would like to give a report at this event, send a paragraph or
two describing your presentation, based on a conservative estimate of
the status of your project as it will stand on March 27, to Jon Bosak
(bosak@eng.sun.com). Also include a description of the audio-visual
equipment you will need for your presentation and an estimate of its
duration. Please include the phrase "XML Dev Day" somewhere in the
subject line of your message.
Since we want up-to-the-minute reports on activities in progress,
there will be no published proceedings, and therefore you need not
submit your entire presentation in advance. But please try to make
your forecasted description as accurate as possible so that we can
choose the most interesting and relevant submissions.
The deadline for submissions is Friday, February 27.
Jon
----------------------------------------------------------------------
Jon Bosak, Online Information Technology Architect, Sun Microsystems
901 San Antonio Road, MPK17-101, Palo Alto, California 94043
----------------------------------------------------------------------
If a man look sharply and attentively, he shall see Fortune; for
though she be blind, yet she is not invisible. -- Francis Bacon
----------------------------------------------------------------------
From donpark at quake.net Tue Feb 24 06:09:02 1998
From: donpark at quake.net (Don Park)
Date: Mon Jun 7 17:00:12 2004
Subject: SAX: finalising org.sax.xml.Parser
Message-ID: <002401bd40e9$fde8c510$2ee044c6@donpark>
>I don't think this last one is a good idea. If you want something that
>operates on a stream of characters as opposed to bytes, it should be
>
> void parse(Reader r, String baseURI)
>
>Using an array of chars is as bad an idea as it would be to replace the
>InputStream method with a method that operates on an array of bytes.
>
>I am not convinced this really buys you anything. It's easy enough to
>write an InputStream that takes an array of chars and presents them as a
>sequence of UTF-16 encoded bytes. It also raises some problems since the
>XML spec doesn't define the operation of a processor on a sequence of
>chars. For example, what if anything should the processor do with an
>encoding declaration in this case?
If I remember correctly, what David is trying to do is provide us with a means
to parse XML data from a byte stream as well as a character stream. Since
Reader will actually hide the byte-based aspect of the data stream, it is
inappropriate for our purpose.
An XML character stream is also very useful when XML data is generated and
processed within a framework. In such a system, converting a character
stream to a byte stream and then converting it back to a character stream is
unnecessary.
As far as what to do with encoding information when dealing with character
streams, would there be any problem if SAX just ignored it?
Regards,
Don Park
http://www.quake.net/~donpark/index.html
From M.H.Kay at eng.icl.co.uk Tue Feb 24 10:59:34 1998
From: M.H.Kay at eng.icl.co.uk (Michael Kay)
Date: Mon Jun 7 17:00:12 2004
Subject: finalising org.sax.xml.Parser
Message-ID: <01bd4113$77fa0520$1e09e391@mhklaptop.bra01.icl.co.uk>
>>From: David Megginson [SMTP:ak117@freenet.carleton.ca]
[heavily cut]
>>After considering the various discussions over the past few weeks, I
>>propose that we make the following changes:
>>
>>1) Add a parse() method that accepts a stream.
>>2) Add a parse() method that accepts a character buffer.
>>With these changes, the interface would look like this in Java:
>>
>> public void parse (InputStream is, String baseURI)
>> throws java.lang.Exception;
>> public void parse (char ch[], int start, int length, String baseURI)
>> throws java.lang.Exception;
>>NOTES:
>>
>>a. The baseURI argument is necessary for streams and character buffers
>> in case either contains a relative URI. You can supply a null
>> value if the document entity will not contain relative URIs.
>>
Comments:
1. Is the (ch, start, length) method really necessary, given that one can
supply a StringReader or whatever to the parse(InputStream) method?
2. If my "main" XML document is in a record in a database, then it is very
likely that any other entities referred to will be in the database as well.
Therefore, I think the logical approach in this situation is for the
application to resolve all URIs encountered: the parser should call the
application supplying a URI and the application should return an InputStream
to allow the parser to read it. This should presumably be done via the
EntityHandler interface.
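The callback arrangement in point 2 might look roughly like the sketch below. It is written against the EntityResolver interface found in the org.xml.sax package of current JDKs (this thread calls the draft interface EntityHandler). The in-memory map standing in for the database, the class name DatabaseEntityResolver and the "db:" identifiers are all assumptions for illustration.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;

// Serves entities from an in-memory map that stands in for the database.
class DatabaseEntityResolver implements EntityResolver {
    private final Map<String, String> records = new HashMap<>();

    void store(String systemId, String content) {
        records.put(systemId, content);
    }

    @Override
    public InputSource resolveEntity(String publicId, String systemId) {
        String content = records.get(systemId);
        if (content == null) {
            return null; // unknown here: let the parser resolve the URI itself
        }
        InputSource source = new InputSource(
            new ByteArrayInputStream(content.getBytes(StandardCharsets.UTF_8)));
        source.setSystemId(systemId); // keeps error messages meaningful
        source.setEncoding("UTF-8");
        return source;
    }
}
```

Returning null for identifiers the application does not recognise leaves ordinary URI resolution with the parser, so the application only takes on the entities it actually owns.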
And a question: is there a recommended way to abort a parse once the
application has got the information it needs (e.g. extracting the contents of
the TITLE element)? Would an interface like parser.abort() be cleaner than
playing around with exceptions? I ask because in handling the results of a
free text search, I am parsing all the retrieved documents when I only need
a bit of text from the beginning of each, and this is obviously wasteful. I
thought perhaps of supplying a stream and generating a premature
end-of-file, and then trapping the exception that comes back.
Regards, Mike Kay
From M.H.Kay at eng.icl.co.uk Tue Feb 24 11:55:28 1998
From: M.H.Kay at eng.icl.co.uk (Michael Kay)
Date: Mon Jun 7 17:00:12 2004
Subject: finalising org.sax.xml.Parser
Message-ID: <01bd411b$2f327400$1e09e391@mhklaptop.bra01.icl.co.uk>
>Would anyone prefer _not_ to see public IDs, then?
I'm not fundamentally opposed to them, but I can't see much point in them
either. The XML spec defines no semantics for a public identifier and we are
left to guess that it might have a similar meaning to a similar construct in
SGML. They are one of the bits of SGML legacy which should have been taken
out. As they're in XML it might make sense to support them in SAX: the
problem is that if you do so, you have to say what they mean.
(Actually system identifiers aren't very well explained either: we are told
they are URIs and there's no definitive statement of what a URI is. The
difference is that most readers can guess.)
Mike Kay
From ak117 at freenet.carleton.ca Tue Feb 24 13:48:16 1998
From: ak117 at freenet.carleton.ca (David Megginson)
Date: Mon Jun 7 17:00:12 2004
Subject: SAX: finalising org.sax.xml.Parser
In-Reply-To: <002401bd40e9$fde8c510$2ee044c6@donpark>
References: <002401bd40e9$fde8c510$2ee044c6@donpark>
Message-ID: <199802241346.IAA00395@unready.microstar.com>
Don Park writes:
> If I remember correctly, what David is trying to do is provide us with means
> to parse XML data from a byte stream as well as character stream. Since
> Reader will actually hide the byte-based aspect of the data stream, it is
> inappropriate for our purpose.
>
> XML character stream is also very useful when XML data is generated and
> processed within a framework. In such a system, converting character
> streams to byte stream and then converting it back to character stream is
> unnecessary.
This is true, but I think that James's point is well taken. The
character _buffer_ doesn't really buy us anything. I am reluctant to
use a character reader for two reasons:
1) It is a concept that doesn't translate well to languages other than
Java (or even to Java 1.0.2 for that matter).
2) It imposes another architectural requirement on SAX-conformant
parsers (the ability to receive characters directly, bypassing the
normal input mechanisms), and I'm trying to keep interference to a
minimum.
It is slightly inefficient to go from characters to a byte stream to
characters, but it's not that bad (especially if we use ISO-8859-1 or
UCS-2 for the encoding), and it keeps SAX simple and general. Given
the discussion so far, then, we are ending up with something like
this:
  package org.xml.sax;
  import java.io.InputStream;

  public interface Parser {

    public abstract void setEntityHandler (EntityHandler handler);
    public abstract void setDocumentHandler (DocumentHandler handler);
    public abstract void setErrorHandler (ErrorHandler handler);

    public abstract void parse (String publicId, String systemId)
      throws java.lang.Exception;

    public abstract void parse (String publicId, String systemId,
                                InputStream inputStream)
      throws java.lang.Exception;

  }
If you need more, you can always extend the interface:
  package com.acme.xml;
  import java.io.Reader;

  public interface SuperParser extends org.xml.sax.Parser {

    public abstract void parse (String publicId, String systemId,
                                Reader reader)
      throws java.lang.Exception;

  }
In an ideal world, we'd also have some kind of ability to ask the
parser to turn validation on or off, but I'm not certain that that's
practical: any thoughts?
All the best,
David
--
David Megginson ak117@freenet.carleton.ca
Microstar Software Ltd. dmeggins@microstar.com
http://home.sprynet.com/sprynet/dmeggins/
From ak117 at freenet.carleton.ca Tue Feb 24 13:59:52 1998
From: ak117 at freenet.carleton.ca (David Megginson)
Date: Mon Jun 7 17:00:12 2004
Subject: finalising org.sax.xml.Parser
In-Reply-To: <01bd4113$77fa0520$1e09e391@mhklaptop.bra01.icl.co.uk>
References: <01bd4113$77fa0520$1e09e391@mhklaptop.bra01.icl.co.uk>
Message-ID: <199802241358.IAA00435@unready.microstar.com>
Michael Kay writes:
> Comments:
> 1. Is the (ch, start, length) method really necessary, given that one can
> supply a StringReader or whatever to the parse(InputStream) method?
James has convinced me that it's not -- I'm actually happy to drop it,
since I want to keep the interfaces as simple as possible both to
learn and to implement.
> 2. If my "main" XML document is in a record in a database, then it is very
> likely that any other entities referred to will be in the database as well.
> Therefore, I think the logical approach in this situation is for the
> application to resolve all URIs encountered: the parser should call the
> application supplying a URI and the application should return an InputStream
> to allow the parser to read it. This should presumably be done via the
> EntityHandler interface.
I have considered this approach, but I can anticipate two problems:
1) It puts the burden of resolving URIs on the application rather than
the parser.
2) It is possible that some programming languages or libraries do not
represent network connections as input streams.
If (2) isn't a problem, we might find a way to work around (1). I'll
be coming back to the EntityHandler interface in a future posting, and
we can take up the issue again then.
> And a question: is there a recommended way to abort a parse once the
> application has got the information it needs (e.g extracting the contents of
> the TITLE element)? Would an interface like parser.abort() be cleaner than
> playing around with exceptions? I ask because in handling the results of a
> free text search, I am parsing all the retrieved documents when I only need
> a bit of text from the beginning of each, and this is obviously wasteful. I
> thought perhaps of supplying a stream and generating a premature
> end-of-file, and then trapping the exception that comes back.
In languages that support exceptions (Java, C++, Perl, and sort-of C),
an exception is probably the cleanest way to handle this. It also
lets you pass application-specific information back to the top level
within your exception.
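A minimal sketch of the exception approach, written against the SAX and JAXP classes bundled with current JDKs, not the draft interfaces under discussion here. The TITLE element name comes from the question; the class names TitleExtractor, TitleFound and TitleHandler are made up.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

class TitleExtractor {
    // Thrown by the handler to stop the parse; carries the result upward.
    static class TitleFound extends SAXException {
        final String title;

        TitleFound(String title) {
            super("title found");
            this.title = title;
        }
    }

    static class TitleHandler extends DefaultHandler {
        private StringBuilder text; // non-null only while inside TITLE

        @Override
        public void startElement(String uri, String local, String qName, Attributes atts) {
            if (qName.equals("TITLE")) text = new StringBuilder();
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            if (text != null) text.append(ch, start, length);
        }

        @Override
        public void endElement(String uri, String local, String qName) throws SAXException {
            if (qName.equals("TITLE")) throw new TitleFound(text.toString());
        }
    }

    // Returns the TITLE text, or null if the document has no TITLE element.
    static String extractTitle(String xml) throws Exception {
        try {
            SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), new TitleHandler());
            return null;
        } catch (TitleFound stop) {
            return stop.title;
        }
    }
}
```

The handler never touches the rest of the document, and the application-specific payload (here just the title string) travels inside the exception.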
All the best,
David
--
David Megginson ak117@freenet.carleton.ca
Microstar Software Ltd. dmeggins@microstar.com
http://home.sprynet.com/sprynet/dmeggins/
From ak117 at freenet.carleton.ca Tue Feb 24 14:24:39 1998
From: ak117 at freenet.carleton.ca (David Megginson)
Date: Mon Jun 7 17:00:12 2004
Subject: SAX: multiple handlers
Message-ID: <199802241423.JAA00516@unready.microstar.com>
In a private message, one SAX user raised the issue again of multiple
handlers. The user suggested the situation where someone wants to
extract information from a document _and_ copy the document to an
OutputStream at the same time: for a clean implementation, each of
these should be in a different handler.
During the last round, most people vetoed this idea. Here it is
again, though, for your consideration:
  package org.xml.sax;
  import java.io.InputStream;

  public interface Parser {

    public void addEntityHandler (EntityHandler handler);
    public void removeEntityHandler (EntityHandler handler);

    public void addDocumentHandler (DocumentHandler handler);
    public void removeDocumentHandler (DocumentHandler handler);

    public void addErrorHandler (ErrorHandler handler);
    public void removeErrorHandler (ErrorHandler handler);

    public void parse (String publicId, String systemId)
      throws java.lang.Exception;

    public void parse (String publicId, String systemId,
                       InputStream inputStream)
      throws java.lang.Exception;

  }
Any further thoughts on this issue?
All the best,
David
--
David Megginson ak117@freenet.carleton.ca
Microstar Software Ltd. dmeggins@microstar.com
http://home.sprynet.com/sprynet/dmeggins/
From jmodre at edu.uni-klu.ac.at Tue Feb 24 14:31:59 1998
From: jmodre at edu.uni-klu.ac.at (Juergen Modre)
Date: Mon Jun 7 17:00:12 2004
Subject: SAX: finalising org.sax.xml.Parser
References: <199802230313.WAA00386@unready.microstar.com>
Message-ID: <34F2E818.1FC4A30B@edu.uni-klu.ac.at>
David Megginson wrote:
> After considering the various discussions over the past few weeks, I
> propose that we make the following changes:
>
> 1) Add a parse() method that accepts a stream.
Fully agree.
> 2) Add a parse() method that accepts a character buffer.
I share James's reservations and therefore don't really see the need for it.
To parse parts of a larger document, the char[] can always be
converted to an InputStream and used with 1).
But maybe you have something else in mind.
> 3) Remove public ID from the current parse() method (I don't think
> public IDs are going anywhere fast in XML).
I propose that we keep the public ID.
For example, the XML parser DXP supports public identifiers.
> With these changes, the interface would look like this in Java:
> public void parse (String uri)
> throws java.lang.Exception;
SGML/XML-friendly "systemId" vs. Web-hacker-friendly "URI" as the parameter name:
I personally don't care too much about the name; both are appropriate.
In a method that also takes a publicId, the name "systemId" may be more readable.
Either name is fine as long as it is well documented
(e.g. in the javadoc header in Java) so that everybody understands its meaning.
> NOTES:
>
> a. The baseURI argument is necessary for streams and character buffers
> in case either contains a relative URI. You can supply a null
> value if the document entity will not contain relative URIs.
The baseURI gives you all the information needed to resolve every relative
entity reference correctly. What's still missing is the name of the
document where parsing started, so that name will be missing from
any error message about the starting entity.
So I propose to have:
    public abstract void parse (String publicId, String systemId, InputStream inputStream)
instead of
    public void parse (InputStream is, String baseURI)
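To illustrate why the name matters, here is a minimal sketch using the InputSource class from the SAX API bundled with current JDKs, which pairs a byte stream with a system identifier. The class name NamedStreamParse and the example URL are assumptions.

```java
import java.io.ByteArrayInputStream;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

class NamedStreamParse {
    // Parses bytes that did not come from a URL but still carry a name.
    // Returns the system identifier attached to the first fatal error,
    // or null if the document parsed cleanly.
    static String parseAndReportSource(byte[] document, String systemId) throws Exception {
        InputSource source = new InputSource(new ByteArrayInputStream(document));
        source.setSystemId(systemId); // used for error reports and relative URIs
        try {
            SAXParserFactory.newInstance().newSAXParser().parse(source, new DefaultHandler());
            return null;
        } catch (SAXParseException e) {
            return e.getSystemId();
        }
    }
}
```

With only a stream and a base URI, the parser would have nothing to put in the error report for the starting entity.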
-----------------------------------------------
JUERGEN MODRE
Reisdorf 6
A-9371 Brueckl
Austria (Europe)
Phone: ++43 4214 2320
Mobile: ++43 664 233 22 22
E-mail: jmodre@edu.uni-klu.ac.at
WWW: http://www.edu.uni-klu.ac.at/~jmodre
-----------------------------------------------
From gmckenzi at JetForm.com Tue Feb 24 14:57:01 1998
From: gmckenzi at JetForm.com (Gavin McKenzie)
Date: Mon Jun 7 17:00:12 2004
Subject: multiple handlers
Message-ID:
I like the idea of add/remove versus set. In the Java case it meshes
nicely with other Java event mechanisms. From a non-Java biased
perspective it does offer considerable extra flexibility in a simple
manner.
Though I don't have a strict requirement for it today, I'd vote for it.
Gavin.
>-----Original Message-----
>From: David Megginson [SMTP:ak117@freenet.carleton.ca]
>Sent: Tuesday, February 24, 1998 9:24 AM
>To: xml-dev Mailing List
>Subject: SAX: multiple handlers
>
>In a private message, one SAX user raised the issue again of multiple
>handlers. The user suggested the situation where someone wants to
>extract information from a document _and_ copy the document to an
>OutputStream at the same time: for a clean implementation, each of
>these should be in a different handler.
>
>During the last round, most people vetoed this idea. Here it is
>again, though, for your consideration:
>
> package org.xml.sax;
> import java.io.InputStream;
>
> public interface Parser {
>
> public void addEntityHandler (EntityHandler handler);
> public void removeEntityHandler (EntityHandler handler);
>
> public void addDocumentHandler (DocumentHandler handler);
> public void removeDocumentHandler (DocumentHandler handler);
>
> public void addErrorHandler (ErrorHandler handler);
> public void removeErrorHandler (ErrorHandler handler);
>
> public void parse (String publicId, String systemId)
> throws java.lang.Exception;
>
> public void parse (String publicId, String systemId,
> InputStream inputStream)
> throws java.lang.Exception;
>
> }
>
>Any further thoughts on this issue?
>
>
>All the best,
>
>
>David
>
>--
>David Megginson ak117@freenet.carleton.ca
>Microstar Software Ltd. dmeggins@microstar.com
> http://home.sprynet.com/sprynet/dmeggins/
>
From jmodre at edu.uni-klu.ac.at Tue Feb 24 15:01:50 1998
From: jmodre at edu.uni-klu.ac.at (Juergen Modre)
Date: Mon Jun 7 17:00:12 2004
Subject: SAX: finalising org.sax.xml.Parser
References: <002401bd40e9$fde8c510$2ee044c6@donpark> <199802241346.IAA00395@unready.microstar.com>
Message-ID: <34F2EF37.8979C8DE@edu.uni-klu.ac.at>
> In an ideal world, we'd also have some kind of ability to ask the
> parser to turn validation on or off, but I'm not certain that that's
> practical: any thoughts?
I think that is practical and necessary.
One solution would be to have methods like:
void setValidation(boolean validation)
boolean getValidation()
These methods can be called before starting to parse with
the parse() method.
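For comparison, the JAXP API in current JDKs (which postdates this thread) exposes a switch of this kind on the parser factory, not on the parser itself. A minimal sketch; the class name ValidationSwitch and the helper method are made up.

```java
import javax.xml.parsers.SAXParserFactory;

class ValidationSwitch {
    // Returns a factory whose parsers validate (or not) as requested.
    // The flag has to be set before the parser is created.
    static SAXParserFactory factoryFor(boolean validate) {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setValidating(validate);
        return factory;
    }
}
```

A call such as ValidationSwitch.factoryFor(true).newSAXParser() then yields a parser configured before parsing starts, matching the "set it before parse()" ordering described above.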
I also think a parse method that takes only a systemId as its parameter would be
convenient (aimed at users who are rather new to XML
and not very used to public IDs).
    public abstract void parse (String systemId)
This would also avoid the need to call
entityHandler.resolveEntity() every time to resolve the entity.
-----------------------------------------------
JUERGEN MODRE
Reisdorf 6
A-9371 Brueckl
Austria (Europe)
Phone: ++43 4214 2320
Mobile: ++43 664 233 22 22
E-mail: jmodre@edu.uni-klu.ac.at
WWW: http://www.edu.uni-klu.ac.at/~jmodre
-----------------------------------------------
From M.H.Kay at eng.icl.co.uk Tue Feb 24 15:03:06 1998
From: M.H.Kay at eng.icl.co.uk (Michael Kay)
Date: Mon Jun 7 17:00:12 2004
Subject: multiple handlers
Message-ID: <01bd4135$5b2893e0$1e09e391@mhklaptop.bra01.icl.co.uk>
>In a private message, one SAX user raised the issue again of multiple
>handlers
>Any further thoughts on this issue?
>
I've implemented a layer on top of SAX that provides not only multiple
handlers, but also per-element-type handlers. Since it is trivial to
implement this on top of SAX, I suggest it shouldn't go into SAX itself.
(The way you do multiple handlers is to write a class MultiHandler that
implements the DocumentHandler interface and accepts in its constructor two
DocumentHandlers; the methods then call these two in turn. Of course either
of them can itself be a MultiHandler).
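A minimal sketch of that composition. To keep it short and self-contained it uses a cut-down, hypothetical SimpleDocumentHandler interface with three callbacks instead of the full SAX DocumentHandler; RecordingHandler exists only so the fan-out can be observed.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, cut-down stand-in for the SAX DocumentHandler interface.
interface SimpleDocumentHandler {
    void startElement(String name);
    void characters(char[] ch, int start, int length);
    void endElement(String name);
}

// Forwards every event to two handlers in turn. Either one may itself
// be a MultiHandler, so any number of handlers can be chained.
class MultiHandler implements SimpleDocumentHandler {
    private final SimpleDocumentHandler first;
    private final SimpleDocumentHandler second;

    MultiHandler(SimpleDocumentHandler first, SimpleDocumentHandler second) {
        this.first = first;
        this.second = second;
    }

    public void startElement(String name) {
        first.startElement(name);
        second.startElement(name);
    }

    public void characters(char[] ch, int start, int length) {
        first.characters(ch, start, length);
        second.characters(ch, start, length);
    }

    public void endElement(String name) {
        first.endElement(name);
        second.endElement(name);
    }
}

// Records the events it receives as strings.
class RecordingHandler implements SimpleDocumentHandler {
    final List<String> events = new ArrayList<>();

    public void startElement(String name) { events.add("start:" + name); }

    public void characters(char[] ch, int start, int length) {
        events.add("text:" + new String(ch, start, length));
    }

    public void endElement(String name) { events.add("end:" + name); }
}
```

Chaining three handlers is then new MultiHandler(a, new MultiHandler(b, c)), and the parser still sees a single handler.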
Mike Kay
From tyler at infinet.com Tue Feb 24 15:05:49 1998
From: tyler at infinet.com (Tyler Baker)
Date: Mon Jun 7 17:00:12 2004
Subject: SAX: finalising org.sax.xml.Parser
References: <199802230313.WAA00386@unready.microstar.com> <34F2E818.1FC4A30B@edu.uni-klu.ac.at>
Message-ID: <34F2E22C.D06BCA45@infinet.com>
Juergen Modre wrote:
> David Megginson wrote:
> > After considering the various discussions over the past few weeks, I
> > propose that we make the following changes:
> >
> > 1) Add a parse() method that accepts a stream.
> Fully agree.
>
> > 2) Add a parse() method that accepts a character buffer.
> I have similar thoughts like James and therefore don't really see the need for it.
> For the case to parse parts from an larger document the char[] can always be
> converted to an InputStream to be used with 1).
> But maybe your intention goes into another direction.
One way to get around the char[] problem is to have a feeder mechanism in
which you repeatedly feed the parser chunks of data, as in the case of an input stream,
except that you explicitly turn the parser on before feeding it the data and
explicitly turn the parser off when you are done feeding it.
For example you could have methods that looked like this:
Parser.start();
Parser.parseBuffer(char[] c);
Parser.end();
Then you could just go through a loop and feed in a character array you populate with the
document data until you are finished. This of course would be much more straightforward
with an input stream, however this would get around the problem of languages which have no
concept of input streams.
The biggest problem I see with this suggestion is that it will make parsers a bit
more difficult to write, since you have to essentially freeze your parser's state after
each call to parseBuffer() finishes.
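To make the "frozen state" point concrete, here is a rough sketch of such a feeder. FeedParser is a hypothetical class, not part of any SAX draft; it only counts start tags and ignores comments, CDATA sections and the like, but that is enough to show that a '<' at the end of one buffer has to be remembered until the next call.

```java
// Hypothetical push-style parser: the caller drives it with start(), any
// number of parseBuffer() calls, and end(). It is a toy that only counts
// start tags; the point is the state that must survive between calls.
class FeedParser {
    private boolean started = false;
    private boolean sawLessThan = false; // true if the previous char was '<'
    private int startTagCount = 0;

    void start() {
        started = true;
        sawLessThan = false;
        startTagCount = 0;
    }

    void parseBuffer(char[] c) {
        if (!started) {
            throw new IllegalStateException("call start() before parseBuffer()");
        }
        for (char ch : c) {
            if (sawLessThan) {
                // '<' followed by '/', '!' or '?' does not open an element.
                if (ch != '/' && ch != '!' && ch != '?') {
                    startTagCount++;
                }
                sawLessThan = false;
            }
            if (ch == '<') {
                sawLessThan = true; // may be resolved in a later buffer
            }
        }
    }

    void end() {
        if (sawLessThan) {
            throw new IllegalStateException("input ended inside a tag");
        }
        started = false;
    }

    int getStartTagCount() {
        return startTagCount;
    }
}

class FeedParserDemo {
    // Feeds the chunks one at a time, as a caller without streams might.
    static int countInChunks(String... chunks) {
        FeedParser parser = new FeedParser();
        parser.start();
        for (String chunk : chunks) {
            parser.parseBuffer(chunk.toCharArray());
        }
        parser.end();
        return parser.getStartTagCount();
    }

    public static void main(String[] args) {
        System.out.println(countInChunks("<a><", "b/><", "/a>"));
    }
}
```

Splitting the same document at different points gives the same count, which is the property a real implementation would have to preserve for every piece of parser state.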
Just a suggestion,
Tyler
From drewn at icomm.co.uk Tue Feb 24 15:08:01 1998
From: drewn at icomm.co.uk (Nick Drew)
Date: Mon Jun 7 17:00:12 2004
Subject: multiple handlers
Message-ID: <01BD4136.48D7E5F0@krusty.icomm.co.uk>
<..stuff deleted...>
During the last round, most people vetoed this idea. Here it is
again, though, for your consideration:
package org.xml.sax;
import java.io.InputStream;
public interface Parser {
public void addEntityHandler (EntityHandler handler);
public void removeEntityHandler (EntityHandler handler);
public void addDocumentHandler (DocumentHandler handler);
public void removeDocumentHandler (DocumentHandler handler);
public void addErrorHandler (ErrorHandler handler);
public void removeErrorHandler (ErrorHandler handler);
public void parse (String publicId, String systemId)
throws java.lang.Exception;
public void parse (String publicId, String systemId,
InputStream inputStream)
throws java.lang.Exception;
}
Any further thoughts on this issue?
Apologies in advance: I'm quite new to the list, so missed this discussion first time around.
It seems that the above suggestion isn't essential. Perhaps there should be a standardised MulticastEntityHandler, MulticastDocumentHandler, and MulticastErrorHandler, which can be used instead, e.g.
{
...
MulticastDocumentHandler mdocHandler = new MyMulticastDocumentHandler();
mdocHandler.addHandler( new ExistingDocumentHandler() );
mdocHandler.addHandler( new AnotherExistingDocumentHandler() );
...
iParser.setDocumentHandler( mdocHandler );
...
}
and the MulticastDocumentHandler just delegates to its members as needed.
Nick Drew
icomm technologies ltd.
From ak117 at freenet.carleton.ca Tue Feb 24 18:52:50 1998
From: ak117 at freenet.carleton.ca (David Megginson)
Date: Mon Jun 7 17:00:12 2004
Subject: multiple handlers
In-Reply-To: <01bd4135$5b2893e0$1e09e391@mhklaptop.bra01.icl.co.uk>
References: <01bd4135$5b2893e0$1e09e391@mhklaptop.bra01.icl.co.uk>
Message-ID: <199802241851.NAA00358@unready.microstar.com>
Michael Kay writes:
> >In a private message, one SAX user raised the issue again of multiple
> >handlers
> >Any further thoughts on this issue?
> >
> I've implemented a layer on top of SAX that provides not only multiple
> handlers, but also per-element-type handlers. Since it is trivial to
> implement this on top of SAX, I suggest it shouldn't go into SAX itself.
I had this same thought when I was walking my girls to school after
lunch. Unlike a GUI, which spends most of its time waiting for the
user to do something interesting, an XML parser has to deal with
hundreds or thousands of events each second, and perhaps millions of
events in a hefty XML document.
Upon reflection, I am becoming more inclined to agree with the
arguments that people made in the first round, that the overhead of
walking through a vector of handlers and delivering each event to each
one can be excessive. Besides, as Michael rightly points out,
implementing a multi-listener interface on top of SAX is trivial if
you really need it.
All the best,
David
--
David Megginson ak117@freenet.carleton.ca
Microstar Software Ltd. dmeggins@microstar.com
http://home.sprynet.com/sprynet/dmeggins/
From ak117 at freenet.carleton.ca Tue Feb 24 19:11:29 1998
From: ak117 at freenet.carleton.ca (David Megginson)
Date: Mon Jun 7 17:00:12 2004
Subject: SAX: finalising org.sax.xml.Parser
In-Reply-To: <34F2EF37.8979C8DE@edu.uni-klu.ac.at>
References: <002401bd40e9$fde8c510$2ee044c6@donpark>
<199802241346.IAA00395@unready.microstar.com>
<34F2EF37.8979C8DE@edu.uni-klu.ac.at>
Message-ID: <199802241910.OAA00445@unready.microstar.com>
Juergen Modre writes:
> > In an ideal world, we'd also have some kind of ability to ask to
> > parser to turn validation on or off, but I'm not certain that that's
> > practical: any thoughts?
> I thinks that is practical and necessary.
>
> One solution would be to have methods like:
> void setValidation(boolean validation)
> boolean getValidation()
>
> These methods can be called before starting to parse with
> the parse() method.
It's trickier than this -- for example, we'd probably have to create
an exception that is thrown if the underlying parser does not support
validation; furthermore, none of the parsers that I've looked at
supports a toggle like this, and we will be forcing another design
decision on them if we require this toggle.
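A sketch of what the toggle-plus-exception design might look like. Every name here (ValidationToggle, ValidationNotSupportedException, the two parser classes) is hypothetical and invented for illustration; nothing below is part of the SAX proposal.

```java
// Hypothetical exception for parsers that cannot honour the requested mode.
class ValidationNotSupportedException extends Exception {
    ValidationNotSupportedException(String message) {
        super(message);
    }
}

// Hypothetical toggle, following the two methods suggested above.
interface ValidationToggle {
    void setValidation(boolean validation) throws ValidationNotSupportedException;
    boolean getValidation();
}

// A parser that can only check well-formedness: "off" is accepted,
// "on" is rejected with the exception.
class NonValidatingParser implements ValidationToggle {
    public void setValidation(boolean validation) throws ValidationNotSupportedException {
        if (validation) {
            throw new ValidationNotSupportedException("this parser cannot validate");
        }
    }

    public boolean getValidation() {
        return false;
    }
}

// A parser that supports both modes.
class SwitchableParser implements ValidationToggle {
    private boolean validation = false;

    public void setValidation(boolean validation) {
        this.validation = validation;
    }

    public boolean getValidation() {
        return validation;
    }
}

class ValidationToggleDemo {
    // Asks for validation and falls back to whatever mode the parser has.
    static boolean requestValidation(ValidationToggle parser) {
        try {
            parser.setValidation(true);
        } catch (ValidationNotSupportedException e) {
            // Fall back: carry on with well-formedness checking only.
        }
        return parser.getValidation();
    }

    public static void main(String[] args) {
        System.out.println(requestValidation(new NonValidatingParser()));
        System.out.println(requestValidation(new SwitchableParser()));
    }
}
```

The cost mentioned above is visible even in the toy: every application has to handle the exception, and every parser has to decide what to do with a request it cannot satisfy.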
> I also think a parse method with an systemId only as parameter would be
> convenient. (With targeting to users rather new to XML
> and not very used to the publicId's).
>
> public abstract void parse (String systemId)
>
> This would also avoid the need to call every time
> entityHandler.resolveEntity() to resolve the Entity.
It might be simpler, though I'm trying to keep the number of methods
to a minimum. It wouldn't affect EntityHandler.resolveEntity(),
though, since that does not exist solely for the sake of handling
public identifiers.
Thanks, and all the best,
David
--
David Megginson ak117@freenet.carleton.ca
Microstar Software Ltd. dmeggins@microstar.com
http://home.sprynet.com/sprynet/dmeggins/
From gmckenzi at JetForm.com Tue Feb 24 19:28:09 1998
From: gmckenzi at JetForm.com (Gavin McKenzie)
Date: Mon Jun 7 17:00:12 2004
Subject: SAX: finalising org.sax.xml.Parser
Message-ID:
David,
Something just occurred to me...and maybe it's too late, but I thought
I'd mention it...
With SAX there is an assumption that the whole file will be parsed. I'm
stuck if I'm parsing a 1 gigabyte file that contains 50,000
elements (representing transactions of data), and I only
want the first transaction.
Would it be possible to add a mechanism that could pause, resume, or terminate a
parse? Maybe a callback that returns either a 'continue', 'pause' or
'terminate' status value, and a resumeParse() method? Or a method that
I can call from within the callback to pause the parsing.
I know that I could throw an exception from within one of my callbacks,
which will halt the parse...but it would be valuable to be able to 'pause'
and 'resume'.
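For reference, the exception approach mentioned above might look roughly like this. Note the assumptions: the sketch uses the SAX classes that ship with current Java (javax.xml.parsers and org.xml.sax.helpers.DefaultHandler), not the draft interfaces under discussion here, and the element name "transaction" and the classes StopParsingException, FirstTransactionHandler and EarlyStopDemo are made up for the example. It terminates the parse; it does not pause or resume it.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Marker exception, so that a deliberate stop can be told apart from a
// genuine parse error.
class StopParsingException extends SAXException {
    StopParsingException() {
        super("stopped on purpose");
    }
}

// Collects the text of the first "transaction" element, then aborts.
class FirstTransactionHandler extends DefaultHandler {
    private final StringBuilder text = new StringBuilder();
    private boolean inside = false;
    int transactionsStarted = 0;

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        if (qName.equals("transaction")) {
            inside = true;
            transactionsStarted++;
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        if (inside) {
            text.append(ch, start, length);
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) throws SAXException {
        if (qName.equals("transaction")) {
            throw new StopParsingException(); // halts the parse here
        }
    }

    String firstTransaction() {
        return text.toString();
    }
}

class EarlyStopDemo {
    static FirstTransactionHandler parse(String xml) {
        FirstTransactionHandler handler = new FirstTransactionHandler();
        try {
            SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
            parser.parse(new InputSource(new StringReader(xml)), handler);
        } catch (StopParsingException e) {
            // Expected: the handler asked for the parse to end early.
        } catch (Exception e) {
            throw new RuntimeException(e); // a real configuration or parse error
        }
        return handler;
    }

    public static void main(String[] args) {
        String xml = "<log><transaction>one</transaction>"
                   + "<transaction>two</transaction></log>";
        System.out.println(parse(xml).firstTransaction());
    }
}
```

The rest of the document is never delivered to the handler, but there is no way to pick the parse up again from where it stopped, which is the gap a pause/resume mechanism would fill.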
Gavin.
From tyler at infinet.com Tue Feb 24 20:09:40 1998
From: tyler at infinet.com (Tyler Baker)
Date: Mon Jun 7 17:00:12 2004
Subject: multiple handlers
References: <01bd4135$5b2893e0$1e09e391@mhklaptop.bra01.icl.co.uk> <199802241851.NAA00358@unready.microstar.com>
Message-ID: <34F3295B.26C6F728@infinet.com>
David Megginson wrote:
> Michael Kay writes:
>
> > >In a private message, one SAX user raised the issue again of multiple
> > >handlers
> > >Any further thoughts on this issue?
> > >
> > I've implemented a layer on top of SAX that provides not only multiple
> > handlers, but also per-element-type handlers. Since it is trivial to
> > implement this on top of SAX, I suggest it shouldn't go into SAX itself.
>
> I had this same thought when I was walking my girls to school after
> lunch. Unlike a GUI, which spends most of its time waiting for the
> user to do something interesting, an XML parser has to deal with
> hundreds or thousands of events each second, and perhaps millions of
> events in a hefty XML document.
>
> Upon reflection, I am becoming more inclined to agree with the
> arguments that people made in the first round, that the overhead of
> walking through a vector of handlers and delivering each event to each
> one can be excessive. Besides, as Michael rightly points out,
> implementing a multi-listener interface on top of SAX is trivial if
> you really need it.
You don't need to actually use a Vector, but you could instead use an array or
just a single object if the Vector was of length one. You may initially use a
Vector to store the handlers, but when you are about to parse you could just
turn this into an array of handlers or else just a single handler. There are a
lot of ways to go about this so any performance loss would be a function of how
many handlers you are using. Nevertheless, SAX could just have a standard
MulticastHandler implementation that dispatches events to multiple handlers. I
think it would be useful to include in the Java SAX distribution a generic class
to do this sort of thing.
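A small sketch of the array-or-single-handler idea. ElementListener is a hypothetical one-method stand-in for a SAX handler interface, and MulticastListener and its prepare() method are invented for the example; it uses java.util.ArrayList where 1998 code would have used a Vector.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical one-callback stand-in for a SAX handler interface.
interface ElementListener {
    void startElement(String name);
}

// Collects listeners in a growable list, then freezes them into a plain
// array (or hands back the sole listener) just before parsing starts.
class MulticastListener implements ElementListener {
    private final List<ElementListener> registered = new ArrayList<>();
    private ElementListener[] snapshot = new ElementListener[0];

    void addListener(ElementListener listener) {
        registered.add(listener);
    }

    // Call once before the parse. With exactly one listener there is nothing
    // to multicast, so the parser can talk to it directly.
    ElementListener prepare() {
        if (registered.size() == 1) {
            return registered.get(0);
        }
        snapshot = registered.toArray(new ElementListener[0]);
        return this;
    }

    public void startElement(String name) {
        // Plain array walk on the per-event path.
        for (ElementListener listener : snapshot) {
            listener.startElement(name);
        }
    }
}

// Illustration only: counts the events it receives.
class CountingListener implements ElementListener {
    int count = 0;

    public void startElement(String name) {
        count++;
    }
}

class MulticastListenerDemo {
    public static void main(String[] args) {
        CountingListener a = new CountingListener();
        CountingListener b = new CountingListener();
        MulticastListener multicast = new MulticastListener();
        multicast.addListener(a);
        multicast.addListener(b);
        ElementListener target = multicast.prepare();
        target.startElement("doc");
        System.out.println(a.count + " " + b.count);
    }
}
```

Whatever prepare() returns is what would be registered with the parser, so the single-handler case pays no dispatch cost at all.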
Tyler
From wilfr at mail.bc.rogers.wave.ca Tue Feb 24 21:55:32 1998
From: wilfr at mail.bc.rogers.wave.ca (Wilf Reedijk)
Date: Mon Jun 7 17:00:12 2004
Subject: Modifying DTD using msxml
Message-ID: <34F34236.7FD472ED@rogers.wave.ca>
I would like to update the (internal) DTD for a document using msxml.
I am converting the DTD to a schema using the dtd.getSchema() method
I then modify the elements within the schema using addChild etc.
My question is: How do I convert this schema back to the DOM so that it is
saved when the document is saved?
Thanks
Wilf Reedijk
wilfr@rogers.wave.ca
From clovett at microsoft.com Tue Feb 24 21:58:55 1998
From: clovett at microsoft.com (Chris Lovett)
Date: Mon Jun 7 17:00:12 2004
Subject: Modifying DTD using msxml
Message-ID: <2F2DC5CE035DD1118C8E00805FFE354C01906CAA@red-msg-56.dns.microsoft.com>
I assume you want to convert it back to the DTD syntax - you will have to do
this yourself. MSXML doesn't have this feature yet.
> -----Original Message-----
> From: Wilf Reedijk [SMTP:wilfr@mail.bc.rogers.wave.ca]
> Sent: Tuesday, February 24, 1998 1:57 PM
> To: xmldev
> Subject: Modifying DTD using msxml
>
> I would like to update the (internal) DTD for a document using msxml.
>
> I am converting the DTD to a schema using the dtd.getSchema() method
>
> I then modify the elements within the schema using addChild etc.
>
> My question is: How do convert this schema back to the DOM so that it is
> saved when the document is saved.
>
>
> Thanks
> Wilf Reedijk
> wilfr@rogers.wave.ca
>
>
From jmodre at edu.uni-klu.ac.at Tue Feb 24 22:36:01 1998
From: jmodre at edu.uni-klu.ac.at (Juergen Modre)
Date: Mon Jun 7 17:00:12 2004
Subject: SAX: finalising org.sax.xml.Parser
References: <002401bd40e9$fde8c510$2ee044c6@donpark>
<199802241346.IAA00395@unready.microstar.com>
<34F2EF37.8979C8DE@edu.uni-klu.ac.at> <199802241910.OAA00445@unready.microstar.com>
Message-ID: <34F359AB.99DB806D@edu.uni-klu.ac.at>
David Megginson wrote:
>
> Juergen Modre writes:
>
> > > In an ideal world, we'd also have some kind of ability to ask to
> > > parser to turn validation on or off, but I'm not certain that that's
> > > practical: any thoughts?
> > I thinks that is practical and necessary.
> >
> > One solution would be to have methods like:
> > void setValidation(boolean validation)
> > boolean getValidation()
> >
> > These methods can be called before starting to parse with
> > the parse() method.
>
> It's trickier than this -- for example, we'd probably have to create
> an exception that is thrown if the underlying parser does not support
> validation;
Correct. My example was just a first naive try.
> furthermore, none of the parsers that I've looked at
> supports a toggle like this, and we will be forcing another design
> decision on them if we require this toggle.
There are already XML parsers allowing this toggle.
For instance DXP has this capability.
I think it would be good to have methods that allow a parser to be
set into well-formedness or validation mode.
> > I also think a parse method with an systemId only as parameter would be
> > convenient. (With targeting to users rather new to XML
> > and not very used to the publicId's).
> >
> > public abstract void parse (String systemId)
> >
> > This would also avoid the need to call every time
> > entityHandler.resolveEntity() to resolve the Entity.
>
> It might be simpler, though I'm trying to keep the number of methods
> to a minimum.
Okay.
All the best
Juergen
-----------------------------------------------
JUERGEN MODRE
Reisdorf 6
A-9371 Brueckl
Austria (Europe)
Phone: ++43 4214 2320
Mobile: ++43 664 233 22 22
E-mail: jmodre@edu.uni-klu.ac.at
WWW: http://www.edu.uni-klu.ac.at/~jmodre
-----------------------------------------------
From b.laforge at opengroup.org Tue Feb 24 23:01:43 1998
From: b.laforge at opengroup.org (Bill la Forge)
Date: Mon Jun 7 17:00:12 2004
Subject: axtp zip available
Message-ID: <3.0.32.19980224180619.00922bf0@postman.osf.org>
I've had several requests to create a zip file for axtp.
I've done so. See http://www.camb.opengroup.org/~laforge/axtp/#related_links
(I've also cleaned up the relationship between the parsed xml object tree
and the application peer objects.)
And yes, I'm only using a subset of xml. But I think packet size is a big
issue here.
This has become strictly a spare-time project, and I still need to develop
the client and server api's before it can live up to the "easy to use" claim.
Perhaps this weekend...
Meanwhile, please keep those comments coming.
b)
From peter at ursus.demon.co.uk Tue Feb 24 23:30:41 1998
From: peter at ursus.demon.co.uk (Peter Murray-Rust)
Date: Mon Jun 7 17:00:12 2004
Subject: multiple handlers
In-Reply-To: <199802241851.NAA00358@unready.microstar.com>
References: <01bd4135$5b2893e0$1e09e391@mhklaptop.bra01.icl.co.uk>
<01bd4135$5b2893e0$1e09e391@mhklaptop.bra01.icl.co.uk>
Message-ID: <3.0.1.16.19980224215546.35877758@pop3.demon.co.uk>
At 13:51 24/02/98 -0500, David Megginson wrote:
> Besides, as Michael rightly points out,
>implementing a multi-listener interface on top of SAX is trivial if
>you really need it.
>
As it's trivial, it would be a great help if a specimen were included in
SAX that those of us who are per-element people could use. Seriously, I'm
not quite sure what it would look like but I am sure I would recognise it
when I saw it :-)
P.
Peter Murray-Rust, Director Virtual School of Molecular Sciences, domestic
net connection
VSMS http://www.nottingham.ac.uk/vsms, Virtual Hyperglossary
http://www.venus.co.uk/vhg
From donpark at quake.net Tue Feb 24 23:51:01 1998
From: donpark at quake.net (Don Park)
Date: Mon Jun 7 17:00:12 2004
Subject: multiple handlers
Message-ID: <000d01bd417e$521bcbc0$2ee044c6@donpark>
>As it's trivial, it would be a great help if a specimen were included in
>SAX that those of us who are per-element people could use. Seriously, I'm
>not quite sure what it would look like but I am sure I would recognise it
>when I saw it :-)
This brings up the issue I wanted to bring up for a while:
"Should we add helper classes to SAX?"
HandlerBase sort of qualifies as a helper class but I think SAX should have
a lot more helper classes to help out SAX programmers. For example, a
'pass-through' DocumentHandler that filters out whitespace would be a great
help. An abstract implementation of DocumentHandler that maintains a
stack of ancestor elements would also be nice. So would a special trigger-like
DocumentHandler that matches specified patterns (i.e. XSL-rule-like
patterns).
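As an illustration of the ancestor-stack helper, here is one possible shape for it. AncestorTrackingHandler, PathCollector and the onStart/onEnd hooks are hypothetical names, and the class does not implement the real DocumentHandler interface; it only shows the bookkeeping such a helper would take off the programmer's hands.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical helper: keeps the stack of open elements up to date and lets
// subclasses ask where they are in the document.
abstract class AncestorTrackingHandler {
    private final Deque<String> open = new ArrayDeque<>();

    public final void startElement(String name) {
        open.addLast(name);
        onStart(name);
    }

    public final void endElement(String name) {
        onEnd(name);
        open.removeLast();
    }

    // Path from the root to the current element, e.g. "/doc/section/para".
    protected String path() {
        return "/" + String.join("/", open);
    }

    // Number of open elements, counting the current one.
    protected int depth() {
        return open.size();
    }

    protected abstract void onStart(String name);

    protected void onEnd(String name) {
        // Optional hook; does nothing by default.
    }
}

// Illustration only: records the path of every element that is opened.
class PathCollector extends AncestorTrackingHandler {
    final List<String> paths = new ArrayList<>();

    @Override
    protected void onStart(String name) {
        paths.add(path());
    }
}

class AncestorTrackingDemo {
    static List<String> run() {
        PathCollector collector = new PathCollector();
        collector.startElement("doc");
        collector.startElement("section");
        collector.startElement("para");
        collector.endElement("para");
        collector.endElement("section");
        collector.startElement("appendix");
        collector.endElement("appendix");
        collector.endElement("doc");
        return collector.paths;
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```

A pattern-matching handler of the kind mentioned above could be layered on the same path() information.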
I think we have four choices at this point:
1. Leave SAX alone!
2. Add some but as little as possible.
3. Go nuts and let SAX bloat as the months go by.
4. Start EZ-SAX (sorry, I couldn't help it. David picked a name ready-made
for puns) package to complement SAX.
Personally, I am all for EZ-SAX ;-p.
Regards,
Don Park
http://www.quake.net/~donpark/index.html
From elm at arbortext.com Wed Feb 25 00:30:26 1998
From: elm at arbortext.com (Eve L. Maler)
Date: Mon Jun 7 17:00:12 2004
Subject: The XML spec in XML: missing tags
In-Reply-To: <98Feb23.120356est.18826@thicket.arbortext.com>
Message-ID: <3.0.5.32.19980224192743.00a0d220@village.doctools.com>
As the maintainer of the specification DTD, let me say thanks for your
comments.
At 11:49 AM 2/23/98 -0500, Michael Kay wrote:
...
>Some comments on the XML tagging in the BNF rules:
>- it is useful to have the non-terminals tagged, though the way in which it
>done is a little clumsy, since the internal identifier and the visible name
>of the non-terminal are necessarily in a one-to-one correspondence. The way
>it is done seems designed primarily to enable a particular translation to
>HTML.
Are you saying that it's clumsy because the element content is duplicated
in the attribute value? Since the XML is transformed into HTML, it would
actually have been easier to let the content serve as the address (and be
stuffed into both the final element content and its href attribute,
with "#" and "-nt" tacked on). Alternatively, the element could have been
empty, and its attribute value both used as an address and rendered (with
some transformation that probably isn't worth doing...). Either way,
nothing would be duplicated in the source. However, it would make me a
little uncomfortable treating the same string as having two functions.
>- it is a shame that there is no tagging to distinguish terminal symbols
>from metasymbols, since this would enable nicer renditions of the rules,
>e.g. exploiting colour, without having to parse the BNF
I'll take this up with the other editors using the DTD.
>- it would seem more logical for each rule to have a single , with any
> and constraints being embedded within the , rather than
>these being separate elements interspersed among multiple elements.
We had a lengthy discussion of whether our production markup should be more
semantic and less presentational. It's so much work to make the markup
simulate the EBNF and to make the filters handle this, that we decided not
to go further in that direction. I do agree that the production markup is
less than "pure" in this area.
Eve
From ak117 at freenet.carleton.ca Wed Feb 25 00:56:20 1998
From: ak117 at freenet.carleton.ca (David Megginson)
Date: Mon Jun 7 17:00:12 2004
Subject: multiple handlers
In-Reply-To: <000d01bd417e$521bcbc0$2ee044c6@donpark>
References: <000d01bd417e$521bcbc0$2ee044c6@donpark>
Message-ID: <199802250054.TAA00347@unready.microstar.com>
Don Park writes:
> I think we have four choices at this point:
>
> 1. Leave SAX alone!
> 2. Add some but as little as possible.
> 3. Go nuts and let SAX bloat as the months go by.
> 4. Start EZ-SAX (sorry, I couln't help it. David picked a name ready-made
> for puns) package to complement SAX.
>
> Personally, I am all for EZ-SAX ;-p.
I think that it will be a wonderful idea for people to implement
higher-level, programmer-friendly stuff on top of SAX. Exactly what
_is_ programmer friendly will depend on the programming language, so I
agree that the helper classes should stay out of the SAX core, but I
encourage any efforts to make SAX programmers' lives easier (as Don
has done with SAXDOM).
All the best,
David
--
David Megginson ak117@freenet.carleton.ca
Microstar Software Ltd. dmeggins@microstar.com
http://home.sprynet.com/sprynet/dmeggins/
From ak117 at freenet.carleton.ca Wed Feb 25 01:28:02 1998
From: ak117 at freenet.carleton.ca (David Megginson)
Date: Mon Jun 7 17:00:12 2004
Subject: SAX: org.xml.sax.AttributeMap
Message-ID: <199802250126.UAA00473@unready.microstar.com>
We may as well take up the most difficult interface next, to get it
over with. Here's what we have right now for attributes, which are by
far the most vexed problem in SAX:
package org.xml.sax;
import java.util.Enumeration;
public interface AttributeMap {
public Enumeration getAttributeNames ();
public String getValue (String attributeName);
public boolean isEntity (String attributeName);
public boolean isNotation (String attributeName);
public boolean isId (String attributeName);
public boolean isIdref (String attributeName);
public String getEntityPublicID (String attributeName);
public String getEntitySystemID (String attributeName);
public String getNotationName (String attributeName);
public String getNotationPublicID (String attributeName);
public String getNotationSystemID (String attributeName);
}
BOY, DO I WANT TO CHANGE THIS ONE. James has made some good
suggestions about how to make this simpler and more efficient by
working from list indexes (it also avoids the need to allocate an
Enumeration). Here's what I want to change:
1. Rename the interface to org.xml.sax.AttributeList to reflect the
new approach.
2. Add a method to return the length of the list.
3. Look up attribute information based on integer indices rather than
string values.
4. Eliminate the is*() methods, and add a single method to return the
attribute's type as a string instead.
5. Rename getNotationName() to getEntityNotationName() to make its
role clearer.
With these changes, we end up with the following, somewhat simpler
interface:
package org.xml.sax;
public interface AttributeList {
public abstract int getLength ();
public abstract int getName (int index);
public abstract int getValue (int index);
public abstract String getType (int index);
public abstract String getEntityNotationName (int index);
public abstract String getEntityPublicId (int index);
public abstract String getEntitySystemId (int index);
public abstract String getNotationPublicId (int index);
public abstract String getNotationSystemId (int index);
}
The first four methods are actually very nice now (thanks, James, for
the suggestion). As specified in the XML REC, getType() will return
"CDATA" if there is no explicit declaration, and it will return the
declared attribute type otherwise. There's also no further dependency
on the Java-specific Enumeration class, so C++ programmers can sigh a
sigh of relief.
The last five methods are much more of a problem, and I'm still
agonizing over what to do. Why do we have binary entities in XML at
all? Is anyone going to use them, or will everything be done with
href's?
Attributes are the _only_ way to get at binary entities in XML, so if
I don't provide some way to get access to them here, then SAX parsers
and applications make it impossible to use binary (NDATA) entities at
all. I am very reluctant to create a new class or interface just for
entities (and yet another for notations), when other types of objects
do not have their own classes, and I certainly don't want to re-invent
(or pre-invent) the DOM.
HELP!!!
David
--
David Megginson ak117@freenet.carleton.ca
Microstar Software Ltd. dmeggins@microstar.com
http://home.sprynet.com/sprynet/dmeggins/
From ak117 at freenet.carleton.ca Wed Feb 25 01:39:56 1998
From: ak117 at freenet.carleton.ca (David Megginson)
Date: Mon Jun 7 17:00:12 2004
Subject: SAX: org.xml.sax.AttributeMap
In-Reply-To: <199802250126.UAA00473@unready.microstar.com>
References: <199802250126.UAA00473@unready.microstar.com>
Message-ID: <199802250138.UAA00524@unready.microstar.com>
David Megginson writes:
> package org.xml.sax;
>
> public interface AttributeList {
>
> public abstract int getLength ();
> public abstract int getName (int index);
> public abstract int getValue (int index);
> public abstract String getType (int index);
>
> public abstract String getEntityNotationName (int index);
> public abstract String getEntityPublicId (int index);
> public abstract String getEntitySystemId (int index);
> public abstract String getNotationPublicId (int index);
> public abstract String getNotationSystemId (int index);
>
> }
For any of you who are wondering when attribute names and values
became integers, the above should have been
package org.xml.sax;
public interface AttributeList {
public abstract int getLength ();
public abstract String getName (int index);
public abstract String getValue (int index);
public abstract String getType (int index);
public abstract String getEntityNotationName (int index);
public abstract String getEntityPublicId (int index);
public abstract String getEntitySystemId (int index);
public abstract String getNotationPublicId (int index);
public abstract String getNotationSystemId (int index);
}
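To show how the index-based lookup reads from the application side, here is a small sketch built on the first four methods only. AttributeListSketch is a local copy of those four signatures so that the example compiles on its own, and ArrayAttributeList and AttributeListDemo are hypothetical classes invented for the illustration.

```java
// Local copy of the first four methods of the interface above, so that the
// sketch is self-contained.
interface AttributeListSketch {
    int getLength();
    String getName(int index);
    String getValue(int index);
    String getType(int index);
}

// Hypothetical implementation backed by three parallel arrays, roughly what
// a parser might already hold internally.
class ArrayAttributeList implements AttributeListSketch {
    private final String[] names;
    private final String[] values;
    private final String[] types;

    ArrayAttributeList(String[] names, String[] values, String[] types) {
        this.names = names;
        this.values = values;
        this.types = types;
    }

    public int getLength() { return names.length; }
    public String getName(int index) { return names[index]; }
    public String getValue(int index) { return values[index]; }
    public String getType(int index) { return types[index]; }
}

class AttributeListDemo {
    // Walks the list by index; no Enumeration is allocated.
    static String describe(AttributeListSketch atts) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < atts.getLength(); i++) {
            if (i > 0) {
                out.append(' ');
            }
            out.append(atts.getName(i))
               .append("=\"").append(atts.getValue(i)).append("\" [")
               .append(atts.getType(i)).append(']');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        AttributeListSketch atts = new ArrayAttributeList(
                new String[] {"id", "title"},
                new String[] {"x1", "Intro"},
                new String[] {"ID", "CDATA"});
        System.out.println(describe(atts));
    }
}
```

Looking an attribute up by name would need a linear search over getName(), which is the trade-off of dropping the string-keyed methods.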
All the best,
David
--
David Megginson ak117@freenet.carleton.ca
Microstar Software Ltd. dmeggins@microstar.com
http://home.sprynet.com/sprynet/dmeggins/
From donpark at quake.net Wed Feb 25 02:00:39 1998
From: donpark at quake.net (Don Park)
Date: Mon Jun 7 17:00:13 2004
Subject: org.xml.sax.AttributeMap
Message-ID: <001301bd4190$726aa650$2ee044c6@donpark>
David,
>The last five methods are much more of a problem, and I'm still
>agonizing over what to do. Why do we have binary entities in XML at
>all? Is anyone going to use them, or will everything be done with
>href's?
>
>Attributes are the _only_ way to get at binary entities in XML, so if
>I don't provide some way to get access to them here, then SAX parsers
>and applications make it impossible to use binary (NDATA) entities at
>all. I am very reluctant to create a new class or interface just for
>entities (and yet another for notations), when other types of objects
>do not have their own classes, and I certainly don't want to re-invent
>(or pre-invent) the DOM.
How about replacing the five with following method and three constants?
public static final int NAME = 0;
public static final int PUBLIC_ID = 1;
public static final int SYSTEM_ID = 2;
public abstract String[] getDataInfo (int index);
Since AttributeList is valid only within the startElement method, you can reuse
a single string array rather than allocate a new one per getDataInfo call.
If the method returns null, then the attribute has no info.
If you haven't guessed by now, the constants above are used to index into
the returned array. Implementations should take steps to make sure the size
of the returned array is 3 and stuff null for NAME if it is not a notation.
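For illustration, a minimal sketch of how a caller might use such a method. Only the three constants and getDataInfo(int) come from the proposal; the DataInfoSource interface, the ReusingSource stub and the sample values are hypothetical names invented for this example. Because the array may be reused, the caller copies out what it needs straight away.

```java
// Hypothetical stand-in for the attribute list; only the three constants and
// getDataInfo(int) come from the proposal, everything else is illustrative.
interface DataInfoSource {
    int NAME = 0;
    int PUBLIC_ID = 1;
    int SYSTEM_ID = 2;

    /** Returns a 3-element array, or null if the attribute has no info. */
    String[] getDataInfo(int index);
}

// Toy implementation that reuses one array for every call, as suggested.
final class ReusingSource implements DataInfoSource {
    private final String[][] rows;
    private final String[] shared = new String[3];

    ReusingSource(String[][] rows) { this.rows = rows; }

    public String[] getDataInfo(int index) {
        String[] row = rows[index];
        if (row == null) return null;
        System.arraycopy(row, 0, shared, 0, 3);
        return shared;
    }
}

final class DataInfoDemo {
    /** Copies the system ID out of the shared array, or returns null. */
    static String systemIdOf(DataInfoSource source, int index) {
        String[] info = source.getDataInfo(index);
        return info == null ? null : info[DataInfoSource.SYSTEM_ID];
    }

    public static void main(String[] args) {
        DataInfoSource source = new ReusingSource(new String[][] {
            { "gif", null, "logo.gif" },  // attribute 0: refers to an unparsed entity
            null                          // attribute 1: ordinary attribute, no info
        });
        System.out.println(systemIdOf(source, 0));
        System.out.println(systemIdOf(source, 1));
    }
}
```

The caller never keeps a reference to the returned array, which is what makes reusing it inside the parser safe.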
Does this help?
Don Park
http://www.quake.net/~donpark/index.html
>
>
>HELP!!!
>
>
>David
>
>--
>David Megginson ak117@freenet.carleton.ca
>Microstar Software Ltd. dmeggins@microstar.com
> http://home.sprynet.com/sprynet/dmeggins/
From antony at n-space.com.au Wed Feb 25 02:24:42 1998
From: antony at n-space.com.au (Antony Blakey)
Date: Mon Jun 7 17:00:13 2004
Subject: org.xml.sax.AttributeMap
References: <001301bd4190$726aa650$2ee044c6@donpark>
Message-ID: <34F37FDC.D945B484@n-space.com.au>
Don Park wrote:
> How about replacing the five with following method and three constants?
>
> public static final int NAME = 0;
> public static final int PUBLIC_ID = 1;
> public static final int SYSTEM_ID = 2;
>
> public abstract String[] getDataInfo (int index);
>
> Since AttributeList is valid only within startElement method, you can reuse
> a single string array rather allocate a new one per getEntityInfo method.
> If the method returns null, then it is attribute has no info.
>
> If you haven't guessed by now, the constants above are used to index into
> the returned array. Implementations should take steps to make sure the size
> of the returned array is 3 and stuff null for NAME if it is not a notation.
Why would you not simply return a strongly typed data item (ignoring the
names)?
public abstract DataInfo getDataInfo(int index);
public interface DataInfo {
  public String getName();
  public String getPublicID();
  public String getSystemID();
}
As far as reuse of values is concerned however, I think this is a very
bad idea: startElement defines a new context, so reusing the parameters
to that call is workable, however reusing the result from the
getDataInfo call is a different kettle of fish. It would be better (if
you are so concerned) to keep a pool that you return so that they are
not reused within the context of a startElement call. This may seem like
more work on the part of the parser implementor, but you shouldn't push
this complexity onto the users of the parser when you can safely hide it
within the parser. The parser writer can make the effort for
efficiency's sake.
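A rough sketch of the pooling idea, under the assumption that the parser calls a reset hook before each startElement. The PooledInfo and InfoPool names and the reset/acquire methods are hypothetical and not part of any SAX draft.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical mutable holder; the field names are illustrative only.
final class PooledInfo {
    String name, publicId, systemId;
}

// Hands out a distinct object for each request made during one startElement
// call, and recycles all of them once the next element begins.
final class InfoPool {
    private final List<PooledInfo> pool = new ArrayList<>();
    private int next = 0;

    /** Assumed to be called by the parser just before it invokes startElement. */
    void reset() { next = 0; }

    /** Returns an object that has not been handed out since the last reset. */
    PooledInfo acquire() {
        if (next == pool.size()) pool.add(new PooledInfo());
        return pool.get(next++);
    }
}

final class InfoPoolDemo {
    public static void main(String[] args) {
        InfoPool pool = new InfoPool();
        PooledInfo a = pool.acquire();
        PooledInfo b = pool.acquire();
        System.out.println(a != b);               // two results within one element
        pool.reset();
        System.out.println(pool.acquire() == a);  // same object after the reset
    }
}
```

Allocation only happens when an element needs more results than any earlier element did, so the steady-state cost is close to that of a single reused object.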
+----------------------------------+
| Antony Blakey |
| N-Space Pty Ltd |
| Java - CORBA - SGML - XML |
| mailto:antony@n-space.com.au |
| http://www.n-space.com.au |
+----------------------------------+
From jjc at jclark.com Wed Feb 25 02:46:47 1998
From: jjc at jclark.com (James Clark)
Date: Mon Jun 7 17:00:13 2004
Subject: SAX: org.xml.sax.AttributeMap
References: <199802250126.UAA00473@unready.microstar.com> <199802250138.UAA00524@unready.microstar.com>
Message-ID: <34F38506.3D19B68A@jclark.com>
> package org.xml.sax;
>
> public interface AttributeList {
>
> public abstract int getLength ();
> public abstract String getName (int index);
> public abstract String getValue (int index);
I think it's also desirable to provide a method to access attribute
values by name. Some applications only want to access attribute values
this way, and it's inconvenient and inefficient for the application to
have to iterate over all the names itself.
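As a sketch of what each application has to write when only indexed access exists: the IndexedAttributes interface below is a hypothetical cut-down copy of the three quoted methods, and the by-name getValue helper and the of(...) factory are assumptions made for the example.

```java
// Hypothetical subset of the draft interface quoted above.
interface IndexedAttributes {
    int getLength();
    String getName(int index);
    String getValue(int index);
}

final class AttributeLookup {
    /** The linear scan an application must write itself without by-name access. */
    static String getValue(IndexedAttributes atts, String name) {
        for (int i = 0; i < atts.getLength(); i++) {
            if (atts.getName(i).equals(name)) return atts.getValue(i);
        }
        return null; // no such attribute
    }

    /** Builds a fixed attribute list from name/value pairs, for demonstration. */
    static IndexedAttributes of(final String... pairs) {
        return new IndexedAttributes() {
            public int getLength() { return pairs.length / 2; }
            public String getName(int i) { return pairs[2 * i]; }
            public String getValue(int i) { return pairs[2 * i + 1]; }
        };
    }

    public static void main(String[] args) {
        IndexedAttributes atts = of("id", "x1", "ref", "foo");
        System.out.println(getValue(atts, "ref"));
    }
}
```

A parser that already keeps its attributes in a hash table could answer the same question without the scan, which is the efficiency argument for putting the method in the interface.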
> public abstract String getType (int index);
I like this.
> public abstract String getEntityNotationName (int index);
> public abstract String getEntityPublicId (int index);
> public abstract String getEntitySystemId (int index);
> public abstract String getNotationPublicId (int index);
> public abstract String getNotationSystemId (int index);
I agree that SAX ought to provide access to unparsed entities but I
don't think this is the right way to achieve it. For a start, I can
have an ENTITIES attribute, so all these methods would need two
arguments (the index of the attribute in the attribute list, and the
index of the token in the value).
Another problem is that it is common to declare unparsed entities in the
internal subset, but to declare attribute types in an external DTD, eg
]>
where doc.dtd contains
Now if I parse this without processing the external DTD, the SAX
interface as I understand it won't allow me to get at the system and
public id for foo, although an application might well intrinsically know
that ref is an ENTITY attribute.
I think a better approach is for the processor at the end of the prolog
to pass an object to the application that provides information about all
the declared notations and unparsed entities.
XP has a DTD object that does this, but it might be better to call it
something else (like UnparsedEntitySet) since SAX might someday be
extended to provide full DTD access.
Note that if you provide access to the system ID, you have to deal with
the issue of relative URLs. Either the processor has to resolve a
relative URL into an absolute URL before passing it to the application, or
it has to make available a base URL to the application.
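To make the relative-URL issue concrete, here is a small sketch of the first option (the processor resolves before handing the ID over). It uses java.net.URI, which is a later addition to the Java library than this thread; the SystemIdResolver class and the example.org addresses are made up for illustration.

```java
import java.net.URI;

final class SystemIdResolver {
    /** Resolves a possibly relative system ID against the declaring entity's base URI. */
    static String resolve(String baseUri, String systemId) {
        return URI.create(baseUri).resolve(systemId).toString();
    }

    public static void main(String[] args) {
        String base = "http://example.org/docs/doc.xml";
        System.out.println(resolve(base, "images/foo.gif"));
        System.out.println(resolve(base, "http://example.net/foo.gif"));
    }
}
```

The base that matters is the entity in which the declaration appeared, not the document entity, which is why the second option requires a per-declaration base URL.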
Here's what XP's DTD interface looks like (it's a little fancier than
what I think is needed for SAX in that it provides access to all
general entities, not just unparsed ones):
package com.jclark.xml.parse;
import java.util.Enumeration;
import java.net.URL;
/**
 * Information about a DTD.
 * @version $Revision: 1.4 $ $Date: 1998/02/17 04:20:20 $
 */
public interface DTD {
  /**
   * Returns an enumeration over the names of general entities declared in
   * the DTD.
   */
  Enumeration entityNames();
  /**
   * Returns an enumeration over the names of notations declared in
   * the DTD.
   */
  Enumeration notationNames();
  /**
   * Returns the system identifier for a notation.
   * Returns null if the notation was not declared or no system identifier
   * was specified.
   * A relative URL is not automatically resolved into an absolute URL;
   * getNotationBase can be used to do this.
   *
   * @see #getNotationBase
   */
  String getNotationSystemId(String notationName);
  /**
   * Returns the public identifier for a notation.
   * Returns null if the notation was not declared or no public identifier
   * was specified.
   */
  String getNotationPublicId(String notationName);
  /**
   * Returns the URL of the entity in which the notation was declared.
   * Returns null if the notation was not declared or the URL of the
   * declaring entity is not available.
   */
  URL getNotationBase(String notationName);
  /**
   * Returns the replacement text of the specified general entity.
   * Returns null if the entity was not declared or was declared
   * as an external entity.
   */
  String getEntityReplacementText(String entityName);
  /**
   * Returns the system identifier for a general entity.
   * Returns null if the entity was not declared or is an internal entity.
   * A relative URL is not automatically resolved into an absolute URL;
   * getEntityBase can be used to do this.
   *
   * @see #getEntityBase
   */
  String getEntitySystemId(String entityName);
  /**
   * Returns the public identifier for a general entity.
   * Returns null if the entity was not declared or no public identifier
   * was specified.
   */
  String getEntityPublicId(String entityName);
  /**
   * Returns the name of the notation of an unparsed general entity.
   * Returns null if the entity was not declared or was a parsed entity.
   */
  String getEntityNotationName(String entityName);
  /**
   * Returns the URL of the entity in which the general entity was declared.
   * Returns null if the entity was not declared or the URL of the
   * declaring entity is not available.
   */
  URL getEntityBase(String entityName);
  /**
   * Returns true if an element type was declared to have element content.
   */
  boolean getElementTypeElementContent(String elementTypeName);
  /**
   * Returns true if the complete DTD was processed.
   */
  boolean isComplete();
  /**
   * Returns true if standalone="yes" was specified in the
   * XML declaration.
   */
  boolean isStandalone();
}
James
From jjc at jclark.com Wed Feb 25 03:04:23 1998
From: jjc at jclark.com (James Clark)
Date: Mon Jun 7 17:00:13 2004
Subject: SAX: finalising org.sax.xml.Parser
References: <199802230313.WAA00386@unready.microstar.com>
Message-ID: <34F38943.551F05AB@jclark.com>
> public void parse (InputStream is, String baseURI)
> throws java.lang.Exception;
XML allows the encoding of an entity to be specified by an external
transport protocol (see 4.3.3): for example, when an XML document
arrives over HTTP with a content type of text/xml, then the encoding
specified in the charset parameter is supposed to take precedence over
that specified in the document entity by the encoding declaration or by
XML's default rules. So I think we need an additional argument here: a
String specifying the name of the encoding to be used for the
InputStream, or null if the encoding specified in the document entity
should be used.
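A minimal sketch of the precedence rule described here, assuming the parser has already sniffed the encoding declared in the document entity and passes it in as declaredEncoding. The EncodingChoice class and both method names are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;

final class EncodingChoice {
    /**
     * Picks the encoding for the stream: an externally supplied name wins;
     * null means "fall back to what the document entity declares".
     */
    static String choose(String externalEncoding, String declaredEncoding) {
        return externalEncoding != null ? externalEncoding : declaredEncoding;
    }

    /** Wraps the byte stream in a Reader using the chosen encoding. */
    static Reader open(InputStream in, String externalEncoding, String declaredEncoding)
            throws IOException {
        return new InputStreamReader(in, choose(externalEncoding, declaredEncoding));
    }

    public static void main(String[] args) throws IOException {
        // One byte, 0xE9: a complete character in ISO-8859-1, malformed as UTF-8.
        byte[] data = { (byte) 0xE9 };
        Reader r = open(new ByteArrayInputStream(data), "ISO-8859-1", "UTF-8");
        System.out.println(Integer.toHexString(r.read()));
    }
}
```

With null as the external encoding the same call would decode according to the declaration instead, which is the behaviour the extra argument is meant to select.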
James
From donpark at quake.net Wed Feb 25 07:21:38 1998
From: donpark at quake.net (Don Park)
Date: Mon Jun 7 17:00:13 2004
Subject: org.xml.sax.AttributeMap
Message-ID: <002301bd41bd$49737650$2ee044c6@donpark>
>Why would you not simply return a strongly typed data item (ignoring the
>names)
Because we are trying to keep the number of classes to the bare minimum.
I don't feel too strongly about the goal but I felt I should make a
suggestion.
>As far as reuse of values is concerned however, I think this is a very
>bad idea: startElement defines a new context, so reusing the parameters
>to that call is workable, however reusing the result from the
>getDataInfo call is a different kettle of fish. It would be better (if
>you are so concerned) to keep a pool that you return so that they are
>not reused within the context of a startElement call. This may seem like
>more work on the part of the parser implementor, but you shouldn't push
>this complexity onto the users of the parser when you can safely hide it
>within the parser. The parser writer can make the effort for
>efficiencies sake.
What I suggested is no worse than AttributeMap being reused by some of
the parsers, since the returned value's lifetime is entirely bound by the
lifetime of the AttributeMap. Note that AttributeMap's Enumeration is also
invalid once startElement returns. But then I am not at all saying that
what I suggest is good.
One of the problems facing SAX is its speed. Far too many objects
(mainly Strings) are being instantiated unnecessarily because of the multiple
layers involved. One of the users of SAXDOM measured performance at three levels
(SAX, SAXDOM, and his own application on top of SAXDOM) and found that
performance decreased by about 50% at each level. Processing of a 1.5 meg
XML file took 8 seconds at SAX level, 14 seconds at SAXDOM, and 35 seconds
at the application level. I don't know which SAX parser was used.
Since I have a particular interest in server-side XML processing, I have a
real concern about performance. I am currently feeling out the issues in
building a 'pedal-to-the-metal' XML parser with native SAX support.
Actually, I am finding that my performance goals cannot be met with the current
SAX API because I must cut object instantiation down to the bare minimum,
remove most synchronization, and cluster each stage to allow the JIT more
effective use of the CPU code cache.
Don Park
http://www.quake.net/~donpark/index.html
From peter at ursus.demon.co.uk Wed Feb 25 09:01:06 1998
From: peter at ursus.demon.co.uk (Peter Murray-Rust)
Date: Mon Jun 7 17:00:13 2004
Subject: multiple handlers
In-Reply-To: <199802250054.TAA00347@unready.microstar.com>
References: <000d01bd417e$521bcbc0$2ee044c6@donpark>
Message-ID: <3.0.1.16.19980225085026.357795b8@pop3.demon.co.uk>
At 19:54 24/02/98 -0500, David Megginson wrote:
[...]
>
>I think that it will be a wonderful idea for people to implement
>higher-level, programmer-friendly stuff on top of SAX. Exactly what
>_is_ programmer friendly will depend on the programming language, so I
>agree that the helper classes should stay out of the SAX core, but I
>encourage any efforts to make SAX programmers' lives easier (as Don
>has done with SAXDOM).
>
Although it may not formally be part of SAX, I think it will be extremely
valuable to have reference library implementations of parts of the spec.
For example, what is a valid Name in XML? You have to treat a large number
of special cases for characters, and are extremely vulnerable to revisions
of the spec (this is an area where I am sure minor revisions will happen).
So a set of library classes of the type:
public static boolean isValidName(String name);
public static String getCaseSpaceNormalizedAttval(String value);
would be extremely valuable. We can then delegate part of the prose to
these implementations.
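A deliberately simplified sketch of what such a helper could look like. It approximates the spec's Letter and Digit classes with Character.isLetter and Character.isLetterOrDigit and ignores combining characters and extenders, so it is an assumption-laden illustration of the idea and not a conforming implementation of the Name production. The XmlNames class name is invented.

```java
final class XmlNames {
    /**
     * Approximate check of the XML Name production:
     * (Letter | '_' | ':') followed by letters, digits, '.', '-', '_' or ':'.
     * Combining characters and extenders are not handled in this sketch.
     */
    static boolean isValidName(String name) {
        if (name == null || name.isEmpty()) return false;
        char first = name.charAt(0);
        if (!(Character.isLetter(first) || first == '_' || first == ':')) return false;
        for (int i = 1; i < name.length(); i++) {
            char c = name.charAt(i);
            boolean ok = Character.isLetterOrDigit(c)
                    || c == '.' || c == '-' || c == '_' || c == ':';
            if (!ok) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(isValidName("xml-dev"));
        System.out.println(isValidName("1abc"));
    }
}
```

Keeping the character tables behind one shared method is what would let a spec revision be absorbed in a single place.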
P.
Peter Murray-Rust, Director Virtual School of Molecular Sciences, domestic
net connection
VSMS http://www.nottingham.ac.uk/vsms, Virtual Hyperglossary
http://www.venus.co.uk/vhg
From peter at ursus.demon.co.uk Wed Feb 25 09:19:40 1998
From: peter at ursus.demon.co.uk (Peter Murray-Rust)
Date: Mon Jun 7 17:00:13 2004
Subject: The XML spec in XML: missing tags
In-Reply-To: <3.0.5.32.19980224192743.00a0d220@village.doctools.com>
References: <98Feb23.120356est.18826@thicket.arbortext.com>
Message-ID: <3.0.1.16.19980225083228.2a27adb6@pop3.demon.co.uk>
[... I may have missed the postings quoted in this...]
At 19:27 24/02/98 -0500, Eve L. Maler wrote:
>As the maintainer of the specification DTD, let me say thanks for your
>comments.
We are very grateful to Eve for having produced the markup specification.
Unfortunately she is a victim of her success in that rec.xml [my shorthand
for the spec] is the first 'really crunchy official piece of XML' that we
can get to grips with for learning and developing our tools. This is why a
DTD and its associated semantics/documentation is so important :-). [I
would also expect that 'spec.dtd' might be re-usable in other contexts.]
>
[...]
>
>We had a lengthy discussion of whether our production markup should be more
>semantic and less presentational. It's so much work to make the markup
>simulate the EBNF and to make the filters handle this, that we decided not
>to go further in that direction. I do agree that the production markup is
>less than "pure" in this area.
>
My interest is similar - but complementary - to Michael's; I am interested
in the terminology. Thus I want to be able to abstract the terms [there are
62 termdefs] in the document and produce a model for their structure (e.g.
entailment by containment, by linking and so on.) In this way I can create
a graphical interactive map of the concepts in the XML spec and have
already created a prototype. I would like to know, for example, whether all
terms are defined by <termdef> or whether there are some which are simply
defined by <term>foo</term> bar. There appears to be some duplication here
as well; thus a termdef has an attribute naming the term, but it is also
often contained within a later <term> in the 'description'. [And there is
at least one case where </termdef> occurs in mid-sentence - I suspect this
isn't intended.]
P.
Peter Murray-Rust, Director Virtual School of Molecular Sciences, domestic
net connection
VSMS http://www.nottingham.ac.uk/vsms, Virtual Hyperglossary
http://www.venus.co.uk/vhg
From peter at ursus.demon.co.uk Wed Feb 25 09:20:45 1998
From: peter at ursus.demon.co.uk (Peter Murray-Rust)
Date: Mon Jun 7 17:00:13 2004
Subject: SAX: org.xml.sax.AttributeMap
In-Reply-To: <34F38506.3D19B68A@jclark.com>
References: <199802250126.UAA00473@unready.microstar.com>
<199802250138.UAA00524@unready.microstar.com>
Message-ID: <3.0.1.16.19980225084239.3577ae48@pop3.demon.co.uk>
At 09:42 25/02/98 +0700, James Clark wrote:
>
>> public abstract String getType (int index);
>
>I like this.
>
So do I. As XML grows larger and acquires more extensions (XLL, XSL, etc.)
there will be an increasing number of 'hardcoded' attribute types and
values. For example, the type of HREF/href is effectively determined as
CDATA (it would be perverse to make it ID, for example, even if not in
xml-link context) and xml:lang is required (I think) to be NMTOKEN or
NMTOKENS. Hardcoding all these 'special cases' is a pain and SAX (or DOM)
can help with implementing the prose in the specs.
P.
Peter Murray-Rust, Director Virtual School of Molecular Sciences, domestic
net connection
VSMS http://www.nottingham.ac.uk/vsms, Virtual Hyperglossary
http://www.venus.co.uk/vhg
From tms at ansa.co.uk Wed Feb 25 10:43:46 1998
From: tms at ansa.co.uk (Toby Speight)
Date: Mon Jun 7 17:00:13 2004
Subject: SAX: org.xml.sax.AttributeMap
In-Reply-To: David Megginson's message of "Tue, 24 Feb 1998 20:38:34 -0500"
References: <199802250126.UAA00473@unready.microstar.com> <199802250138.UAA00524@unready.microstar.com>
Message-ID:
David> David Megginson
=> In article <199802250138.UAA00524@unready.microstar.com>, David
=> wrote:
David> David Megginson writes:
David> package org.xml.sax;
David>
David> public interface AttributeList {
David>
David> //...
David> public abstract String getType (int index);
David> //...
David>
David> }
We're returning one of a bounded, known set of values. I'd prefer to
use an int for this type of thing, along with a set of constants.
I.e.
public abstract int getType (int index);
public static final int CDATA = 0;
public static final int NMTOKEN = 1;
// etc.
The only advantage a String has over this is that you can meaningfully
present it to the user as it is. A disadvantage of String is that it is
computationally expensive to compare for equality (or equivalently, and
worse, to switch() on it). Comparison becomes easier if one provides a
set of String constants and guarantees that returned values will test
equal with "==". That is not too different to my suggestion of using
numeric constants.
Converting integers to human-readable Strings is easy:
public static String[] typeNames = new String[/* some size */];
static {
typeNames[CDATA] = "CDATA";
typeNames[NMTOKEN] = "NMTOKEN";
// etc.
}
but I don't think this needs to be part of the interface.
One might wish to use short or char instead of int if storage space is
at a premium; I'm making no judgement on which arithmetic type is
best.
This proposal is not Java-specific.
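Put together as something that compiles, the int-constant approach might look like the sketch below. The AttributeTypes class, the value chosen for ID and the isTokenized helper are assumptions added for the example; only CDATA = 0 and NMTOKEN = 1 come from the text above.

```java
final class AttributeTypes {
    static final int CDATA = 0;
    static final int NMTOKEN = 1;
    static final int ID = 2;   // value assumed for this sketch

    // Index positions match the constants above.
    private static final String[] TYPE_NAMES = { "CDATA", "NMTOKEN", "ID" };

    /** Converts a type code to its human-readable name. */
    static String nameOf(int type) {
        return TYPE_NAMES[type];
    }

    /** Example of dispatching on the int code with switch. */
    static boolean isTokenized(int type) {
        switch (type) {
            case NMTOKEN:
            case ID:
                return true;
            default:
                return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(nameOf(NMTOKEN));
        System.out.println(isTokenized(CDATA));
    }
}
```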
--
From M.H.Kay at eng.icl.co.uk Wed Feb 25 11:36:23 1998
From: M.H.Kay at eng.icl.co.uk (Michael Kay)
Date: Mon Jun 7 17:00:13 2004
Subject: helper classes for SAX
Message-ID: <01bd41e1$be824f60$1e09e391@mhklaptop.bra01.icl.co.uk>
>"Should we add helper classes to SAX?"
>
I have written a package on top of SAX which I hope to publish soon - I need
to get it past some corporate processes.
I wrote it because I found I was doing the same thing repeatedly in a number
of SAX applications. I call the package SAXON (sorry), and it provides the
following services:
- allows you to register a handler for a particular element type (or a
particular element type in the context of a parent element type). The
handler can supply methods to process the element start or end, the
character data or ignorable white space in the element, or the start or end
of a consecutive group of one or more elements (cf. XSL)
- provides you with context information about the element; in particular,
its parent and ancestors, their attributes, and also their elder sibling
elements.
- allows you to associate user data with an element, so for example your
start-element method can pass data to the corresponding end-element method
- allows you to associate an output "bucket" with an element type, so that
all output for that element and its children (unless otherwise specified)
goes into that bucket. Useful for splitting documents and for limited
re-ordering of elements
- allows multiple handlers per element type
- includes some standard element handlers for doing HTML rendition, for
generating automatic numbering, etc
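The first and fifth services in the list (handlers registered per element type, with several allowed for the same type) can be sketched roughly as follows. This is not the SAXON API, which has not been published; ElementHandler, HandlerRegistry and both method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical callback; a fuller version would also carry attributes and context.
interface ElementHandler {
    void startElement(String name);
}

final class HandlerRegistry {
    private final Map<String, List<ElementHandler>> handlers = new HashMap<>();

    /** Registers a handler for one element type; several may share a type. */
    void register(String elementType, ElementHandler handler) {
        handlers.computeIfAbsent(elementType, k -> new ArrayList<>()).add(handler);
    }

    /** Meant to be called from a SAX-style startElement; returns how many handlers ran. */
    int dispatchStart(String elementType) {
        List<ElementHandler> list = handlers.get(elementType);
        if (list == null) return 0;
        for (ElementHandler h : list) h.startElement(elementType);
        return list.size();
    }

    public static void main(String[] args) {
        HandlerRegistry registry = new HandlerRegistry();
        registry.register("title", n -> System.out.println("render " + n));
        registry.register("title", n -> System.out.println("number " + n));
        System.out.println(registry.dispatchStart("title"));
    }
}
```

Handlers run in registration order here; whether that is the right policy for multiple handlers is a design choice this sketch does not settle.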
Although I'm not in a position to go public with it yet, I'll be happy to
share the current state of development with any individual who wants to
collaborate.
I do realise of course that some of these facilities can be achieved by
using the DOM instead of an event-based parser, and there is a world of
stuff in JUMBO that I haven't explored yet. I was trying to add value to SAX
without going heavyweight, which of course is a delicate line to tread.
Regards, Mike Kay
ICL
From digitome at iol.ie Wed Feb 25 13:42:56 1998
From: digitome at iol.ie (Sean Mc Grath)
Date: Mon Jun 7 17:00:13 2004
Subject: SAX: org.xml.sax.AttributeMap
Message-ID: <199802251342.NAA04905@mail.iol.ie>
[Toby Speight]
>
>We're returning one of a bounded, known set of values. I'd prefer to
>use an int for this type of thing, along with a set of constants.
>I.e.
>
> public abstract String getType (int index);
> public static final int CDATA = 0;
> public static final int NMTOKEN = 1;
> // etc.
>
>The only advantage a String has over this is that you can meaningfully
>present it to the user as it is. A disadvantage of String is that it is
>computationally expensive to compare for equality (or equivalently, and
>worse, to switch() on it).
If ints are going to be used, let's use values that can
be bit-twiddled.
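One way to read this suggestion, sketched with assumed values: give each type its own bit so that groups of types can be tested with a single mask. The AttributeTypeBits class, the particular bit assignments and the ANY_ENTITY mask are all hypothetical.

```java
final class AttributeTypeBits {
    // One bit per attribute type (assignments assumed for this sketch).
    static final int CDATA    = 1 << 0;
    static final int NMTOKEN  = 1 << 1;
    static final int NMTOKENS = 1 << 2;
    static final int ENTITY   = 1 << 3;
    static final int ENTITIES = 1 << 4;

    // A mask grouping the types that refer to unparsed entities.
    static final int ANY_ENTITY = ENTITY | ENTITIES;

    /** True if the type code has at least one bit in common with the mask. */
    static boolean matches(int type, int mask) {
        return (type & mask) != 0;
    }

    public static void main(String[] args) {
        System.out.println(matches(ENTITIES, ANY_ENTITY));
        System.out.println(matches(CDATA, ANY_ENTITY));
    }
}
```

The trade-off is that bit values cannot double as array indexes for a name table, so a lookup from code to name needs a switch or a map instead.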
From ak117 at freenet.carleton.ca Wed Feb 25 14:12:10 1998
From: ak117 at freenet.carleton.ca (David Megginson)
Date: Mon Jun 7 17:00:13 2004
Subject: SAX: finalising org.sax.xml.Parser
In-Reply-To: <34F38943.551F05AB@jclark.com>
References: <199802230313.WAA00386@unready.microstar.com>
<34F38943.551F05AB@jclark.com>
Message-ID: <199802251410.JAA00633@unready.microstar.com>
James Clark writes:
> XML allows the encoding of an entity being specified by an external
> transport protocol (see 4.3.3): for example, when an XML document
> arrives over HTTP with a content type of text/xml, then the
> encoding specified in the charset parameter is supposed to take
> precedence over that specified in the document entity by the
> encoding declaration or by XML's default rules. So I think we need
> an additional argument here: a String specifying the name of the
> encoding to be used for the InputStream, or null if the encoding
> specified in the document entity should be used.
This is a very good point, as was the suggestion earlier (I don't
remember whose it was) that we rearrange arguments in order of
decreasing importance to the programmer. With those suggestions in
mind, here's my current take on org.xml.sax.Parser:
package org.xml.sax;
public interface Parser {
public abstract void setEntityHandler (EntityHandler handler);
public abstract void setDocumentHandler (DocumentHandler handler);
public abstract void setErrorHandler (ErrorHandler handler);
public abstract void parse (String systemId, String publicId)
throws java.lang.Exception;
public abstract void parse (InputStream input, String encoding,
String systemId, String publicId)
throws java.lang.Exception;
}
I haven't included a setValidate() method yet, partly because I'm not
certain what it would really mean. If I did
setValidate(false);
would that simply prevent the reporting of validation errors, or would
it also prohibit the parser from resolving external text entities and
the external DTD subset?
All the best,
David
--
David Megginson ak117@freenet.carleton.ca
Microstar Software Ltd. dmeggins@microstar.com
http://home.sprynet.com/sprynet/dmeggins/
From jmodre at edu.uni-klu.ac.at Wed Feb 25 15:08:56 1998
From: jmodre at edu.uni-klu.ac.at (Juergen Modre)
Date: Mon Jun 7 17:00:13 2004
Subject: SAX: finalising org.sax.xml.Parser
References: <199802230313.WAA00386@unready.microstar.com>
<34F38943.551F05AB@jclark.com> <199802251410.JAA00633@unready.microstar.com>
Message-ID: <34F44259.5887FF1F@edu.uni-klu.ac.at>
David Megginson wrote:
> This is a very good point, as was the suggestion earlier (I don't
> remember whose it was) that we rearrange arguments in order of
> decreasing importance to the programmer.
I think it was Don Park and I also like it.
> With those suggestions in
> mind, here's my current take on org.xml.sax.Parser:
>
> package org.xml.sax;
>
> public interface Parser {
>
> public abstract void setEntityHandler (EntityHandler handler);
> public abstract void setDocumentHandler (DocumentHandler handler);
> public abstract void setErrorHandler (ErrorHandler handler);
>
> public abstract void parse (String systemId, String publicId)
> throws java.lang.Exception;
>
> public abstract void parse (InputStream input, String encoding,
> String systemId, String publicId)
> throws java.lang.Exception;
>
> }
I think Don's suggestion was also to have it like
public abstract void parse (String systemId, String publicId,
String encoding, InputStream input)
so that the leading parameters are always the same.
So if another overload is added, only the last parameter will differ.
> I haven't included a setValidate() method yet, partly because I'm not
> certain what it would really mean. If I did
>
> setValidate(false);
>
> would that simply prevent the reporting of validation errors, or would
> it also prohibit the parser from resolving external text entities and
> the external DTD subset?
It should have the following meaning:
- setValidate(false);
The document/stream should be parsed for well-formedness only.
This should also be the default if setValidate() is never called.
- setValidate(true);
The document/stream should also be validated during parsing.
Exactly where the border lies between well-formedness parsing and
validation parsing should be left to the parser. This border
can be found in the XML spec.
The SAX interface should be usable with both classes of XML parser
and should also make it possible to enable or disable validation.
But I agree that it is sometimes not easy to see a clear border
between well-formedness parsing and validation parsing in the XML spec.
All the best
Juergen
-----------------------------------------------
JUERGEN MODRE
Reisdorf 6
A-9371 Brueckl
Austria (Europe)
Phone: ++43 4214 2320
Mobile: ++43 664 233 22 22
E-mail: jmodre@edu.uni-klu.ac.at
WWW: http://www.edu.uni-klu.ac.at/~jmodre
-----------------------------------------------
From elm at arbortext.com Wed Feb 25 15:51:03 1998
From: elm at arbortext.com (Eve L. Maler)
Date: Mon Jun 7 17:00:13 2004
Subject: The XML spec in XML: missing tags
In-Reply-To: <98Feb25.042025est.18818@thicket.arbortext.com>
References: <3.0.5.32.19980224192743.00a0d220@village.doctools.com>
<98Feb23.120356est.18826@thicket.arbortext.com>
Message-ID: <3.0.5.32.19980225104757.00a13120@village.doctools.com>
Oh, you want *documentation*, do you?? Well, the DTD was hard to write; it
should be hard to understand. :-)
Seriously, I keep saying that I'll release the reference documentation Real
Soon Now, and in fact I'm hoping to be able to spend a few hours tidying it
up and releasing it later this week. (There's also a minor DTD update in
the pipe.)
At 03:32 AM 2/25/98 -0500, Peter Murray-Rust wrote:
>My interest is similar - but complementary - to Michael's; I am interested
>in the terminology. Thus I want to be able to abstract the terms [there are
>62 termdefs] in the document and produce a model for their structure (e.g.
>entailment by containment, by linking and so on.) In this way I can create
>a graphical interactive map of the concepts in the XML spec and have
>already created a prototype. I would like to know, for example, whether all
>terms are defined by or whether there are some which are simply
>defined by foo bar. There appears to be some duplication here
>as well; thus a termdef has an attribute naming the term, but it is also
>often contained within a later in the 'description'. [And there is
>at least one case where occurs in mid-sentence - I suspect this
>isn't intended.]
<termdef> is a really odd way to do term definitions, for my money, but
that's what the users wanted. :-) It captures an "inline" definition of a
term, and because of the mixed content model, it can't even ensure that a
<term> is present to identify the actual term being defined. Likewise, it
can't ensure that the definition captured functions as a "standalone"
sentence or set of sentences. I suspect that the cut-off sentence was more
in the spirit of poetic license.
<term> is occasionally used legitimately without a <termdef> wrapper; it's
marking a term being used in a special way, without an accompanying
definition.
Gee, maybe I should just collect all the questions and do the documentation
as a Q&A...
Eve
From matthewg at poet.de Wed Feb 25 16:01:23 1998
From: matthewg at poet.de (Matthew Gertner)
Date: Mon Jun 7 17:00:13 2004
Subject: SAX: finalising org.sax.xml.Parser
Message-ID: <01bd4206$6ccb6d30$a00b0ac0@pharcyde.poetsoftware.xo.com>
>It should have the following meaning:
>- setValidate(false);
> That the document/stream should be parsed for well-formedness.
> This should also be the default if nothing was set with the setValidate()
> method.
>
>- setValidate(true);
> That the document/stream should also be validated during parsing.
How about a 2x2 matrix?
With DTD
setValidate(false) - checks for well-formedness, external subset is used
for entity and notation declarations, etc.
setValidate(true) - full validation
Without DTD
setValidate(false) - just checks for well-formedness
setValidate(true) - throws an exception
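The matrix above could be sketched in Java roughly as follows. This is only an illustration: `ParseMode`, `ValidationMatrix` and `resolve` are hypothetical names that appear in no SAX draft, and `IllegalStateException` merely stands in for whatever exception a real parser would throw.

```java
/** Outcome of combining setValidate() with the presence of a DTD (hypothetical names). */
enum ParseMode {
    WELL_FORMED_ONLY,       // no DTD, setValidate(false)
    WELL_FORMED_WITH_DTD,   // DTD present, setValidate(false): external subset is read for
                            // entity and notation declarations, but nothing is validated
    FULL_VALIDATION         // DTD present, setValidate(true)
}

final class ValidationMatrix {
    private ValidationMatrix() {}

    /** Maps the two inputs of the proposed 2x2 matrix onto a parse mode. */
    static ParseMode resolve(boolean hasDtd, boolean validate) {
        if (hasDtd) {
            return validate ? ParseMode.FULL_VALIDATION : ParseMode.WELL_FORMED_WITH_DTD;
        }
        if (validate) {
            // Fourth cell of the matrix: validation requested but there is no DTD.
            throw new IllegalStateException("cannot validate a document without a DTD");
        }
        return ParseMode.WELL_FORMED_ONLY;
    }
}
```

Written this way, the sketch makes it visible that "use the DTD" and "validate" are tied to a single switch.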
Matthew
From peter at ursus.demon.co.uk Wed Feb 25 17:15:05 1998
From: peter at ursus.demon.co.uk (Peter Murray-Rust)
Date: Mon Jun 7 17:00:13 2004
Subject: helper classes for SAX
In-Reply-To: <01bd41e1$be824f60$1e09e391@mhklaptop.bra01.icl.co.uk>
Message-ID: <3.0.1.16.19980225161441.20a7fe70@pop3.demon.co.uk>
At 11:37 25/02/98 -0000, Michael Kay wrote:
>>"Should we add helper classes to SAX?"
>>
>I have written a package on top of SAX which I hope to publish soon - I need
>to get it past some corporate processes
I understand the problem :-)
>
>
>I wrote it because I found I was doing the same thing repeatedly in a number
>of SAX applications. I call the package SAXON (sorry), and it provides the
>following services:
>
>- allows you to register a handler for a particular element type (or a
>particular element type in the context of a parent element type). The
>handler can supply methods to process the element start or end, the
>character data or ignorable white space in the element, or the start or end
>of a consecutive group of one or more elements (cf. XSL)
>- provides you with context information about the element; in particular,
>its parent and ancestors, their attributes, and also their elder sibling
>elements.
This is useful. I found myself doing the same sort of thing. In a
tree-based situation it's easy - I use XLL XPtrs repeatedly. I missed these
when I came to implement some things on top of SAX.
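To make the idea concrete, here is a minimal sketch of per-element-type handlers with parent context on top of an event-based parser. It is not SAXON's design, which has not been published; `ElementHandler`, `ElementDispatcher` and `itemContexts` are hypothetical names, and the sketch assumes the SAX API as bundled with a current JDK (`DefaultHandler`, `SAXParserFactory`) rather than the draft interfaces under discussion.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical per-element-type callback; parent is null for the root element. */
interface ElementHandler {
    void start(String name, String parent);
}

/** Routes start-element events to handlers registered by element type. */
class ElementDispatcher extends DefaultHandler {
    private final Map<String, ElementHandler> handlers = new HashMap<>();
    private final Deque<String> openElements = new ArrayDeque<>(); // innermost ancestor first

    void register(String elementType, ElementHandler handler) {
        handlers.put(elementType, handler);
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        ElementHandler h = handlers.get(qName);
        if (h != null) {
            h.start(qName, openElements.peek()); // peek() yields null at the root
        }
        openElements.push(qName);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        openElements.pop();
    }

    /** Parses an XML string and returns "parent/item" for every item element seen. */
    static List<String> itemContexts(String xml) {
        List<String> seen = new ArrayList<>();
        ElementDispatcher d = new ElementDispatcher();
        d.register("item", (name, parent) -> seen.add(parent + "/" + name));
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), d);
        } catch (Exception e) {
            throw new IllegalStateException("parse failed", e);
        }
        return seen;
    }
}
```

The stack of open element names is the only state kept, so nothing like a full tree has to be held in memory.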
>- allows you to associate user data with an element, so for example your
>start-element method can pass data to the corresponding end-element method
>- allows you to associate an output "bucket" with an element type, so that
>all output for that element and its children (unless otherwise specified)
>goes into that bucket. Useful for splitting documents and for limited
>re-ordering of elements
Yes. This is partly what my (very simple) SAXSplit does - splits documents
into smaller bits.
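The "bucket" idea can be sketched without any parser at all. The class below is hypothetical (`BucketRouter` and its methods are made-up names, not SAXON or SAXSplit code): it is driven by start, character and end events and sends character data to whichever bucket is in force for the innermost open element.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sketch: routes character data into named output buckets by element type. */
class BucketRouter {
    private final Map<String, String> bucketForType = new HashMap<>();
    private final Map<String, StringBuilder> buckets = new LinkedHashMap<>();
    private final Deque<String> active = new ArrayDeque<>(); // bucket in force per open element

    BucketRouter() {
        active.push("default"); // bucket used outside any assigned element type
    }

    /** All output for this element type and its children goes to the named bucket. */
    void assign(String elementType, String bucketName) {
        bucketForType.put(elementType, bucketName);
    }

    void startElement(String name) {
        // Children inherit the parent's bucket unless their own type has one assigned.
        active.push(bucketForType.getOrDefault(name, active.peek()));
    }

    void characters(String text) {
        buckets.computeIfAbsent(active.peek(), k -> new StringBuilder()).append(text);
    }

    void endElement(String name) {
        active.pop();
    }

    /** Returns everything written to the bucket so far, or "" if it was never used. */
    String contents(String bucketName) {
        StringBuilder sb = buckets.get(bucketName);
        return sb == null ? "" : sb.toString();
    }
}
```

Because each bucket accumulates separately, emitting the buckets in a different order afterwards gives the limited re-ordering mentioned above.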
There was discussion at one stage that XML should have a transformation
language. Personally I would welcome this. XSL goes half the way in
providing a way of identifying components to be split, re-ordered,
transformed, etc. but concentrates on graphic rendering for humans.
>- allows multiple handlers per element type
>- includes some standard element handlers for doing HTML rendition, for
>generating automatic numbering, etc
I'd certainly like someone else to write code for HTML if that is what is
being offered :-)
>
>Although I'm not in a position to go public with it yet, I'll be happy to
>share the current state of development with any individual who wants to
>collaborate.
:-)
>
>I do realise of course that some of these facilities can be achieved by
>using the DOM instead of an event-based parser, and there is a world of
The attraction of SAX is that:
- it is simpler for XML newbies to understand
- you don't have to hold everything in memory
>stuff in JUMBO that I haven't explored yet. I was trying to add value to SAX
JUMBO mainly consists of large muddy footprints. Seriously, I would be
happy to lose any generic functionality from JUMBO if a better way arises.
For example, I use SAX+FOO as the parser and can see a move towards DOM for
defining the tree/grove components. When/if I'm happy to go to J1.1 I will
seriously consider the Swing JTree, though there are bits I find missing at
present.
I am not clear what other features are modular but I am sure many are.
P.
Peter Murray-Rust, Director Virtual School of Molecular Sciences, domestic
net connection
VSMS http://www.nottingham.ac.uk/vsms, Virtual Hyperglossary
http://www.venus.co.uk/vhg
From ak117 at freenet.carleton.ca Wed Feb 25 17:26:06 1998
From: ak117 at freenet.carleton.ca (David Megginson)
Date: Mon Jun 7 17:00:13 2004
Subject: SAX: finalising org.sax.xml.Parser
In-Reply-To: <01bd4206$6ccb6d30$a00b0ac0@pharcyde.poetsoftware.xo.com>
References: <01bd4206$6ccb6d30$a00b0ac0@pharcyde.poetsoftware.xo.com>
Message-ID: <199802251724.MAA02583@unready.microstar.com>
Matthew Gertner writes:
> How about a 2x2 matrix?
>
> With DTD
> setValidate(false) - checks for well-formedness, external subset is used
> for entity and notation declarations, etc.
> setValidate(true) - full validation
>
> Without DTD
> setValidate(false) - just checks for well-formedness
> setValidate(true) - throws an exception
This comes back to the original problem, however: what if I want to
include the external subset and external text entities but don't want
to validate? I'm not sure that the two should be tied together
(AElfred, for example, does not validate, but it does use the DTD).
All the best,
David
--
David Megginson ak117@freenet.carleton.ca
Microstar Software Ltd. dmeggins@microstar.com
http://home.sprynet.com/sprynet/dmeggins/
From peter at ursus.demon.co.uk Wed Feb 25 18:47:36 1998
From: peter at ursus.demon.co.uk (Peter Murray-Rust)
Date: Mon Jun 7 17:00:13 2004
Subject: SAX: finalising org.sax.xml.Parser
In-Reply-To: <34F44259.5887FF1F@edu.uni-klu.ac.at>
References: <199802230313.WAA00386@unready.microstar.com>
<34F38943.551F05AB@jclark.com>
<199802251410.JAA00633@unready.microstar.com>
Message-ID: <3.0.1.16.19980225170648.0947fd3e@pop3.demon.co.uk>
At 16:10 25/02/98 +0000, Juergen Modre wrote:
[...]
>- setValidate(true);
> That the document/stream should also be validated during parsing.
>
>The question where there is exactly the border between well-formedness
>parsing and validation parsing should be left to the parser. This border
>can be found in the XML spec.
>The SAX interface is (or should be) usable with both classes of XML parsers,
>and should also make it possible to enable or disable validation.
>
>
>But I agree that it is sometimes hard to see a clear border
>between well-formedness parsing and validating parsing in the XML spec.
>
This is an area that I (and I think others) have difficulty with, although
I think there are many who are clear how different parsers behave. This
also interacts with the 'standalone' value in the xml PI. There is also
some potential confusion as to when and how the presence/absence of the
external subset makes a difference.
If my worries are unfounded, then it should be possible to create a precise
description of what parameters, files, internal subsets, etc. are needed to
control the behaviour of a SAX-compliant parser and what it should do. In
which case it would be very helpful to see it set out clearly and I'll shut
up. If, however, there still is confusion then we shall discover it in
these attempts :-)
P.
Peter Murray-Rust, Director Virtual School of Molecular Sciences, domestic
net connection
VSMS http://www.nottingham.ac.uk/vsms, Virtual Hyperglossary
http://www.venus.co.uk/vhg
From peter at ursus.demon.co.uk Wed Feb 25 19:28:20 1998
From: peter at ursus.demon.co.uk (Peter Murray-Rust)
Date: Mon Jun 7 17:00:13 2004
Subject: The XML spec in XML: missing tags
In-Reply-To: <3.0.5.32.19980225104757.00a13120@village.doctools.com>
References: <98Feb25.042025est.18818@thicket.arbortext.com>
<3.0.5.32.19980224192743.00a0d220@village.doctools.com>
<98Feb23.120356est.18826@thicket.arbortext.com>
Message-ID: <3.0.1.16.19980225190358.0b6f0f44@pop3.demon.co.uk>
At 10:47 25/02/98 -0500, Eve L. Maler wrote:
>Oh, you want *documentation*, do you?? Well, the DTD was hard to write; it
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Not ME. I was weaned on 5-hole paper tape. Variables should be no longer
than 1 character.
>should be hard to understand. :-)
Yes. I strip comments from FORTRAN programs as it is good for the soul and
saves cards.
I must have dreamed it, but someone posted a month or two back that
documentation was a *required* part of a DTD :-)
>
>Seriously, I keep saying that I'll release the reference documentation Real
>Soon Now, and in fact I'm hoping to be able to spend a few hours tidying it
>up and releasing it later this week. (There's also a minor DTD update in
>the pipe.)
Great. Seriously - although it wasn't perhaps intended, rec.xml is a
splendid vehicle for people to cut their teeth on - it's got structure,
uses normalisation, has a good variety of elementTypes but also uses some
in a generic manner. The only thing it doesn't use is entities. I have
tweaked my SAXSplit jiffy to produce entities for div1, etc.
And - an argument for preserving comments in document structure - there is
some splendid archaeology inside...
>
[...]
>
><termdef> is a really odd way to do term definitions, for my money, but
>that's what the users wanted. :-) It captures an "inline" definition of a
*Users*?? DTD by committee?? gulp.
>term, and because of the mixed content model, it can't even ensure that a
><term> is present to identify the actual term being defined. Likewise, it
>can't ensure that the definition captured functions as a "standalone"
>sentence or set of sentences. I suspect that the cut-off sentence was more
>in the spirit of poetic license.
Fair enough. The approach I am taking to terminology is based on MARTIF
(ISO12200 and ISO12620) - MARTIF itself having strong TEI roots. So I shall
use some simple heuristics to transform termdefs to my termEntry's
>
><term> is occasionally used legitimately without a <termdef> wrapper; it's
>marking a term being used in a special way, without an accompanying
>definition.
Yes. I shall abstract these.
>
>Gee, maybe I should just collect all the questions and do the documentation
>as a Q&A...
Not a bad idea. I certainly don't want you to go to a lot of trouble. One
line sentences for each elementType are probably OK, plus any hardcoded
semantics (e.g. what the target of IDREFs may or may not be). [I have a set of
simple tools in JUMBO that allow you to browse documents, so you find all
elementTypes, their allowed children, attributes, attribute values, etc.
and can then display the actual location in the document. You can then make
a pretty good guess at what they mean.]
P.
Peter Murray-Rust, Director Virtual School of Molecular Sciences, domestic
net connection
VSMS http://www.nottingham.ac.uk/vsms, Virtual Hyperglossary
http://www.venus.co.uk/vhg
From donpark at quake.net Fri Feb 27 00:01:13 1998
From: donpark at quake.net (Don Park)
Date: Mon Jun 7 17:00:13 2004
Subject: JFC 1.1 Released
Message-ID: <000401bd4312$190c22e0$2ee044c6@donpark>
This is a heads-up notice to those of us interested in Java.
JFC 1.1 was released today. It includes neither Java2D nor Drag-n-Drop.
Metal L&F looks good but I was somewhat disappointed by lack of speed
improvements over the beta versions. There are still some update problems
and some of the features were maimed or shifted into preview status. It is
better than nothing.
The fact that JFC 1.1 has now shipped means that the JDK 1.2 beta 3 release
is not far behind since JFC 1.1 was supposed to ship at the same time.
I just thought you guys might be interested in the news,
Don Park
http://www.quake.net/~donpark/index.html
From zwang at pstat.ucsb.edu Fri Feb 27 02:34:54 1998
From: zwang at pstat.ucsb.edu (Zheng Wang)
Date: Mon Jun 7 17:00:13 2004
Subject: JFC
In-Reply-To: <000201bd3608$c0c7f4d0$2ee044c6@donpark>
Message-ID:
I also tried Swing 1.0. It is still not compatible with the JDK.
Does anyone work with both the JDK and Swing and know how to make them
compatible?
Thanks
Zheng Wang
Department of Statistics and Applied Probability
University of California, Santa Barbara
E-mail: zwang@pstat.ucsb.edu; http://www.pstat.ucsb.edu/~zwang
From jjc at jclark.com Fri Feb 27 03:29:11 1998
From: jjc at jclark.com (James Clark)
Date: Mon Jun 7 17:00:13 2004
Subject: SAX: finalising org.sax.xml.Parser
References: <01bd4206$6ccb6d30$a00b0ac0@pharcyde.poetsoftware.xo.com> <199802251724.MAA02583@unready.microstar.com>
Message-ID: <34F6321E.8D415644@jclark.com>
David Megginson wrote:
>
> Matthew Gertner writes:
>
> > How about a 2x2 matrix?
> >
> > With DTD
> > setValidate(false) - checks for well-formedness, external subset is used
> > for entity and notation declarations, etc.
> > setValidate(true) - full validation
> >
> > Without DTD
> > setValidate(false) - just checks for well-formedness
> > setValidate(true) - throws an exception
>
> This comes back to the original problem, however: what if I want to
> include the external subset and external text entities but don't want
> to validate? I'm not sure that the two should be tied together
> (AElfred, for example, does not validate, but it does use the DTD).
The following seem the reasonable combinations to me:
- Validate and process all external entities (if you're validating
you've got to process all external entities).
- Don't validate and process external DTD and parameter entities
depending on the setting of standalone.
- Don't validate and process external DTD and parameter entities
(irrespective of the setting of standalone).
- Don't validate and don't process external DTD and parameter entities
(irrespective of the setting of standalone).
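As a rough sketch, the four combinations could be modelled as a single enumerated setting rather than a boolean. Everything here is hypothetical (`ExternalProcessing` and its constants are made-up names), and the sketch assumes that "depending on the setting of standalone" means the external DTD is skipped only when the document declares standalone="yes".

```java
/** The four combinations listed above, as a hypothetical enum. */
enum ExternalProcessing {
    VALIDATE,               // validate; all external entities must be processed
    HONOUR_STANDALONE,      // don't validate; read the external DTD unless standalone="yes"
    ALWAYS_READ_EXTERNAL,   // don't validate; read the external DTD whatever standalone says
    NEVER_READ_EXTERNAL;    // don't validate; skip the external DTD whatever standalone says

    boolean validates() {
        return this == VALIDATE;
    }

    /** Whether the external DTD subset and external parameter entities are processed. */
    boolean readsExternalDtd(boolean standalone) {
        switch (this) {
            case VALIDATE:
            case ALWAYS_READ_EXTERNAL:
                return true;
            case HONOUR_STANDALONE:
                return !standalone; // assumption: standalone="yes" means skip
            default:
                return false;
        }
    }
}
```

A single setter taking such a value would rule out meaningless pairs such as "validate but ignore the DTD", which two independent booleans would allow.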
James
From moroz at paragraph.com Fri Feb 27 13:32:42 1998
From: moroz at paragraph.com (Moroz, Oleg)
Date: Mon Jun 7 17:00:13 2004
Subject: JFC
Message-ID: <00FE2F436493D111900E00A0C91003780C7C2F@ms.paragraph.com>
Zheng Wang[SMTP:zwang@pstat.ucsb.edu] wrote:
> I also tried Swing 1.0. It is still not compatible with the JDK.
> Does anyone work with both the JDK and Swing and know how to make them
> compatible?
What do you mean by "not compatible with JDK"? Swing 1.0 works perfectly
with JDK / JRE 1.1.5 for Win32 from Sun and, I hope, with the latest JDK 1.1.5
for Linux from Steve Byrne (will try that at home tonight). It also works
with the latest Microsoft JVM (from IE 4.01), although not as well
(tooltips don't show text and some examples produce spurious exception stack
backtraces, but continue operating).
Oleg
From dima at paragraph.com Fri Feb 27 17:26:02 1998
From: dima at paragraph.com (Dmitri Kondratiev)
Date: Mon Jun 7 17:00:13 2004
Subject: ANN: XLogo - programming with XML Logo Turtle Graphics
Message-ID: <2.2.32.19980227172605.00916750@dream.paragraph.com>
XLogo Announcement
------------------
XLogo is a markup language I wrote to program Logo Turtle Graphics with XML
in a Java applet. An XLogo program is a well-formed and valid XML document. The
XLogo runtime is a set of Java classes that process an XLogo program.
The main reason for XLogo was to find out what advantages XML provides
for developing problem-domain-specific meta-languages. Another goal was to
learn XML and experiment with SAX - Simple API for XML.
To find out more about XLogo, see:
http://www.geocities.com/SiliconValley/Lakes/3767/xlogo-index.html
Any comments and ideas are most welcome!
Thanks,
Dima
---------------------------
dima@paragraph.com
102401.2457@compuserve.com
http://www.geocities.com/SiliconValley/Lakes/3767/
tel: 07-095-464-9241
From ak117 at freenet.carleton.ca Sat Feb 28 03:24:03 1998
From: ak117 at freenet.carleton.ca (David Megginson)
Date: Mon Jun 7 17:00:13 2004
Subject: SAX: org.xml.sax.AttributeMap
In-Reply-To: <34F38506.3D19B68A@jclark.com>
References: <199802250126.UAA00473@unready.microstar.com>
<199802250138.UAA00524@unready.microstar.com>
<34F38506.3D19B68A@jclark.com>
Message-ID: <199802280322.WAA00888@unready.microstar.com>
James Clark writes:
> I agree that SAX ought to provide access to unparsed entities but I
> don't think this is the right way to achieve it. For a start, I can
> have an ENTITIES attribute, so all these methods would need two
> arguments (the index of the attribute in the attribute list, and the
> index of the token in the value).
An excellent point, and one that I missed in the original SAX.
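A small sketch of why two indices are needed: an ENTITIES attribute value is a whitespace-separated list of entity names, so one entity is identified by the attribute's position plus the token's position within the value. `EntitiesValue`, `tokens` and `entityName` are hypothetical helper names, not proposed SAX methods.

```java
/** Hypothetical helper: addresses one entity name inside an ENTITIES attribute value. */
final class EntitiesValue {
    private EntitiesValue() {}

    /** Splits an ENTITIES value into its whitespace-separated entity names. */
    static String[] tokens(String attributeValue) {
        String trimmed = attributeValue.trim();
        return trimmed.isEmpty() ? new String[0] : trimmed.split("\\s+");
    }

    /**
     * Returns the entity name at tokenIndex within the attribute at attrIndex,
     * or null if either index is out of range.
     */
    static String entityName(String[] attributeValues, int attrIndex, int tokenIndex) {
        if (attrIndex < 0 || attrIndex >= attributeValues.length) {
            return null;
        }
        String[] names = tokens(attributeValues[attrIndex]);
        return (tokenIndex >= 0 && tokenIndex < names.length) ? names[tokenIndex] : null;
    }
}
```

For an element whose second attribute has the value "fig1 fig2 fig3", the name "fig3" is reached with attribute index 1 and token index 2.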
> I think a better approach is for the processor at the end of the prolog
> to pass an object to the application that provides information about all
> the declared notations and unparsed entities.
>
> XP has a DTD object that does this, but it might be better to call it
> something else (like UnparsedEntitySet) since SAX might someday be
> extended to provide full DTD access.
This is a good idea, but I need to find a way to avoid using the
Java-specific Enumeration class that your example uses (since I've
already eliminated it from AttributeList).
All the best,
David
--
David Megginson ak117@freenet.carleton.ca
Microstar Software Ltd. dmeggins@microstar.com
http://home.sprynet.com/sprynet/dmeggins/
From ak117 at freenet.carleton.ca Sat Feb 28 12:29:18 1998
From: ak117 at freenet.carleton.ca (David Megginson)
Date: Mon Jun 7 17:00:13 2004
Subject: SAX: Sorting out org.xml.sax.AttributeList
Message-ID: <199802281227.HAA00658@unready.microstar.com>
I have been working very hard to keep the number of interfaces in SAX
to a minimum, but it looks like there will be no way to avoid adding a
couple of additional ones if SAX is going to support unparsed entities
(as, I think, it must).
James's suggestion of using indexed properties instead of a lookup-map
is a very good, light-weight one. If attributes, entities, and
notations are all indexed, then they will share a certain amount of
common functionality which should be split out into its own
interface:
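  package org.xml.sax;
  public interface NameList {
    // Number of names in the list.
    public abstract int getLength ();
    // Index of the given name, or -1 if it is not present.
    public abstract int getIndex (String name);
    // Name at the given index, or null if the index is invalid.
    public abstract String getName (int index);
  }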
This is very JavaBean-like, except that getName does not throw an
ArrayIndexOutOfBounds exception (it just returns null for an invalid
index, and getIndex() returns -1 for a name that is not present).
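The null and -1 conventions can be illustrated with a minimal array-backed implementation. `ArrayNameList` is a hypothetical class, and the interface is repeated locally (without the package declaration) only so that the sketch compiles on its own.

```java
import java.util.Arrays;
import java.util.List;

/** Local copy of the proposed interface, repeated so the sketch is self-contained. */
interface NameList {
    int getLength();
    int getIndex(String name);
    String getName(int index);
}

/** Hypothetical array-backed implementation of the proposed NameList. */
class ArrayNameList implements NameList {
    private final List<String> names;

    ArrayNameList(String... names) {
        this.names = Arrays.asList(names.clone()); // defensive copy of the caller's array
    }

    public int getLength() {
        return names.size();
    }

    /** Returns -1 when the name is absent (List.indexOf already behaves this way). */
    public int getIndex(String name) {
        return names.indexOf(name);
    }

    /** Returns null instead of throwing for an out-of-range index. */
    public String getName(int index) {
        return (index >= 0 && index < names.size()) ? names.get(index) : null;
    }
}
```

A caller can therefore test a lookup result directly instead of wrapping each call in a try/catch.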
Next, attribute lists extend this interface to add value and type:
  package org.xml.sax;
  public interface AttributeList extends NameList {
    public abstract String getType (int index);
    public abstract String getValue (int index);
  }
For notations, we need external identifiers instead:
  package org.xml.sax;
  public interface NotationList extends NameList {
    public abstract String getSystemId (int index);
    public abstract String getPublicId (int index);
  }
Unparsed entities are identical to notations, but they also need the
name of the associated notation:
  package org.xml.sax;
  public interface UnparsedEntityList extends NotationList {
    public abstract String getNotationName (int index);
  }
From a purist point-of-view, UnparsedEntityList and NotationList
should both extend a common ancestor, like ExternalObjectList, but I
am becoming very concerned at the number of interfaces multiplying
here.
The application will gain access to these lists through a DTD
callback in org.xml.sax.DocumentHandler:
  public void dtd (UnparsedEntityList entityList,
                   NotationList notationList)
    throws java.lang.Exception;
Should this event always be fired, or should it be fired only if there
actually is a DTD?
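Here is a rough sketch of how an application might use the two lists once the callback has delivered them. `DeclTable` and `DtdAwareHandler` are hypothetical, the interfaces are repeated locally so the sketch compiles on its own, and the callback is shown without its checked exception for brevity.

```java
/** Local copies of the proposed interfaces, repeated so the sketch is self-contained. */
interface NameList {
    int getLength();
    int getIndex(String name);
    String getName(int index);
}

interface NotationList extends NameList {
    String getSystemId(int index);
    String getPublicId(int index);
}

interface UnparsedEntityList extends NotationList {
    String getNotationName(int index);
}

/** Hypothetical table-backed list; each row is {name, systemId, publicId, notationName}. */
class DeclTable implements UnparsedEntityList {
    private final String[][] rows;

    DeclTable(String[][] rows) {
        this.rows = rows;
    }

    private String cell(int index, int col) {
        return (index >= 0 && index < rows.length) ? rows[index][col] : null;
    }

    public int getLength() { return rows.length; }

    public int getIndex(String name) {
        for (int i = 0; i < rows.length; i++) {
            if (rows[i][0].equals(name)) return i;
        }
        return -1; // name not declared
    }

    public String getName(int index) { return cell(index, 0); }
    public String getSystemId(int index) { return cell(index, 1); }
    public String getPublicId(int index) { return cell(index, 2); }
    public String getNotationName(int index) { return cell(index, 3); }
}

/** Hypothetical application code that keeps the lists passed to the dtd() callback. */
class DtdAwareHandler {
    private UnparsedEntityList entities;
    private NotationList notations;

    void dtd(UnparsedEntityList entityList, NotationList notationList) {
        this.entities = entityList;
        this.notations = notationList;
    }

    /** Returns "entitySystemId via notationSystemId", or null if the entity is undeclared. */
    String describe(String entityName) {
        int e = entities.getIndex(entityName);
        if (e < 0) return null;
        int n = notations.getIndex(entities.getNotationName(e));
        return entities.getSystemId(e) + " via " + notations.getSystemId(n);
    }
}
```

The lookup is by name, so it works the same whether the entity name came from an ENTITY attribute, an ENTITIES token, or anywhere else in the document.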
How does this sound to everyone? For me, there are pros and cons:
PROS
----
1) This arrangement is _much_ simpler to understand than the old
org.xml.sax.AttributeMap. Most users can deal only with
AttributeList (which is now trivial), and they can ignore
NotationList and UnparsedEntityList unless they need to use
unparsed entities.
2) It is possible to look up a notation or entity directly by name,
even if the name appears in a CDATA entity or in character data
content.
CONS
----
1) Too many interfaces.
2) Users will complain that the dtd() callback does not return other
information, such as lists of declared elements.
3) It may turn out that XML implementors shun unparsed entities and
notations in favour of HREF's and MIME types, in which case we will
have added this complexity to SAX for nothing.
Thanks,
David
--
David Megginson ak117@freenet.carleton.ca
Microstar Software Ltd. dmeggins@microstar.com
http://home.sprynet.com/sprynet/dmeggins/