Socat issues for XML
Paul Grosso
paul at arbortext.com
Mon Sep 21 22:11:45 BST 1998
I also received John Cowan's reply, but I'm using David's
since he included the necessary history. I quote from both.
> > I'm not understanding why OVERRIDE NO doesn't make sense. Perhaps
> > I'm missing something about SAX or your implementation. (Assume I
> > understand TR9401, since I edited it.)
>
>I think that the point is that if OVERRIDE NO were allowed, the public
>identifiers in the catalogue would never be used (since all entities,
>at least, must have system identifiers in XML).
It is true that the catalog entries of type PUBLIC, ENTITY, DOCTYPE,
LINKTYPE, or NOTATION occurring between an OVERRIDE NO entry and the
subsequent OVERRIDE YES entry will be ignored.
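To make the effect concrete, here is a rough sketch of which entries stay in play for a single external identifier. The tuple layout, the function name, and the sample identifiers are all invented for illustration; nothing here comes from TR9401 itself or from any existing implementation.

```python
# Sketch only: entries are hypothetical tuples such as
# ("PUBLIC", public_id, uri), ("SYSTEM", system_id, uri), ("OVERRIDE", "NO").
PUBLIC_KINDS = {"PUBLIC", "ENTITY", "DOCTYPE", "LINKTYPE", "NOTATION"}

def active_entries(entries, has_system_id, initial_override=True):
    """Return the entries that are not ignored for one external identifier."""
    override = initial_override
    kept = []
    for entry in entries:
        kind = entry[0]
        if kind == "OVERRIDE":
            # An OVERRIDE entry switches the mode for the entries after it.
            override = entry[1].upper() == "YES"
            continue
        if kind in PUBLIC_KINDS and has_system_id and not override:
            # Ignored: a system id was given and OVERRIDE NO is in force.
            continue
        kept.append(entry)
    return kept

catalog = [
    ("PUBLIC", "-//EX//DTD A//EN", "a.dtd"),
    ("OVERRIDE", "NO"),
    ("PUBLIC", "-//EX//DTD B//EN", "b.dtd"),
    ("SYSTEM", "http://example.org/c.dtd", "c.dtd"),
    ("OVERRIDE", "YES"),
    ("PUBLIC", "-//EX//DTD D//EN", "d.dtd"),
]

# In XML a system id is always present, so has_system_id is True.
kept_uris = [entry[-1] for entry in active_entries(catalog, has_system_id=True)]
```

Only the PUBLIC entry for b.dtd drops out here; the SYSTEM entry and everything after OVERRIDE YES are still used.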
I see three options:
1. say that your subset of TR9401 catalogs doesn't include OVERRIDE;
2. say that your subset "recognizes" OVERRIDE entries but ignores them;
3. say that your subset handles OVERRIDE.
Option 1 means that existing catalogs will cause your implementations
to give errors; option 2 means that they will cause your implementations
to behave differently (perhaps subtly and surprisingly) from existing
TR9401 implementations; option 3 means some extra work for your
implementations.
Looking at the pros and cons, I'd opt for option 3: a little more work
for your implementations seems preferable to the problems 1 and 2 will
mean for end users.
>I quote James Clark's docs, since they are pretty clear:
That's fine, but the text in TR9401 is normative (and, at least in
the case you quote, almost identical). Note that, last I checked,
James had not implemented support for the complete TR9401:1997
(not that I'm saying your effort can't subset TR9401:1997--just
that you might want to be fully aware of what TR9401:1997 says
to use as a resource in your efforts).
>In the XML context, as I said, every external identifier has
>an explicit system identifier (with the minor exception of
>notation declarations). Therefore, any entries with an
>overriding mode of NO will be unconditionally ignored. Since this
>is the default, any catalog not beginning with OVERRIDE YES will
>be ignored *in toto* (except for SYSTEM entries).
No on two counts:
a. OVERRIDE NO is not the default per TR9401, and
b. Even when reading a file starting in OVERRIDE NO mode, the
catalog will not be ignored in toto; not only are there
SYSTEM entries, as you mention, but there can be an OVERRIDE YES
entry, which means the rest of the catalog will be processed.
More on my first "no"; from TR9401:1997:
An application must provide some way (e.g., a runtime argument,
environment variable, preference switch) that allows the users
to specify which of these modes [prefer system IDs or prefer
public IDs] to use in the absence of any occurrences of an
OVERRIDE catalog entry.
Note that the initial setting of OVERRIDE is reset for each
catalog entry file:
The initial search strategy in force at the beginning of each
catalog entry file depends on the preference as determined by
the application.
TR9401 went to great lengths not to specify the initial default
for OVERRIDE. Most people involved in writing the Resolution
leaned toward a default of YES, but some leaned toward NO. We
agreed not to decide this point. In fact, several important
implementations currently default OVERRIDE to YES, which is
what you could do for your purposes, since as you point out
this makes more sense for XML.
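A sketch of what that might look like, assuming the user preference comes from an environment variable. The variable name SOCAT_PREFER and both function names are made up for this example; the only points taken from TR9401 are that the application supplies the initial mode and that it is reset at the start of each catalog entry file.

```python
# Sketch only: SOCAT_PREFER is a hypothetical environment variable name.
def initial_override(environ):
    """True means OVERRIDE YES (prefer public ids); used when unset."""
    return environ.get("SOCAT_PREFER", "public").lower() != "system"

def trace_modes(files, environ):
    """Map each catalog entry file to its (starting, ending) override mode.

    files: dict of file name -> list of entry tuples, e.g. ("OVERRIDE", "NO").
    """
    modes = {}
    for name, entries in files.items():
        # Reset for every file: the previous file's last OVERRIDE entry
        # does not carry over.
        start = override = initial_override(environ)
        for entry in entries:
            if entry[0] == "OVERRIDE":
                override = entry[1].upper() == "YES"
        modes[name] = (start, override)
    return modes

files = {
    "first.cat": [("OVERRIDE", "NO"), ("PUBLIC", "-//EX//DTD A//EN", "a.dtd")],
    "second.cat": [("PUBLIC", "-//EX//DTD B//EN", "b.dtd")],
}
modes = trace_modes(files, {})
```

With no preference set, second.cat starts in YES mode even though first.cat ended in NO mode.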
>
> > >2) As another consequence of system ids being always present and
> > >always URLs, a usable Socat implementation must not search the
> > >whole public catalog space for SYSTEM entries. When should the
> > >search stop? In some sense "when going offsite", but just when is
> > >that? Any suggestions?
> >
> > I don't understand what the problem is, and I don't understand
> > how--if there really is a problem--anything about XML makes it a
> > problem that isn't a problem with SGML in general (XML is SGML, you
> > know).
>
>I'd guess that this is a problem of efficiency: when catalogues are on
>the other end of relatively slow network connection, you don't want to
>retrieve a dozen catalogues unnecessarily.
>All XML system ids are URLs, and
>in general are to be taken at face value. SYSTEM entries serve as a
>private URL-URL mapping scheme, but must the whole of a public-id-
>resolution infrastructure be searched for each and every URL referred
>to in a XML document?
Sorry, I haven't followed your subset of TR9401 (is there a pointer
to some doc?); which one(s) of the DELEGATE and CATALOG entry types
do you support? These are the only two entry types that can send
an implementation off to another catalog entry file, and if the
catalog writer put one of them into the catalog entry file, it
sounds like s/he wants you to go there.
Generally, if there is no match in a given catalog entry file
(for any entry type) and the external identifier includes a
system id (as would be the case with XML), the system id is used.
The only reason another catalog would be searched is when:
1. there has been no SYSTEM or PUBLIC match in that catalog,
and
2. there is a DELEGATE entry that matches the external id's
public id OR there is a CATALOG entry.
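The lookup just described might be sketched roughly as follows. The data layout and function names are invented, OVERRIDE handling is left out (everything is treated as OVERRIDE YES), and the order in which DELEGATE and CATALOG entries are followed is a simplification rather than a statement of what TR9401 requires.

```python
# Sketch only: each catalog is a list of hypothetical (kind, key, target)
# tuples; CATALOG entries use None as the key.
def search(public_id, system_id, name, catalogs, seen=None):
    """Return a match from catalog `name` or the catalogs it points to."""
    seen = set() if seen is None else seen
    if name in seen or name not in catalogs:
        return None  # avoid loops and missing catalog entry files
    seen.add(name)
    entries = catalogs[name]
    # First look for a SYSTEM or PUBLIC match in this file.
    for kind, key, target in entries:
        if kind == "SYSTEM" and key == system_id:
            return target
        if kind == "PUBLIC" and public_id is not None and key == public_id:
            return target
    # Only without such a match do DELEGATE and CATALOG entries lead elsewhere.
    for kind, key, target in entries:
        delegated = (kind == "DELEGATE" and public_id is not None
                     and public_id.startswith(key))
        if delegated or kind == "CATALOG":
            found = search(public_id, system_id, target, catalogs, seen)
            if found is not None:
                return found
    return None

def locate(public_id, system_id, catalogs, root="root"):
    """Fall back to the system id when no catalog yields a match."""
    found = search(public_id, system_id, root, catalogs)
    return found if found is not None else system_id

catalogs = {
    "root": [
        ("PUBLIC", "-//EX//DTD A//EN", "a.dtd"),
        ("DELEGATE", "-//EX//DTD B", "b-cat"),
        ("CATALOG", None, "extra"),
    ],
    "b-cat": [("PUBLIC", "-//EX//DTD B//EN", "b.dtd")],
    "extra": [("SYSTEM", "http://example.org/c.dtd", "local/c.dtd")],
}

direct = locate("-//EX//DTD A//EN", "http://example.org/a.dtd", catalogs)
delegated = locate("-//EX//DTD B//EN", "http://example.org/b.dtd", catalogs)
fallback = locate("-//EX//DTD Z//EN", "http://example.org/z.dtd", catalogs)
```

The first lookup never leaves the root file, the second follows the DELEGATE entry, and the third visits the other files and then falls back to the system id.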
I'm guessing you've got a scenario in mind where there is
no SYSTEM or PUBLIC match in a given catalog entry file and
where there HAS been a matching DELEGATE or CATALOG entry,
BUT for some reason you want to ignore the DELEGATE or CATALOG
entry that was put into the catalog (why?) and instead just
give up now and use the system id in the external identifier.
I see three options (and if I didn't at first, I'd invent a third
option, since one is always supposed to have three options):
1. follow all DELEGATE and CATALOG entries as specified until
you find a match or run out of things to follow (this is what
TR9401 says and what the catalog writer presumably had in
mind when they put the DELEGATE/CATALOG entries in);
2. leave DELEGATE and CATALOG entry types out of your subset,
since you don't seem to want to follow them anyway;
3. invent a new catalog entry type that says "if you get to
the end of this catalog entry file without a match for
anything except maybe DELEGATE and CATALOG entries and
the external identifier has a system id, ignore the
DELEGATE and CATALOG entries and use that system id."
Option 2 seems internally consistent, but I suspect you want
the DELEGATE and CATALOG entry capability. Option 3 seems
odd--if you can put an entry in the catalog that says ignore
DELEGATE and CATALOG entries in this file, then why don't you
just omit the DELEGATE and CATALOG entries from this file?
That leaves option 1.
Perhaps I've not captured the scenario you're really considering.