EMBED and validation

David G. Durand dgd at cs.bu.edu
Wed Dec 3 04:00:17 GMT 1997

At 12:12 AM -0000 12/3/97, Peter Murray-Rust wrote:
>If you are going to 'include' binary 'files' (i.e. entities) then it gets
>more complex. This is my current analysis. It's probably wrong. (Are there
>any Java parsers which manage this?)

Actually, I just noticed, it _is_ wrong (I removed > quoting because it's
too gross for SGML examples):

<!NOTATION GIF
	 PUBLIC "+//IDN ds.internic.net/rfc/rfc2046.txt//NOTATION
         Multipurpose Internet Mail Extensions::image/gif//EN">

<!-- I hope I have copied that right - please don't sue - typos are likely -->
<!ENTITY mygif SYSTEM "my.gif" NDATA GIF>
<!-- an ENTITY declaration is required for EVERY image??? -->

><!ATTLIST IMG SRC NOTATION (GIF) #REQUIRED> <!-- I could include JPEG, etc-->

    This should be:

<!ATTLIST IMG SRC ENTITY #REQUIRED>

     The notation is attached to the entity, not the citation of the entity.
<IMG SRC="mygif">

Finally, this is a bit overstated: the following lines could (and should)
all be included in any reasonable CML DTD:

<!NOTATION GIF
	 PUBLIC "+//IDN ds.internic.net/rfc/rfc2046.txt//NOTATION
         Multipurpose Internet Mail Extensions::image/gif//EN">
<!-- I hope I have copied that right - please don't sue - typos are likely -->
<!ATTLIST IMG SRC NOTATION (GIF) #REQUIRED> <!-- I could include JPEG, etc-->

So the internal subset would have to contain the following to support _one_

<!ENTITY mygif SYSTEM "my.gif" NDATA GIF>

and the document would contain:

<IMG SRC="mygif">
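
To make the chain from attribute to entity to notation concrete, here is a
rough sketch (not part of the original exchange) that feeds those
declarations to Python's expat binding. It assumes the SRC attribute is
declared as ENTITY; the DOC wrapper element, the ELEMENT declarations and
the function name resolve_images are invented for illustration. expat is
non-validating, so it only reports the declarations and does not enforce
them.

```python
import xml.parsers.expat as expat

# FPI copied from the NOTATION declaration in the message.
GIF_FPI = ("+//IDN ds.internic.net/rfc/rfc2046.txt//NOTATION "
           "Multipurpose Internet Mail Extensions::image/gif//EN")

# DOC and the ELEMENT declarations are hypothetical scaffolding.
DOC = f"""<!DOCTYPE DOC [
<!NOTATION GIF PUBLIC "{GIF_FPI}">
<!ELEMENT DOC (IMG)*>
<!ELEMENT IMG EMPTY>
<!ATTLIST IMG SRC ENTITY #REQUIRED>
<!ENTITY mygif SYSTEM "my.gif" NDATA GIF>
]>
<DOC><IMG SRC="mygif"/></DOC>"""


def resolve_images(doc):
    """Return (notations, entities, images) found in doc."""
    notations = {}  # notation name -> public identifier
    entities = {}   # entity name -> (system identifier, notation name)
    images = []     # one (system identifier, notation name) per IMG

    def notation_decl(name, base, system_id, public_id):
        notations[name] = public_id

    def entity_decl(name, is_param, value, base, system_id, public_id,
                    notation):
        # Only unparsed (NDATA) entities carry a notation name.
        if notation is not None:
            entities[name] = (system_id, notation)

    def start(name, attrs):
        # The attribute value names an entity; the notation comes from
        # the entity declaration, not from the attribute.
        if name == "IMG":
            images.append(entities[attrs["SRC"]])

    parser = expat.ParserCreate()
    parser.NotationDeclHandler = notation_decl
    parser.EntityDeclHandler = entity_decl
    parser.StartElementHandler = start
    parser.Parse(doc, True)
    return notations, entities, images


notations, entities, images = resolve_images(DOC)
print(images)
```

The IMG element itself never mentions GIF; the processor learns the
notation only by looking up the entity that SRC names.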

Defining a DTD (and its associated stylesheets) generally requires careful
thought about what external notations are required in the intended
application. Predefined notation sets (in the form of external entities
with public identifiers) are common as dirt in the SGML world, for the
reasons of interchangeability and author sanity.

The only place the FPI need appear is in the shared declaration. The
stylesheet (used to actually render or trigger processing of the non-XML
data) can use the notation name "gif" to detect a GIF file. No FPI is
involved at the "browser end" (non-validating processor augmented with a
CML stylesheet).
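
As a rough illustration of that division of labour (a sketch only, not any
real stylesheet language), the "browser end" can be pictured as a table
keyed by notation name. The names RENDERERS, render_gif and render_entity
are hypothetical, and the entity table is assumed to have been collected
from the declarations by the parser. The key is written "GIF" to match the
declared notation name, since XML names are case-sensitive.

```python
# Hypothetical renderer for one notation; a real one would draw the image.
def render_gif(system_id):
    return f"[GIF image from {system_id}]"


# Dispatch table keyed by notation name. No FPI appears anywhere here.
RENDERERS = {"GIF": render_gif}


def render_entity(entities, name):
    """Render the unparsed entity called name.

    entities maps entity name -> (system identifier, notation name),
    as a parser would report it from the ENTITY declarations.
    """
    system_id, notation = entities[name]
    renderer = RENDERERS.get(notation)
    if renderer is None:
        return f"[unsupported notation {notation}: {system_id}]"
    return renderer(system_id)


entities = {"mygif": ("my.gif", "GIF")}
print(render_entity(entities, "mygif"))
```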

>In XLL I can write a complete document:

Once you factor out the declarations, this looks about the same (assuming
that you also use ATTLIST declarations in the internal subset to factor out
the redundant attribute values on <img>):

          ... etc... (omitted for brevity, and because I don't have the XLL
spec in front of me)

>At 10:29 02/12/97 -0500, Rob McDougall wrote:
>>It would be nice if there was also an "inline" way of doing includes
>>that would allow the XML parser to validate the resulting content.
>Well, XLL does this ***as long as we agree on the semantics***.

No, it doesn't. You can define a new stylesheet language (or custom
processor) that does this if you want. Perhaps XSL's re-ordering facilities
will be able to do this, without the validation. Validation is an XML
process, and XML itself does not "include" files except via entities.
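
For comparison, here is a sketch of inclusion via an entity, using Python's
expat binding. The entity name chapter1, the file name chapter1.xml, the
SECTION element and the in-memory FILES table are all invented for the
example, and no real file is read. expat is non-validating, so this shows
only the inclusion, not the validation of the result.

```python
import xml.parsers.expat as expat

# Hypothetical stand-in for the file system: system identifier -> content.
FILES = {"chapter1.xml": b"<SECTION>included text</SECTION>"}

DOC = b"""<!DOCTYPE DOC [
<!ENTITY chapter1 SYSTEM "chapter1.xml">
]>
<DOC>&chapter1;</DOC>"""


def parse_with_entities(doc, files):
    """Parse doc, expanding external parsed entities from files."""
    elements = []  # element names in document order
    text = []      # character data chunks

    parser = expat.ParserCreate()

    def start(name, attrs):
        elements.append(name)

    def chars(data):
        text.append(data)

    def external_ref(context, base, system_id, public_id):
        # The parser asks us to supply the entity; we parse its content
        # with a sub-parser that inherits the handlers set above.
        sub = parser.ExternalEntityParserCreate(context)
        sub.Parse(files[system_id], True)
        return 1  # non-zero tells expat the entity was handled

    parser.StartElementHandler = start
    parser.CharacterDataHandler = chars
    parser.ExternalEntityRefHandler = external_ref
    parser.Parse(doc, True)
    return elements, "".join(text)


elements, text = parse_with_entities(DOC, FILES)
print(elements, text)
```

The included SECTION arrives through the same handlers as the rest of the
document, which is the sense in which the entity mechanism makes the
included content part of the XML document proper.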

>  HREF (or
>IMG/SRC) is so widely used in HTML that people will certainly start doing
>their own thing.

There is no question that XSL will support this markup idiom for exactly
those reasons (it probably does now, but I've not finished reading it yet).

>There are the following possibilities:
>	- wait for a W3C body to pronounce (won't be this year, I suspect)
>	- wait and see what commercial browsers do
>	- invent nine-and-sixty ways of doing it
>	- use XDEV: as at least a means of coordinating *some* people.

Given that XSL will support this, there is no call to go putting any
formatting gunk into your documents. Whatever stylesheet mechanism you
implement in JUMBO should be able to express this, and that is where you
should do it. Note that hardwiring tag names into your processor is a
stylesheet in my terminology, though admittedly not a very flexible one.

>JUMBO will start with the latter, and junk it as soon as anything official
>comes along...

The _only_ thing that I can see XDEV having any utility for is expressing
where to find a stylesheet. Maybe you should think about the fundamental
goals of content markup (_SEPARATION_ of content from processing). Read
Coombs, Renear and DeRose's Comm. ACM article for the details.

>[BTW I am not very happy with the idea that FPIs are intended to be human-
>but not machine-readable. That makes them useless for things like image/gif.]

The fact that they are human readable has nothing to do with whether they
are supposed to be machine readable. Rick Jelliffe was wrong when he
asserted that they are intended to be "fuzzily matched". So don't worry
about that at any rate.

  -- David

David Durand              dgd at cs.bu.edu  \  david at dynamicDiagrams.com
Boston University Computer Science        \  Sr. Analyst
http://www.cs.bu.edu/students/grads/dgd/   \  Dynamic Diagrams
--------------------------------------------\  http://www.dynamicDiagrams.com/
MAPA: mapping for the WWW                    \__________________________

