Some notations and RTF (was Re: Notations)

ricko ricko at
Wed Sep 30 05:31:42 BST 1998

> Kathie Drake writes:
>  > Does anyone know where I can locate public identifiers for the
>  > following notations: TIFF, XML,HTML,Postscript,RTF and ASCII?

XML and HTML files both use SGML notation, so if you are using them from within a
WebSGML environment, you don't need an FPI: the system
identifiers text/xml and text/html might be useful in some circumstances.

For XML and HTML from within an XML environment, you need an FPI
for the HTML, but not the XML.  An external entity is automatically
XML if it is not declared NDATA.   For an HTML notation FPI, it would
be best if W3C were to proffer one.  In the absence of that, you can
  strict HTML 4 from W3C with Microsoft IE5 extensions//EN"
It is probably useful to specify which version of HTML you mean,
because you can then describe things fairly specifically in the name
part of the FPI, and because the HTML files may use no DOCTYPE
declaration. Also, it can help identify idiosyncrasies, such as old
Netscape Communicator's relentless moving of lists to outside of
paragraphs and stripping paragraph end-tags, which may have
a great bearing on how a particular file will have to be processed.
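
As a rough sketch of the NDATA point above, here is how a parser reports
the two kinds of external entity. It assumes Python's standard
xml.parsers.expat module; the notation name html4, the FPI text, and the
file names are made up for illustration and are not an official W3C
identifier.

```python
import xml.parsers.expat

# Hypothetical internal subset: the FPI and file names are illustrative only.
DOC = '''<!DOCTYPE doc [
<!NOTATION html4 PUBLIC "-//EXAMPLE//NOTATION Hypothetical HTML 4 Strict//EN" "text/html">
<!ENTITY page SYSTEM "page.html" NDATA html4>
<!ENTITY chapter SYSTEM "chapter.xml">
<!ELEMENT doc EMPTY>
<!ATTLIST doc src ENTITY #IMPLIED>
]>
<doc src="page"/>'''


def collect_declarations(text):
    """Return (notations, unparsed entities, parsed external entities)."""
    notations = {}        # notation name -> (public id, system id)
    unparsed = {}         # entity name -> notation name (declared NDATA)
    parsed_external = []  # external entities with no NDATA, so treated as XML

    parser = xml.parsers.expat.ParserCreate()

    def on_notation(name, base, system_id, public_id):
        notations[name] = (public_id, system_id)

    def on_entity(name, is_param, value, base, system_id, public_id, notation_name):
        if notation_name is not None:
            unparsed[name] = notation_name
        elif system_id is not None:
            parsed_external.append(name)

    parser.NotationDeclHandler = on_notation
    parser.EntityDeclHandler = on_entity
    parser.Parse(text, True)
    return notations, unparsed, parsed_external


if __name__ == "__main__":
    print(collect_declarations(DOC))
```

The entity declared without NDATA needs no notation at all, while the
HTML one only makes sense once some notation has been declared for it.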

For the others, here are some conservative notation identifiers
I made up for anyone to use, taken from my book "The XML &
SGML Cookbook".  In that book I give FPIs for several hundred
notations.  In the case of the first two, I merely reference another
book of standard file formats. (If this is useful, please buy my book.)

The system identifiers in the notation declarations are formally incorrect,
in that they are MIME content types and not URIs. However, I don't
think that W3C has standardized the URN notation for MIME content
types yet (has it?), so I don't feel guilty. So treat the system identifier
as "experimental" at the moment.  The official URN syntax will
probably involve prepending "urn:mime:" or something, I guess.

Remember that there are many possible FPIs for the same notation:
the FPI is a formal method to let you track something down
more than a method for allocating a unique and universal name
for something.  If the originator or promoter of
the notation has never promulgated an FPI, then there is no definitive
FPI, and we have to do the best we can, by forming one according to
rules which allow someone to track down what is meant.

<!NOTATION tiff.uncomp  PUBLIC
 "+//ISBN 0-13-614223-0::The SGML Cookbook//NOTATION
    ISBN 0-7923-91 Aldus/Microsoft Tagged Interchange File Format//EN"
    "image/tiff" >

<!NOTATION epsi  PUBLIC
 "+//ISBN 0-13-614223-0::The SGML
    ISBN 0-7923-91 Adobe Systems Encapsulated PostScript//EN"
    "application/x-epsi" >

<!NOTATION postscript  PUBLIC
 "+//ISBN 0-13-614223-0::The SGML Cookbook//NOTATION
    ISBN 0-201-18127-4::Adobe::PostScript//EN"
    "application/postscript" >

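The public identifiers above follow the general FPI layout of owner
identifier, then public text class and description, then language, each
separated by "//". A minimal Python sketch of pulling those parts out is
below. The function name split_fpi and the returned keys are hypothetical,
and the "::" sub-fields are simply left inside the owner and description
strings rather than interpreted.

```python
def split_fpi(fpi):
    """Split a formal public identifier into its main parts.

    A sketch only: it does not validate characters or class keywords.
    """
    # Declarations are often wrapped over several lines; collapse whitespace.
    text = " ".join(fpi.split())

    if text.startswith("+//"):
        owner_kind, rest = "registered", text[3:]
    elif text.startswith("-//"):
        owner_kind, rest = "unregistered", text[3:]
    else:
        # No prefix, for example an ISO owner identifier.
        owner_kind, rest = "unprefixed", text

    parts = rest.split("//")
    if len(parts) != 3:
        raise ValueError("expected owner//class description//language: %r" % text)

    owner, text_id, language = parts
    # The public text class is the first word; the rest is the description.
    public_class, _, description = text_id.partition(" ")
    return {
        "owner_kind": owner_kind,
        "owner": owner,
        "class": public_class,
        "description": description,
        "language": language,
    }


if __name__ == "__main__":
    example = """+//ISBN 0-13-614223-0::The SGML Cookbook//NOTATION
        ISBN 0-201-18127-4::Adobe::PostScript//EN"""
    for key, value in split_fpi(example).items():
        print(key, "=", value)
```

Splitting like this is enough to see who issued an identifier and what it
is meant to describe, which is the "tracking down" role of an FPI.
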
To call ASCII a notation is stretching the idea of notation a bit.
A notation usually resolves to some sort of grammar rather
than to a character set/encoding. However, you can if you need to.

ASCII can be constructed from ISO's system identifier.
Use this in the absence of anything better:

    "text/plain;charset='ascii'" >

Check that you actually mean ASCII (i.e.  that it does
not have any parseable artificial language in it), otherwise
you are not describing the document. (If you want to use
ASCII as an encoding, not to mean plain text, then you
can use the Formal System Identifier (FSI) encoding FPI
syntax, which you can find in my book at page 2-108, or look
at the HyTime97 website under FSDIR for an idea.)

For RTF, I suggest you ask Microsoft people here for an
FPI. I would be scared to even suggest one (and I copped
out in my book, and just gave an example of what it
might look like if Microsoft made one up) because there
have been so many versions of RTF: just because you
know a file is RTF, does not mean you know enough to
actually use the data. The Mac and PC RTFs are different.
There are many different versions of RTF over time.
Each different locale uses different character sets.  I
believe newest RTF can use Unicode. Is this information
nicely marked up in a header to RTF?  Not that I have
seen, though the most recent one might have things
under control. So in general, RTF is not a specific
notation, but a class of documents, rather like "text".

If you are using RTF, I recommend you make your
own FPI giving all the details you need of the application
that generated the file, or a product that is known to
accept that kind of RTF. For example:

    Microsoft::Office 97::Win32::US::Rich Text Format//EN"
    "text/rtf" >

Note that all this complexity is not because of RTF using {}
rather than XML syntax: it is because of inadequate self-labelling
in headers, regardless of the syntax. XML helps to the extent
only in that it brings to the foreground the issue of labelling
notations and metadata to allow exchange between different
applications.  (I think it is fair to say that RTF must have been
intended as a text format for users to write RTF files suitable
for importing into specific Microsoft applications, rather than
really being a serious round-tripping data interchange format;
this is in contrast to FrameMaker's MIF, for example.  If this
theory is correct, then RTF is not really an interchange/archive
format at all, but an application & locale specific import format.
That is a fine thing, but people who use it for interchange and
archiving should beware, and not blame Microsoft if it fails
or comes out too strange.)

Rick Jelliffe

xml-dev: A list for W3C XML Developers.