"View Structure"

Peter Murray-Rust Peter at ursus.demon.co.uk
Fri Apr 18 22:27:32 BST 1997

Thanks Jon,

	This is extremely useful and I hope others like the approach.

	'Show XML' (i.e. View Source) was added quite recently to JUMBO.  There
is also a 'Save as XML' [to file], which is essentially the same and brings up
a file browser (only in Java applications - not applets, of course).

	'Show source' is not completely trivial, and JUMBO shows the
normalised source (essentially the ESIS output from SGMLS/NXP, etc.) [In fact
JUMBO will read in ESIS].  I bow to the experts but essentially this means that
JUMBO has destroyed some of the grove and does NOT contain:
	- comments (the spec is not normative about what a useragent may/must
	- declaration subset.  (or indeed any other part of the DTD)
	- DOCTYPE info (except JUMBO preserves the DTD name, although ESIS does
	- marked section information
	- character entities
	- knowledge about empty elements (ESIS doesn't know and I suspect
		that some of the parsers discard this)

This means that JUMBO (and probably most other browsers) views 
'normalised source'.
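
The loss is easy to reproduce with any tree-building parser.  A minimal
sketch, in Python rather than JUMBO's Java, using the standard library's
ElementTree as a stand-in for the parser (this is not JUMBO's code, and the
sample document is made up for illustration): parse the source, then write
the tree back out.

```python
import xml.etree.ElementTree as ET

# Made-up sample: a DOCTYPE, a comment, a character reference,
# and two spellings of an empty element.
SOURCE = (
    "<!DOCTYPE FOO>\n"
    "<FOO><!-- a comment --><BAR></BAR><BAZ/>caf&#233;</FOO>"
)

def normalised_source(text):
    """Parse to a tree and serialise it again, as a 'Show XML' might."""
    root = ET.fromstring(text)
    return ET.tostring(root, encoding="unicode")

if __name__ == "__main__":
    print(normalised_source(SOURCE))
```

The comment, the DOCTYPE line and the `&#233;` reference do not survive the
round trip, and `<BAR></BAR>` and `<BAZ/>` come back written the same way.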

Also, JUMBO is not quite clear about whitespace (Argh!).  It does not honour
the DEFAULT|PRESERVE because (a) it isn't absolutely clear that this is the
final word on the subject and (b) I don't use this convention myself.

Also (again because I do not expect my customers to understand REs), all my
#PCDATA can have folded whitespace.  Therefore I prettyprint (not very pretty)
so that no line runs beyond 80 characters.  (This is trivial and there could
be a menu/switch for this.)
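
A sketch of that kind of folding, assuming it is acceptable to collapse
every run of whitespace (record ends included) to a single space.  The
function name and the use of Python's textwrap module are illustrative
choices, not what JUMBO does internally.

```python
import textwrap

def fold_pcdata(text, width=80):
    """Collapse each run of whitespace to one space, then re-wrap the text."""
    collapsed = " ".join(text.split())
    # textwrap.fill breaks over-long words by default, so no line exceeds width.
    return textwrap.fill(collapsed, width=width)

if __name__ == "__main__":
    print(fold_pcdata("some   #PCDATA\nwith \t odd   spacing " * 10))
```

The width parameter is where a menu/switch would plug in.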

Also I expect I have got the REs in mixed content wrong.  At present I put
each tag on a different line for human readability.  It's possible that in some
places this will introduce false REs (I need someone to soak test it).  Also,
since this is an area in the spec where the boundaries between the parser and
application are not absolutely clear, we need more testing.

JUMBO honours PIs (and uses them).  I create PIs as Nodes under other Nodes,
so a PI inside a FOO element would have a PI node hanging under FOO.  When the
tree was traversed, the PI would come immediately after FOO.

JUMBO takes whatever comes out of a parser.  If a PI in the middle of #PCDATA
forms two Pseudoelements, it honours those, with a PI in between.  i.e.
a FOO whose content is 'This is', then a PI, then more text
would have three children of FOO (PCDATA1, PI and PCDATA2 in that order).
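
The three-children case can be checked against another parser.  A small
sketch using Python's xml.dom.minidom (not JUMBO); the markup, the PI target
"jumbo" and its data are all made up for illustration.

```python
from xml.dom import minidom

# Hypothetical markup: a PI sitting in the middle of character data.
SOURCE = "<FOO>This is <?jumbo highlight?> some text</FOO>"

def child_kinds(text):
    """Return a (kind, content) pair for each child of the root element."""
    names = {
        minidom.Node.TEXT_NODE: "PCDATA",
        minidom.Node.PROCESSING_INSTRUCTION_NODE: "PI",
        minidom.Node.ELEMENT_NODE: "ELEMENT",
    }
    root = minidom.parseString(text).documentElement
    return [(names.get(n.nodeType, "OTHER"), n.nodeValue)
            for n in root.childNodes]

if __name__ == "__main__":
    for kind, content in child_kinds(SOURCE):
        print(kind, repr(content))
```

Here too the PI splits the character data into two separate nodes, giving
FOO three children in document order.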

I am assuming that a parser can only create the following NodeTypes:
	Elements (e.g. FOO)
	Comment   (if appropriate)
I am assuming that (unlike CoST) there are no PELs such as RE.

In JUMBO, PCDATA and PIs are dealt with in the same way as Elements
(comments are ignored).  [I gave them GIs of _PCDATA and _PI - thinking this
was safe because underscore was forbidden.  Now I will have to do something
else (although anyone who gratuitously creates a _PCDATA GI is asking for
trouble :).]  The advantage of this is that they can have icons and buttons,
and I can treat them like first-class citizens.
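
One way to picture the uniform treatment, as a sketch only: the Node class
and the build/parse helpers below are hypothetical, written in Python over
xml.dom.minidom, and only the reserved names _PCDATA and _PI come from the
description above.

```python
from dataclasses import dataclass, field
from xml.dom import minidom

@dataclass
class Node:
    gi: str                  # generic identifier, or a reserved pseudo-GI
    content: str = ""        # character data or PI text; empty for elements
    children: list = field(default_factory=list)

def build(dom_node):
    """Convert a minidom node to a uniform Node, or None if it is ignored."""
    kind = dom_node.nodeType
    if kind == dom_node.ELEMENT_NODE:
        kids = [build(child) for child in dom_node.childNodes]
        return Node(dom_node.tagName,
                    children=[k for k in kids if k is not None])
    if kind == dom_node.TEXT_NODE:
        return Node("_PCDATA", dom_node.data)
    if kind == dom_node.PROCESSING_INSTRUCTION_NODE:
        return Node("_PI", dom_node.target + " " + dom_node.data)
    return None  # comments (and anything else) are dropped

def parse(text):
    """Build the uniform tree for a whole document."""
    return build(minidom.parseString(text).documentElement)

if __name__ == "__main__":
    tree = parse("<FOO>a<!-- c --><?x y?><BAR/>b</FOO>")
    print([child.gi for child in tree.children])
```

Because every child is the same kind of object, a display layer can attach
an icon or a button to a _PCDATA or _PI node exactly as it would to BAR.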

The conclusion of this is that 'View source' will not necessarily help people
who want to see every character in the source.  (JUMBO may not even have access
to this in some distributed systems.)  I'd value comments if I've got my
components wrong.

Oh! I forgot attributes.  JUMBO plays back the attributes in the order they 
were given, but will have folded the NAME to uppercase.  Entities will have
been resolved and space will have been normalised (mainly at the parser).
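
A sketch of that playback, assuming a parser that can report attributes in
source order.  It uses Python's xml.parsers.expat with its ordered_attributes
switch; the function name and the sample markup are made up, and the
uppercase folding is done here by hand rather than by the parser.

```python
import xml.parsers.expat

def attributes_in_order(xml_text):
    """Return (element, [(NAME, value), ...]) in the order attributes were given."""
    found = []

    def start(name, attrs):
        # With ordered_attributes set, attrs is a flat list:
        # [name1, value1, name2, value2, ...]
        pairs = zip(attrs[0::2], attrs[1::2])
        found.append((name, [(n.upper(), v) for n, v in pairs]))

    parser = xml.parsers.expat.ParserCreate()
    parser.ordered_attributes = True
    parser.StartElementHandler = start
    parser.Parse(xml_text, True)
    return found

if __name__ == "__main__":
    print(attributes_in_order('<foo zeta="1" alpha="a &amp; b"/>'))
```

The entity reference in the second value has already been resolved by the
time the application sees it.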

In message <199704181759.KAA07880 at boethius.eng.sun.com> bosak at atlantic-83.Eng.Sun.COM (Jon Bosak) writes:
> Because every well-formed XML document describes a tree, however, it's
> also possible to have a "View Structure" option that would give you a
> default navigable view of the document as a tree -- like the default
> Jumbo behavior.  This view would allow you to expand and collapse the
> structure and it would show attribute nodes that could be opened to
> see the attributes on each element.  It could use a file manager
> metaphor, like Jumbo, or it could use the plus-button/minus-button
> interface used for most dynamic browser TOCs, or it could use
> something else; the point is that it would be trivially autogenerated
> on request, show the document in XML terms, and provide a commonly
> understood base view independent of the application interface supplied
> for a given document type that would always be available to people
> trying to understand or debug an XML document.  "View Structure" would
> presumably *not* use different type sizes, etc., but concentrate
> instead on exposing the guts of the document, so I would expect the
> display generated by character-mode browsers to look roughly the same
> as the display generated by the fancy graphics-mode browsers.

An additional thing in here is 'show attributes'.  (JUMBO has this as a button
for those nodes which *have* attributes.)  Of course some of the attribute
information in the DTD may be lost - JUMBO does not always know which
attributes are IDs and IDREFs.  (I may have missed it in the spec, but I am not
sure parsers have to keep this info.)

I assume that PCDATA can be displayed in the tree.  (My code is suspect here,
partly because I was not quite clear what was going to happen.)  However, under
certain circumstances it will create PCDATA nodes and display them.  These
nodes may not have children.  The same is true for PIs.
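
For a character-mode idea of such a view, here is a rough sketch in Python
over xml.dom.minidom.  The outline function, the "@" and "#PCDATA:" prefixes
and the two-space indent are all arbitrary choices for illustration, not a
proposal for how 'View Structure' should look.

```python
from xml.dom import minidom

def outline(node, depth=0):
    """Return indented lines: one per element, attribute, PCDATA run or PI."""
    pad = "  " * depth
    lines = []
    if node.nodeType == node.ELEMENT_NODE:
        lines.append(pad + node.tagName)
        attrs = node.attributes
        for i in range(attrs.length):
            attr = attrs.item(i)
            lines.append(pad + "  @" + attr.name + " = " + attr.value)
        for child in node.childNodes:
            lines.extend(outline(child, depth + 1))
    elif node.nodeType == node.TEXT_NODE:
        # Skip whitespace-only text; fold the rest onto one line.
        if node.data.strip():
            lines.append(pad + "#PCDATA: " + " ".join(node.data.split()))
    elif node.nodeType == node.PROCESSING_INSTRUCTION_NODE:
        lines.append(pad + "?" + node.target)
    return lines

if __name__ == "__main__":
    doc = minidom.parseString('<DOC id="d1"><P>Hello <B>world</B></P></DOC>')
    print("\n".join(outline(doc.documentElement)))
```

PCDATA and PI lines are leaves: nothing is ever indented beneath them.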

> This would just be an informal convention, like the inclusion of "View
> Source," but I think that it would be an extremely useful one.

Agreed.  It also helps to clarify the 
parser | treetool   division of responsibilities
the treetool | linkprocessor division
and the  processor | application division.

I am still struggling with event streams.  There is no formal way in the
language to flag an event stream and at times I think this would be useful.
It can save a lot on memory allocation for the tree (because the stream
need not be parsed until viewed).  It is also not much fun viewing an event 
stream as a tree :-)  Point JUMBO at an HTML document (normalised to XML)
and it isn't very intuitive...
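
To make the contrast with a tree concrete, here is a sketch of consuming a
document purely as events, using Python's xml.sax.  The EventLog class and
events_of function are hypothetical names; the handler keeps a list only so
the events can be inspected, and a real consumer could act on each event and
keep nothing.

```python
import xml.sax

class EventLog(xml.sax.ContentHandler):
    """Record start, text and end events; no tree is ever built."""

    def __init__(self):
        super().__init__()
        self.events = []

    def startElement(self, name, attrs):
        self.events.append(("start", name))

    def endElement(self, name):
        self.events.append(("end", name))

    def characters(self, content):
        # Ignore whitespace-only character data.
        if content.strip():
            self.events.append(("text", content.strip()))

def events_of(xml_bytes):
    """Parse the bytes and return the recorded event list."""
    log = EventLog()
    xml.sax.parseString(xml_bytes, log)
    return log.events

if __name__ == "__main__":
    for event in events_of(b"<A><B>Hi</B></A>"):
        print(event)
```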


> Jon
> xml-dev: A list for W3C XML Developers
> Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/
> To unsubscribe, send to majordomo at ic.ac.uk the following message;
> unsubscribe xml-dev
> List coordinator, Henry Rzepa (rzepa at ic.ac.uk)

Peter Murray-Rust, domestic net connection
Virtual School of Molecular Sciences

