A very meaty editor discussion

Tue Apr 28 17:51:56 BST 1998

Hi All,
  Recently I've been working on an XML JavaBean for both tree and
stream based editing (two-way), and I'd like to share my experiences,
and solicit any advice from the experts. I have zippo experience
designing XML/SGML stream editors (and you'd be suprised how little
info on algorithms exists in the traditional CS literature. 
Just try finding a mention of "gap buffers" in recent CS textbooks)

The editor is a true two-way editor that supports simultaneous
realtime views of a document via both tree editing, and stream 
based editing. Stream based editing can take many forms, such
as editing the syntax colored XML source, or, editing a
CSS/XSL rendered version. (style applied in realtime) Right
now, I'm concentrating on just the syntax colored source.
Nothing fancy.

At the moment, I am not concerned with layout/rendering, that's
a side issue (I'm using the swing.text classes for that right
now), merely the algorithms and datastructures used to maintain
synchronization between a tree and a stream buffer.

First, I'll mention my problems using SAX as perhaps a catalyst
for SAX Level 2 / Authoring Interface.

1) no ability to get at DTD info. If a user has internal entities,
   a dtd, etc.... loading up a document and saving it via SAX
   results in corruption.

2) no ability to get at unparsed entities (DOM supplies this I
   believe) This is important if you want to preserve the look of 
   the source.

   I believe in the power of text, and I think everything should be
   editable via a text editor and that information should be preserved.
   If I can't author a document in emacs, load it up into an editor, 
   and go back and forth, something is wrong.

3) comments and CDATA have been mentioned before. I'll note that
   even Netscape's Composer doesn't preserve comments! I believe
   comments are fundamentally important, even in the simplest
   XML applications, because it allows an author to annotate
   a file inline and transfer it anywhere. You could of course
   envision annotation being done differently, say with
   a separate annotation DTD and XML-Link/Xpointers, but would
   it be as human readable as simple source comments?

Problems with DOM

1) No location information
2) I think NodeIterator is too complicated
3) DOM would be great if they added "output"
   functions, e.g. something like
   node.getCanonicalStringRepresentation(processChildrenBoolean)

   Sure, you can cook up your own output functions, but that makes
   the assumption that the application knows what it is dealing with,
   and it may be dealing with XML or HTML (depending on what DOM object
   is being used)

My stop gap measure

My application uses SAX to parse documents and build a parse
tree. However, I regretfully make calls directly to the parser
of my choice to get at the DTD info. (e.g. AElfred) So
at the moment, I am locked into AElfred. (note, AElfred
doesn't give enough info to reconstruct a DTD either :(,
but right now, I am just interested in preserving internal
entities.

The parse tree nodes implement several interfaces, one of which is 
DOM. My application then uses 100% pure :) DOM interfaces for most
of its work, so when parsers start implementing DOM directly,
I can switch easily. (ideally, I would like SAX Level 2 to give
enough info that the full DOM spec can be implemented on top
of it)

Each Node implements the DOM interface, plus MutableTreeNode,
and swing.text.Element. The Attributes of a Node implements
TableModel. All of this is done via inner class delegates.

Thus, given a Node, you can write

DefaultTreeModel dtm=new DefaultTreeModel(node);
JTree jt = new JTree(dtm); // display the tree

or

JTable jt = new JTable(node.getAttributeTableModel());

This is the elegance of Swing, since a node implements
a TableModel interface, you can make the Attributes
appear to be a "table" with name and value columns.

or

EditorKit ek = JEditoPane.createEditorKitForContentType("text/xml");
Document doc = ek.createDefaultDocument();
insertNodes(node.getElementModel()); // complicated function :)

In this case, the node returns an interface for swing.text.Element
which is an SGMLish representation that Swing's textfields/textareas
(and HTML displayer) use.

Basically, by using a single datastructure which exhibits multiple
models and interfaces, I can keep several different views of the
data automatically in sync. For instance, if you update an
attribute via JTable, the changes are automagically reflected
into the JTree view.

Now the difficulty: maintaining symmetry between the text model
and the tree model.

A small Swing Digression:

Fundamentally, Swing's Text views use an internally synchronized
SGML <-> Stream Model already. Briefly, Swing maintains an internal
object called StringContent which implements an internal interface
called 'Content'

It is responsible for several things, but primarily it maintains
a linear array of characters representing the text of the buffer.
As you would expect, the 'model' of  text is one of a linear
array with a single coordinate, the offset, used to locate
data in the model.

Another thing the Content object does is maintain an internal
array of "bookmarks" into the text. These bookmarks are
used by applications to keep a fixed persistent position
within a document. For instance, if I bookmark position '32' in
the text which happens to have the word 'foo', and the user inserts 
5 characters at the start of the document, Content will 
automatically update this bookmark to point to location 37.
When the application tries to dereference this bookmark,
he will still find the word 'foo' The bookmarks are represented
by the "Position" interface.

Simultanteously, Swing's Document model keeps an internal tree
of swing.text.Element objects which are projected over the Content
object by means of Position (bookmark) references. Each node
in the tree has a getStartOffset() and getEndOffset() method
which uses a Position to track the beginning and ending of
that node in the linear Content model. Thus, if the
result of an editing operation causes a piece of text
in the linear Content model to be moved by 10 text positions,
the Content object will increment the starting and ending bookmarks
by 10 positions, such that the Element tree remains accurate.

All of this works wonderfully well when you're dealing with
plain old text, or styled text (the Swing Notepad and Stylepad
examples), but becomes complicated when you need to do parsing.

First of all, Swing's default document models have *hardcoded*
parsers that you can't replace. This is remarkable because
everything else in Swing is more than pluggable, including
swing.text where you can plug just every everything. The hard
coded parser only really understands line breaks. Swing
also has no easy interface for directly inserting nodes.
It expects text to be inserted via insertString(). It
desparately needs an insertNode(parent, child) on the
default Document model.

Secondly swing.text.html package shows how you can build an HTML viewer
with Swing by parsing a whole document at once, and then bulk inserting
the nodes into a Swing DocumentModel, the problem is, it provides
no example of how to do a real-time editor.

So finally, we come to the raw meat of this post, which is 
what my ideas are, and whether or not they are viable, and if
there is some well known algorithms for this.

Requirements for real-time two-way editor:

1) Tree and Source views remain in sync at significant events
    (commit/save, emacs style electric-characters, or realtime)

2) Output is stable. Performing an insertion on the tree side
   of the editor cannot cause a "drastic" reordering of the text
   in the text window. (some tree-dumper algorithms have nasty
   failure cases)  For example, if you store attributes in a
   Hashtable, you can get the attributes appearing in random
   order! 

3) editing MUST me free. The problem is significantly reduced if
   you use Henry Thompson's XED approach, but I am a big fan of
   emacs, and I don't want the editor beeping at me constantly or
   restricting my movements. I'd like it to help me (tab-completion
   on element and attribute names), but not restrict me.

   Think EMACS XML-Mode, Font-Locked, perhaps with right-click context
   menu, and with a tree browser.

Two ideas, One Dumb, and One Untested:

Dumb Idea: After designing a stable tree->stream dumper,
           I use an "ultrafast" XML parser to basically reparse
           the entire document everytime the user presses
           an electric character. E.g. if they type a '>'
           I check to see if they just entered a close tag, or
           an empty element, and then reparse the entire document.

           if the parser is very quick, say 1mb/sec, then even
           a 100k document will only take .1 seconds, hopefully
           it would be unnoticable. Since Web documents rately
           exceed 20k, this amounts to about .02 - .2 second delay.
 	  (less than garbage collection delays)

Untested Idea: When the user starts entering text, they can be in
               one of several states (Henry Thompson is probably
               more familar than I am with all the possible ones)

the user could be in

1) a CDATA section
2) a text section
3) on an element's name/type
4) on an attribute name
5) on an attribute value
6) within an entity

the user could type

1) whitespace
2) ordinary text
3) a special character &, >, <, ", =, ...

Let's ignore the cases where the user is just editing identifier 
names or raw text and get to the nitty gritty which is,
what happens if the user types < or >

Consider the following situation. Text in the buffer is

<FOO>
  Hello world (* cursor position) 
</FOO>

This is already parsed into a tree. The user presses '<',
we now have

<FOO>
  Hello world <
</FOO>

which is a non-wellformed document. Let's say the user continues
by typing 'BAR>'. Now we have

<FOO>
  Hello world <BAR>
</FOO>

It's still invalid.  In fact, it will stay invalid until the user
types

<FOO>
  Hello World <BAR></BAR>
</FOO>

In which case, now we can rightfully insert a BAR node into the Element 
tree. 

My approach is to store a list of invalid areas, and to treat
invalid areas a CDATA nodes in the TreeView until such time
as they become valid. (possibly, syntax coloring them in
the stream editor as being bad)

Now, the part I haven't thought through totally. When the user
enters a '>', First, determine if the tag immediately before 
the '>' is an empty tag (/>), and if it is, parse it,
add it to the tree, and split the invalid region into two
new invalid regions (before and after the parsed tag)

Otherwise, is this an end tag (</FOO>)? if it is,  scan backwards until
we find a matching start tag (<FOO>) in the invalid regions list. Once you
find one, parse both the start and end tags, and split both
of their invalid regions. Recurse, and parse all invalid
regions there are located after the start tag, and before
the end tag.

For parsing, once I locate the start and end offsets of an element,
I'd simply wrap it in a bogus <XML> </XML> element, send it to
SAX via a StringBufferInputStream, and then only watch for
a single start/end event.

Another approach, rather than the manual recursion technique I
outline above, is to locate the start/end offsets of the start/end
tags in the invalid regions, and tell SAX to parse the region
spanning from the start to the end offset, and replace that whole
section in the tree. For instance, find the offset of '<' in '<FOO>',
and the offset of '>' in '</FOO>', tell SAX to parse everything in between,
and split off the invalid regions before <FOO> and after </FOO>

Phew, ok, those are my ideas, what's the expert opionion.

-Ray
ray at contentware.com

xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev at ic.ac.uk
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/
To (un)subscribe, mailto:majordomo at ic.ac.uk the following message;
(un)subscribe xml-dev
To subscribe to the digests, mailto:majordomo at ic.ac.uk the following message;
subscribe xml-dev-digest
List coordinator, Henry Rzepa (mailto:rzepa at ic.ac.uk)