Processing Select Patterns in XSL...

Thu Sep 17 10:27:17 BST 1998

James Clark wrote:

> Tyler Baker wrote:
> >
> > One thing that is slightly confusing with select patterns is selecting
> > elements containing parent anchors.
> >
> > For example, say I have:
> >
> > <xsl:template match="book">
> >   <fo:block>
> >     <xsl:process select="../../../heading"/>
> >   </fo:block>
> > </xsl:template>
> >
> > From what I understand, this would say first go to the third parent node
> > of the current node
>
> ie the grandparent's parent.
>
> > and select all heading elements and process them.
>
> ie all heading children of the grandparent's parent
>
> > This would seem to be an error since another template may already have
> > processed these heading elements.
>
> It would only be an error if they had already been processed (because
> that would get you in a loop).
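
To make the reading of "../../../heading" concrete, here is a rough sketch in
Python.  The Node class and select_from_ancestor() are made up for illustration
and are not part of any DOM or XSL API; returning an empty list when there are
not enough ancestors is an assumption, not something taken from the draft.

```python
class Node:
    """Hypothetical source-tree element with a parent pointer."""
    def __init__(self, name, children=()):
        self.name = name
        self.parent = None
        self.children = list(children)
        for child in self.children:
            child.parent = self


def select_from_ancestor(node, levels_up, name):
    """Climb levels_up parents, then pick the children called name.

    select_from_ancestor(book, 3, "heading") mirrors "../../../heading":
    the heading children of the grandparent's parent.
    """
    anchor = node
    for _ in range(levels_up):
        if anchor.parent is None:
            return []  # assumption: too few ancestors selects nothing
        anchor = anchor.parent
    return [child for child in anchor.children if child.name == name]
```

Note that only the children of the anchor are examined, not its whole subtree.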

In this case, I guess that every node in the source tree would need to be flagged
as soon as it has been directly processed.  If you then encounter a node whose
flag is already set (i.e. it has already been processed), you throw an error.  Am
I right in assuming that to conform to the spec you would either have to keep
this flag in a special-purpose element node, or else have the stylesheet
processor maintain a list of processed element nodes (which seems like the less
efficient solution)?
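
The two options above could be sketched roughly as follows, in Python.  This
takes the description literally: a node is flagged once it has been processed
and a second visit raises an error.  Whether the flag should be cleared again
once a node is finished is left open here.  Node, AlreadyProcessedError and the
select callback are hypothetical names, not taken from XT or the DOM.

```python
class Node:
    """Hypothetical source-tree element, not a real DOM interface."""
    def __init__(self, name, children=()):
        self.name = name
        self.children = list(children)
        self.processed = False  # option 1: the flag lives on the node


class AlreadyProcessedError(Exception):
    pass


def process_flagged(node, select):
    """Option 1: check and set a flag stored in the node itself."""
    if node.processed:
        raise AlreadyProcessedError(node.name)
    node.processed = True
    for target in select(node):
        process_flagged(target, select)


def process_listed(node, select, seen=None):
    """Option 2: the processor keeps its own record of processed nodes."""
    seen = set() if seen is None else seen
    if id(node) in seen:
        raise AlreadyProcessedError(node.name)
    seen.add(id(node))
    for target in select(node):
        process_listed(target, select, seen)
    return seen
```

Here select(node) stands in for whatever the matching template selects.  With
a hashed set instead of a plain list, the second option costs about the same
per lookup as the flag, and it leaves the source tree untouched.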

All of this might be useful info for DOM implementors, as they might provide a
flag integer in each element on which multiple flags can optionally be set for
the benefit of technologies like XSL.

> > Match patterns seem to be pretty straightforward in how you use them as
> > all you need to really do is start at the right-most pattern component
> > and work left.  If everything matches up then finally make sure that the
> > anchor matches up with the parent of the node that matched the left most
> > pattern.  If everything still holds, then apply the template rule to
> > this particular element in the source tree when spitting out the result
> > tree.
>
> Only if the element was selected for processing by an xsl:process or
> xsl:process-children.

Sorry, I was assuming the default template rule applied in the context I was
referring to.  I guess I was not clear here.
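
For what it is worth, the right-to-left matching described above might look
something like this in Python.  It is only a sketch: it handles just the
parent/child operator, skips the anchor check entirely, and Node and matches()
are invented names.

```python
class Node:
    """Hypothetical source-tree element with a parent pointer."""
    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent


def matches(node, pattern):
    """Match e.g. "chapter/section/title" against node, right to left.

    The right-most component is compared with the node itself, the next
    one with its parent, and so on up the tree.
    """
    current = node
    for component in reversed(pattern.split("/")):
        if current is None or current.name != component:
            return False
        current = current.parent
    return True
```

The cost is bounded by the number of components in the pattern, which is why
match patterns without ancestor operators are cheap to test.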

> > Now with select patterns, it seems at first glance that you would
> > instead start from the current node and work left to right, rather than
> > right to left as in the case of match patterns.  Essentially, you would
> > start from the current node and recursively process all of the
> > descendants that end up matching the ancestry pattern from left to right.
>
> A variety of strategies are possible.  For example, if you have
>
>   select="foo|bar"

Well for complex OrPatterns I would think this does not work too well.  I
basically just break them up into a list of AncestryPatterns.  For OrPatterns
right now in templates I just clone the template for each additional
AncestryPattern in the OrPattern.
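
A minimal sketch of that cloning step, in Python.  Template and
expand_or_pattern() are hypothetical names, and splitting naively on "|"
assumes the character never appears inside a quoted value in the pattern.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Template:
    match: str  # the match pattern, possibly an OrPattern
    body: str   # stand-in for the template's contents


def expand_or_pattern(template):
    """Return one copy of the template per AncestryPattern in its match."""
    alternatives = [part.strip() for part in template.match.split("|")]
    return [replace(template, match=part) for part in alternatives]
```

Each clone can then be matched with the ordinary single-pattern code, at the
price of holding several copies of the same template.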

> you can walk the children and process those which are of type foo or
> bar.  If you have
>
>   select="foo/bar|foo/baz"
>
> you can walk the children and then for each child that's of type foo
> walk its children and process those which are of type bar or baz.  If
> you have
>
>  select=".//foo"
>
> you could walk all descendant elements and process those that are of
> type foo.

This was the non-cheap traversal I was referring to.  Someone who used XT said
that for a 2K file and a 2K stylesheet it was taking them 20 seconds or
something ridiculous to write out the output.  XT is only a reference
implementation, though, and from a quick look at it, most of the processing time
is probably spent in String creation with String.substring(), etc.  I told this
person that if Mr. Clark really spent a lot of time trying to whip this into a
commercial product, he would likely find his processing time less than a second.

Nevertheless, the biggest thing I worry about with XSL is the possible runtime (I
am referring to big-O notation) of the various pattern searches that can be
conducted.  For large documents, patterns which frequently use the ancestor
operator can obviously become very expensive.  Efficient indexing of the source
tree, among other optimizations, can significantly decrease some of these search
times, but it is a real worry to me that clients' expectations of XSL's
processing capabilities, when presented with large source trees and complex
stylesheets, are greater than reality.  It would not be good for me or any other
person currently involved in XSL software to have to explain to clients that
their HTML layout should be restricted to the look and feel of HTML 2.0 web pages
simply because a high level of complexity will bring browsers or server-side XSL
processors to a crawl.
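
To put rough numbers on the difference between the strategies quoted above,
here is a sketch in Python that counts how many nodes each one has to look at.
Node and both functions are invented for illustration and say nothing about
how XT actually works.

```python
class Node:
    """Hypothetical source-tree element."""
    def __init__(self, name, children=()):
        self.name = name
        self.children = list(children)


def select_children(node, names):
    """select="foo|bar": only the direct children are examined."""
    hits = [child for child in node.children if child.name in names]
    return hits, len(node.children)


def select_descendants(node, name):
    """select=".//foo": every descendant has to be examined."""
    hits, visited = [], 0
    stack = list(reversed(node.children))
    while stack:
        current = stack.pop()
        visited += 1
        if current.name == name:
            hits.append(current)
        # Push children in reverse so they are popped in document order.
        stack.extend(reversed(current.children))
    return hits, visited
```

The first function touches as many nodes as the current node has children;
the second touches the whole subtree.  If a template that fires for m nodes
runs a descendant search over a subtree of n nodes each time, that is on the
order of m * n node visits.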

> >  For
> > ancestry patterns that do not contain an immediate ancestor operator
> > this process would be rather cheap.
> >
> > But in the above example, what do you do when relative or absolute
> > anchors within select patterns anchor a node which is an ancestor of
> > the current node in context?  In this case, it seems as if you can have
> > multiple template rules acting on the same elements.
>
> Huh?

I was referring basically back to my previous comment about being able to "select"
ancestors (instead of just descendants) of the current node in context.  I guess
I can sort of understand now how this can be useful (say you want to reinsert a
title that may be the first node in the tree).  I know this sort of question
may have been a bit immature, but I, like many other people, am trying to first
understand XSL and how it can be creatively applied in ways that do not just
involve processing XML to HTML.

> > Another question is what to do with Absolute Anchors.  I would think
> > that for select patterns it would not make sense for this to be allowed
> > as the entire template match then has nothing to do with the actual
> > processing.
>
> Not so.  Typically you would be using document level information in the
> processing of some element.

I can see what you are saying now and I think I see the light (-:

Thanx very much for this reply, as it has helped me personally understand some of
these questions a lot better and has also shed some light on how powerful XSL can
really be.  My only real concern right now is the processing efficiency of XSL,
not necessarily in terms of implementation, but in terms of the general runtime
expense of pattern matching and selecting.

Situations like:

<xsl:process select="ancestor('/')//foo"/>

in very large documents basically say to go to the root and recursively traverse
the entire document tree and look for "foo" elements.  For an average size source
tree and a good number of templates which do this, performance problems should be
evident.

It would be very beneficial if you could index the source tree before doing
template matching: that is, if, after all of the import actions, there were
special commands to instruct the XSL processor to index certain frequently looked
up elements, such as those that end match patterns.  I suppose this could be done
in a proprietary way using PIs, but a standard way would do everyone a lot of
good.  Perhaps this is all knee-jerk, so I suppose we will first have to find
out what kinds of patterns tend to cause users processing problems and then
warn stylesheet writers (is this the accurate term?) about what XSL's processing
limitations may be.
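
As a rough idea of what such an index could buy, here is a sketch in Python
that walks the source tree once and files every element under its name.  Node
and build_name_index() are made-up names; nothing like this is defined by the
XSL draft.

```python
from collections import defaultdict


class Node:
    """Hypothetical source-tree element."""
    def __init__(self, name, children=()):
        self.name = name
        self.children = list(children)


def build_name_index(root):
    """Walk the tree once and group elements by name, in document order."""
    index = defaultdict(list)
    stack = [root]
    while stack:
        node = stack.pop()
        index[node.name].append(node)
        # Push children in reverse so they are popped in document order.
        stack.extend(reversed(node.children))
    return index
```

After the single pass, a root-level search for "foo" elements becomes
index.get("foo", []) instead of a fresh traversal for every template that asks.
A search relative to some other node would need more than this, for example a
way to tell whether an indexed element lies below that node.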

Again much thanx,

Tyler

xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev at ic.ac.uk
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/
To (un)subscribe, mailto:majordomo at ic.ac.uk the following message;
(un)subscribe xml-dev
To subscribe to the digests, mailto:majordomo at ic.ac.uk the following message;
subscribe xml-dev-digest
List coordinator, Henry Rzepa (mailto:rzepa at ic.ac.uk)