Streaming XSL Stylesheets - Was: XML::Writer 0.1 available

Tue Apr 20 13:47:30 BST 1999

On Mon, Apr 19, 1999 at 10:02:56PM -0400, David Megginson wrote:
> XSL provides one good and very powerful model for doing XML
> transformations, but that model itself requires that an entire
> document be held in random-access storage of some sort (say, memory,
> or a database) during processing, and that's inappropriate for the
> very large subset of XML work that is both speed- and memory-critical.

I'd love to differ with you here. In practice, I can't, but in
theory... I have this itch to work out and implement an XSL processor that
works as a SAX stream. Given an XslStream that reads the parsed
stylesheet from an XslDB and has an output SAX stream $this->{OUTPUT},
the notion is something like this:

parser reads "<someTag attr1='value1'>"

calls W3C::SAX::XslStream::startElement

XslStream checks its XslDB for a list of all rules that could apply to
'someTag' and finds only a single template:
  <xsl:template match="someTag[@attr1='value1']">
    <innerTag>
      <xsl:apply-templates/>
    </innerTag>
  </xsl:template>
This tells it that there is no ambiguity and no ordering constraint, so it can immediately call
$this->{OUTPUT}->startElement('innerTag', new AttributeList)
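
A minimal sketch of that ideal case follows. It is only an assumption about how the pieces could fit, and it uses Python's xml.sax rather than the Perl classes above. RULES is a hypothetical stand-in for XslDB, with each template reduced to an attribute test plus the literal result element it wraps around its children. When exactly one rule matches, the result start tag goes to the downstream handler at once. Text passes straight through, as XSL's built-in template rule for text would do.

```python
import io
import xml.sax
from xml.sax.handler import ContentHandler
from xml.sax.saxutils import XMLGenerator
from xml.sax.xmlreader import AttributesImpl

# Hypothetical stand-in for XslDB: element name -> [(attribute test, result element)].
RULES = {
    "someTag": [(lambda attrs: attrs.get("attr1") == "value1", "innerTag")],
}


class XslStream(ContentHandler):
    """Forwards a result start tag as soon as exactly one rule applies."""

    def __init__(self, output):
        super().__init__()
        self.output = output   # downstream SAX handler, playing $this->{OUTPUT}
        self.opened = []       # result element opened per input element, or None

    def startDocument(self):
        self.output.startDocument()

    def endDocument(self):
        self.output.endDocument()

    def startElement(self, name, attrs):
        matches = [out for test, out in RULES.get(name, []) if test(attrs)]
        if len(matches) == 1:
            # No ambiguity and no ordering constraint: emit right away.
            self.output.startElement(matches[0], AttributesImpl({}))
            self.opened.append(matches[0])
        else:
            # Zero (or conflicting) matches: this sketch emits no tag for it.
            self.opened.append(None)

    def endElement(self, name):
        result = self.opened.pop()
        if result is not None:
            self.output.endElement(result)

    def characters(self, content):
        self.output.characters(content)   # built-in rule: copy text through


def transform(document: bytes) -> str:
    out = io.StringIO()
    xml.sax.parseString(document, XslStream(XMLGenerator(out)))
    return out.getvalue()
```

With this, transform(b"<someTag attr1='value1'>hi</someTag>") ends in <innerTag>hi</innerTag>, right after the XML declaration that XMLGenerator writes. With any other attr1 value only the text comes through.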

That's the ideal case, but XSL accommodates many situations where it's
not that easy.  Since XSL defines sorting and sequencing, it is
possible to arrive at a rule whose result cannot be sent straight to the
output stream, like:

  <xsl:template match="someTag[@attr1='value1']">
    <tagA>
      <xsl:apply-templates select="As"/>
    </tagA>
    <tagB>
      <xsl:apply-templates select="Bs"/>
    </tagB>
  </xsl:template>

In this case, the stream can output tagA immediately, but tagB has to
wait, since tagA can't be closed until every As child has been seen. So
tagB's content goes into an event queue (or maybe just a grove) to be
flushed when W3C::SAX::XslStream::endElement('someTag') is called.
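
Here is a rough sketch of that split, again in Python's xml.sax and not the Perl classes. It makes several simplifying assumptions. The attribute predicate isn't checked. The As and Bs children are handled by XSL's built-in rules, so only their text reaches the output. Because of that, the queue holds text events only. tagA is opened as soon as someTag starts and its content streams through. tagB's content is queued and flushed at the end tag.

```python
import io
import xml.sax
from xml.sax.handler import ContentHandler
from xml.sax.saxutils import XMLGenerator
from xml.sax.xmlreader import AttributesImpl

NO_ATTRS = AttributesImpl({})


class SplitStream(ContentHandler):
    """tagA streams at once; tagB's content waits in a queue until </someTag>."""

    def __init__(self, output):
        super().__init__()
        self.output = output
        self.path = []      # names of the currently open input elements
        self.queue = []     # deferred text events destined for tagB

    def startDocument(self):
        self.output.startDocument()

    def endDocument(self):
        self.output.endDocument()

    def startElement(self, name, attrs):
        self.path.append(name)
        if name == "someTag":
            self.output.startElement("tagA", NO_ATTRS)   # nothing has to precede it

    def characters(self, content):
        if "As" in self.path:
            self.output.characters(content)   # stream straight into tagA
        elif "Bs" in self.path:
            self.queue.append(content)        # hold until tagA is closed

    def endElement(self, name):
        self.path.pop()
        if name == "someTag":
            self.output.endElement("tagA")
            self.output.startElement("tagB", NO_ATTRS)
            for text in self.queue:           # flush the event queue
                self.output.characters(text)
            self.queue.clear()
            self.output.endElement("tagB")


def split_transform(document: bytes) -> str:
    out = io.StringIO()
    xml.sax.parseString(document, SplitStream(XMLGenerator(out)))
    return out.getvalue()
```

Feeding it <someTag attr1='value1'><As>a1</As><Bs>b1</Bs><As>a2</As></someTag> yields <tagA>a1a2</tagA><tagB>b1</tagB>. The interleaved As text lands in tagA as it arrives, and the Bs text sits in the queue until the end tag.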

If the XML document contains a long series of atoms that the stylesheet
says do not need to be ordered, the transformed document can be
generated as the document is parsed. For instance:

DOCUMENT:
<molecule>
  <atom>
    <symbol>Ne</symbol>
    <positions>
      <orbital>s</orbital>
      <shell>1</shell>
      <count>2</count>
    </positions>
    <positions>
      <orbital>s</orbital>
      <shell>2</shell>
      <count>2</count>
    </positions>
    <positions>
      <orbital>p</orbital>
      <shell>2</shell>
      <count>6</count>
    </positions>
  </atom>
  <atom>...</atom>
  <!-- and a zillion more atoms -->
</molecule>

XSL STYLESHEET:
<xsl:template match="atom">
  <xhtml:h1>The Atoms...</xhtml:h1>
  <xhtml:ul>
    <xsl:apply-templates select="positions">
      <xsl:sort select="orbital"/>
      <xsl:sort select="shell"/>
    </xsl:apply-templates>
  </xhtml:ul>
</xsl:template>
<!-- some more templates for specifying how the positions are rendered -->

The sort on the positions causes the XslStream to buffer the output
for each atom's positions until the </atom> end tag is hit (the sort
can't run until all of them have been seen), but it still gets to
flush the atoms basically as fast as they come in.
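
A small sketch of that per-atom buffering, under stated assumptions and again in Python's xml.sax. The post leaves the positions templates unspecified, so rendering each one as <li>{shell}{orbital}{count}</li> (e.g. "2p6") is a hypothetical choice. The sort follows xsl:sort's defaults: text comparison, ascending, orbital first, then shell. Python's sort is stable, so ties keep document order, as XSL requires. Only one atom's positions are ever held in memory, and each atom is handed to the flush callback at its end tag.

```python
import xml.sax
from xml.sax.handler import ContentHandler
from xml.sax.saxutils import escape


class AtomStream(ContentHandler):
    """Buffers <positions> within one <atom>; flushes each atom at </atom>."""

    def __init__(self, flush):
        super().__init__()
        self.flush = flush       # called once per atom with its rendered output
        self.positions = []      # buffered positions of the current atom
        self.current = None      # fields of the <positions> being read
        self.field = None        # leaf element currently being read
        self.text = []

    def startElement(self, name, attrs):
        if name == "atom":
            self.positions = []
        elif name == "positions":
            self.current = {}
        elif self.current is not None and name in ("orbital", "shell", "count"):
            self.field, self.text = name, []

    def characters(self, content):
        if self.field:
            self.text.append(content)

    def endElement(self, name):
        if name == self.field:
            self.current[name] = "".join(self.text).strip()
            self.field = None
        elif name == "positions":
            self.positions.append(self.current)
            self.current = None
        elif name == "atom":
            # xsl:sort defaults: text order, ascending; orbital, then shell
            self.positions.sort(key=lambda p: (p["orbital"], p["shell"]))
            items = "".join(
                "<li>%s%s%s</li>" % (escape(p["shell"]), escape(p["orbital"]),
                                     escape(p["count"]))
                for p in self.positions)
            self.flush("<h1>The Atoms...</h1><ul>%s</ul>" % items)
```

For the neon atom above, the flushed chunk lists 2p6, 1s2, 2s2, because "p" sorts before "s" as text.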

I see lots of folks talking about using XML for moving vast streams of
business process data. I believe this sort of mechanism will make all
that feasible without having to write custom translation engines for
this data. I have sketched out the players in Perl (XslParser, XslDB,
XslStream) but haven't started the real work of coding the
transformations and the logic that decides when it can flush and when
it can't. Anybody out there interested in taking this over?
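
As a starting point for the "when can it flush" logic, here is a hypothetical heuristic in Python. It is not part of the Perl sketch and is far from a complete analysis. It walks a template's xsl:apply-templates instructions in document order. The first unsorted one may stream. Any sorted one, and any that comes after the streaming one, must buffer until the matched element ends. It uses the XSLT 1.0 namespace URI, which is an assumption. It also ignores every other instruction, such as xsl:value-of, that could force buffering.

```python
import xml.etree.ElementTree as ET

XSL = "{http://www.w3.org/1999/XSL/Transform}"


def flush_plan(template_xml: str) -> list:
    """Hypothetical heuristic: classify each xsl:apply-templates in a template.

    "stream": its results can go out as the input arrives.
    "buffer": its results must be queued until the matched element ends.
    """
    template = ET.fromstring(template_xml)
    plan = []
    streaming_used = False
    for node in template.iter(XSL + "apply-templates"):
        select = node.get("select", "node()")   # XSLT's default selection
        is_sorted = node.find(XSL + "sort") is not None
        if is_sorted or streaming_used:
            plan.append((select, "buffer"))
        else:
            plan.append((select, "stream"))
            streaming_used = True
    return plan
```

Run against the tagA/tagB template, it returns [("As", "stream"), ("Bs", "buffer")]. The sorted positions template comes back as [("positions", "buffer")].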
-- 
-eric

(eric at w3.org)

PS. If the chemistry-looking stuff above is wrong, it's because I'm
    not now, nor ever intended to be, a physical chemist.

xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev at ic.ac.uk
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ and on CD-ROM/ISBN 981-02-3594-1