XML pretty printing in HTML

Matt Sergeant matt.sergeant at bbc.co.uk
Tue Jul 6 11:50:09 BST 1999

> -----Original Message-----
> From: Warren Hedley [mailto:w.hedley at auckland.ac.nz]

> I'm on the lookout for a tool that will allow me to pretty print
> XML files in such a manner that they can be pasted into an HTML
> page. For example, I would like to transform
> <element1 name="foo">
>   <element2 name="bar"
>             longName="barNone"/>
> </element1>
> to (hope the line wrapping doesn't make this too confusing)
> <pre style="font:12pt monospace">
> &lt;<font color="#ff0000">element1</font> <font 
> color="#00ff00">name</font>=&quot;<font
> color="#0000ff">bar</font>&quot;&gt;
> ...
> You get the idea. All attributes are coloured red, attribute values blue,
> element names green, CDATA yellow, etc. (Yuck, that looks hideous).
> Whitespace is preserved, so that "longName" is under "name" in element2.
> Preferably, all of this without having to do a bunch of coding in XSL or
> otherwise. Surely, someone must have already done this - it would be a
> pretty simple PERL script given a well-formed file. I had a look on
> xmlsoftware.com but couldn't find anything.

OK. Here's an attempt:

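use XML::Parser;

print XML::Parser->new(Handlers => {
	# Handlers are anonymous subs, except Start 'cos it's more complex.
	# $_[0] is the expat object where we store the HTML output
	Init => sub { $_[0]->{html} = '<pre style="font:12pt monospace">' },
	# parsefile returns whatever Final returns; close the <pre> here
	Final => sub { return $_[0]->{html} . "</pre>\n" },
	Start => \&start,
	End => sub { $_[0]->{html} .= '&lt;/<font color="green">' . $_[1] .
		'</font>&gt;' },
	# expat hands over decoded text, so re-escape it for HTML
	Char => sub { $_[0]->{html} .= '<b>' . esc($_[1]) . '</b>' },
	CdataStart => sub { $_[0]->{html} .= '<font color="yellow">' },
	CdataEnd => sub { $_[0]->{html} .= '</font>' },
})->parsefile($ARGV[0]);	# assumes the XML file is named on the command line

# Escape the characters that are special in HTML
sub esc {
	my $text = shift;
	$text =~ s/&/&amp;/g;
	$text =~ s/</&lt;/g;
	$text =~ s/>/&gt;/g;
	$text =~ s/"/&quot;/g;
	return $text;
}

sub start {
	my $expat = shift;
	my $element = shift;
	my %attribs = @_;

	$expat->{html} .= '&lt;<font color="green">' . $element . '</font>';
	if (%attribs) {
		# keys gives hash order, not necessarily document order
		foreach (keys %attribs) {
			$expat->{html} .= ' <font color="red">' . $_ .
				'</font>=&quot;<font color="blue">' .
				esc($attribs{$_}) . '</font>&quot;';
		}
	}
	$expat->{html} .= '&gt;';
}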

I've written it so that it builds up the HTML in a string instead of printing
as it goes. That way you can embed it in another application (presumably you
want to print more than just the XML, such as html and body tags), but you
could easily change that. To use it in another app, replace "print
XML::Parser..." with "my $html = XML::Parser..." and do something with $html.
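
As a rough illustration of the embedding described above, the string that
comes back from the parser can be dropped into a page. This is only a sketch:
wrap_page is a made-up helper name, the title is assumed to be plain text that
needs no escaping, and a literal string stands in for the parser output so the
example runs without XML::Parser installed.

```perl
use strict;
use warnings;

# Hypothetical helper: wrap the generated <pre> block in a minimal HTML page.
# Assumes $title is plain text with no HTML special characters.
sub wrap_page {
	my ($title, $pre_html) = @_;
	return "<html>\n<head><title>$title</title></head>\n"
		. "<body>\n$pre_html\n</body>\n</html>\n";
}

# In the real script $html would come from the parser, along the lines of
#   my $html = XML::Parser->new(Handlers => { ... })->parsefile($file);
# A literal stands in here so the sketch is self-contained.
my $html = '<pre style="font:12pt monospace">'
	. '&lt;<font color="green">a</font>&gt;</pre>';

print wrap_page('XML listing', $html);
```

Anything else the page needs (headings, explanatory text) would go around
$pre_html in the same way.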

Have fun.

BTW: An XML parser doesn't preserve the whitespace between attributes, so
"longName" won't end up under "name". I guess you could swap what I've done
here for a large regexp system if you really need that feature.
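
For what it's worth, the regexp route could look something like the sketch
below. It is not a parser: it assumes well-formed input, colours only element
names, attribute names and attribute values, and merely escapes everything
else (comments, processing instructions, CDATA, DOCTYPE), so a tag-like string
inside a comment or CDATA section would be coloured as a tag. The names esc,
colour, attr_html and xml_to_html are made up for the example, and the named
captures need Perl 5.10 or later.

```perl
use strict;
use warnings;

# One attribute: leading whitespace, name, '=', then a quoted value.
my $ATTR = qr{(\s+)([^\s=/>]+)(\s*=\s*)("[^"]*"|'[^']*')};

# Escape the characters that are special in HTML.
sub esc {
	my $text = shift;
	$text =~ s/&/&amp;/g;
	$text =~ s/</&lt;/g;
	$text =~ s/>/&gt;/g;
	$text =~ s/"/&quot;/g;
	return $text;
}

# Escape the text and wrap it in a <font> tag of the given colour.
sub colour {
	my ($colour, $text) = @_;
	return qq{<font color="$colour">} . esc($text) . '</font>';
}

# Colour one attribute, re-emitting its whitespace exactly as written.
sub attr_html {
	my ($space, $name, $equals, $quoted) = @_;
	my $quote = esc(substr($quoted, 0, 1));
	my $value = substr($quoted, 1, -1);
	return $space . colour('red', $name) . $equals
		. $quote . colour('blue', $value) . $quote;
}

sub xml_to_html {
	my $xml = shift;
	my $html = '<pre style="font:12pt monospace">';
	# Walk the input left to right: either a tag, or a run of anything else.
	while ($xml =~ m{\G(?:<(?<slash>/?)(?<name>[^\s/>!?=]+)(?<rest>(?:$ATTR)*\s*/?)>|(?<other>[^<]+|.))}gcs) {
		if (defined $+{name}) {
			# Copy the captures before the inner s/// overwrites them.
			my ($slash, $name, $rest) = ($+{slash}, $+{name}, $+{rest});
			$rest =~ s/$ATTR/attr_html($1, $2, $3, $4)/ge;
			$html .= '&lt;' . $slash . colour('green', $name) . $rest . '&gt;';
		} else {
			$html .= esc($+{other});
		}
	}
	return $html . '</pre>';
}

# Demo on the fragment from the original question.
print xml_to_html(qq{<element2 name="bar"\n            longName="barNone"/>}), "\n";
```

Because the whitespace in front of each attribute is copied through untouched,
"longName" stays lined up under "name" once the browser renders the <pre>
block.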


xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev at ic.ac.uk
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ and on CD-ROM/ISBN 981-02-3594-1
List coordinator, Henry Rzepa (mailto:rzepa at ic.ac.uk)
