A utility to make msxsl more useful

Andrew Bunner bunner at massquantities.com
Fri Sep 4 23:39:00 BST 1998


  I wrote a small Perl script that can be used to preprocess XML files
before sending them to msxsl. Why might you want to do this? So you can
expand ENTITY references and do something like <INCLUDE
HREF="included_file.xml"/>.

  It's very basic and very small, so I just attached it to this message for
anyone who's interested.

  Here's the syntax for using it from the DOS command prompt...

C:\<your path to Perl>\Perl.exe expand.pl myfile.xml > temp.xml
msxsl -i temp.xml -s myfile.xsl -o output.html

  myfile.xml can define entities in its internal and external DTD by saying
<!ENTITY entityname 'VALUE'> or <!ENTITY entityname SYSTEM 'filepath'>. You
can use single or double quotes. The external DTD is only read if the
document refers to it as <!DOCTYPE root SYSTEM 'file.dtd'>.
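  To make the entity step concrete, here is a minimal, self-contained Perl
sketch that works on an in-memory string rather than on files. It only
handles the internal form <!ENTITY name 'value'> (with either quote style);
SYSTEM entities and external DTDs are left out. declared_entities and
expand_all are hypothetical names, not subs from expand.pl. Undeclared
references such as &amp; are left alone, and whole passes repeat until
nothing changes, so one entity can refer to another.

```perl
use strict;
use warnings;

# Hypothetical helper: collect internal entity declarations of the form
# <!ENTITY name 'value'> or <!ENTITY name "value">. SYSTEM entities and
# external DTDs are deliberately left out of this sketch.
sub declared_entities {
    my ($text) = @_;
    my %entities;
    while ($text =~ /<!ENTITY\s+(\w+)\s+(?:"([^"]*)"|'([^']*)')\s*>/g) {
        my $value = defined $2 ? $2 : $3;
        # If a name is declared twice, XML says the first declaration wins.
        $entities{$1} = $value unless exists $entities{$1};
    }
    return %entities;
}

# Hypothetical helper: replace &name; for declared names only, repeating
# the pass until nothing changes so an entity may refer to another entity.
# Undeclared references such as &amp; are left exactly as they are.
sub expand_all {
    my ($text, %entities) = @_;
    for my $pass (1 .. 50) {    # arbitrary cap, so a self-referencing entity can't hang us
        my $changed = 0;
        $text =~ s/&(\w+);/exists $entities{$1} ? do { $changed++; $entities{$1} } : "&$1;"/ge;
        return $text unless $changed;
    }
    die "entity expansion did not settle; recursive entity?\n";
}

my $doc = <<'XML';
<!DOCTYPE note [
  <!ENTITY who "Andrew">
  <!ENTITY greeting 'Hello from &who;'>
]>
<note>&greeting; &amp; goodbye</note>
XML

my %entities = declared_entities($doc);
# The last line comes out as: <note>Hello from Andrew &amp; goodbye</note>
print expand_all($doc, %entities);
```

  The first pass turns &greeting; into "Hello from &who;" and the second
pass expands &who;, which is why the loop keeps going until a pass makes no
change.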

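  I also made it so you can include a file by saying <INCLUDE
HREF="filetoinclude"/>. File names in SYSTEM identifiers and HREFs are
opened exactly as written, so relative paths are resolved from the
directory you run the script in.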
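  Here is a minimal Perl sketch of the include step on an in-memory string,
assuming the <INCLUDE HREF="..."/> form described above (single or double
quotes, optional space before "/>"). slurp and expand_includes are
hypothetical names rather than subs from expand.pl, and File::Temp is only
there to create a file for the demo.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: return the whole contents of a file.
sub slurp {
    my ($path) = @_;
    open(my $fh, '<', $path) or die "Couldn't open $path: $!\n";
    local $/;    # slurp mode: read the whole file in one go
    my $text = <$fh>;
    close($fh);
    return $text;
}

# Hypothetical helper: replace every <INCLUDE HREF="..."/> with the named
# file's contents in one global pass. Returns the new text and the number
# of includes replaced; text pulled in by this pass is not rescanned.
sub expand_includes {
    my ($text) = @_;
    my $count = ($text =~ s{<INCLUDE\s+HREF=(?:"([^"]+)"|'([^']+)')\s*/>}{slurp(defined $1 ? $1 : $2)}ge);
    return ($text, $count || 0);
}

# Demo: write a small fragment to a temporary file and include it.
my ($fh, $path) = tempfile(UNLINK => 1);
print $fh "<chapter>Included text</chapter>";
close($fh);

my ($expanded, $n) = expand_includes(qq{<book><INCLUDE HREF="$path"/></book>});
print "$expanded\n";    # prints <book><chapter>Included text</chapter></book>
```

  Because one call expands a single level and reports a count, a caller can
keep calling it while the count is nonzero to handle includes inside
included files, and should cap the number of rounds in case a file ends up
including itself.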

  Basically, I'm trying to find ways to make msxsl usable now. I was sort
of hoping some Java programmers would leap to the rescue and turn msxml (or
some equivalent parser) into some kind of preprocessor for msxsl but, failing
that, I worked up a quick and dirty way to do what I want. Hopefully someone
else will find it useful.
-------------- next part --------------


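main();

sub main {
    die "Usage: perl expand.pl myfile.xml > temp.xml\n" unless @ARGV;
    # $xml is global: expandEntities and expandLinks edit it in place.
    $xml = &readFile($ARGV[0]);
    my %externalEntities = &parseExternalDTD($xml);
    my %internalEntities = &parseInternalDTD($xml);
    # Internal declarations are listed second so they override external
    # ones; in XML the internal subset takes precedence.
    my %entities = (%externalEntities, %internalEntities);
    my $passes = 0;
    my $moreToGo = 1;
    while ($moreToGo) {
        # Arbitrary limit, so a recursive entity or INCLUDE stops with an
        # error instead of looping forever.
        die "Gave up after 100 passes; recursive entity or INCLUDE?\n"
            if ++$passes > 100;
        # Bitwise | (not ||) so both kinds of expansion run on every pass.
        $moreToGo = &expandEntities(%entities) | &expandLinks();
    }
    print $xml;
}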

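# $_[0] = file name or path
# returns full text of file
sub readFile {
    my ($path) = @_;
    # Three-argument open, so a name like ">foo" is never taken as a mode.
    open(my $fh, '<', $path) or die "Couldn't open $path: $!\n";
    local $/;    # undefine the input record separator to read the whole file at once
    my $contents = <$fh>;
    close($fh);
    return $contents;
}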

# $_[0] full text of an XML document
# returns hash of external entities and what they reference
sub parseExternalDTD {
    # Looking for...  <!DOCTYPE foo SYSTEM 'bar.dtd'>
    # (a DOCTYPE with a PUBLIC identifier is not recognized)
    unless ($_[0] =~ /<!DOCTYPE\s+\w+\s+SYSTEM\s+['"]([^"']+)/) {
        # Empty list, not {}: a hash reference here would end up as a
        # bogus key in the caller's hash.
        return ();
    }
    my $dtdPath = $1;
    my $dtd = &readFile($dtdPath);
    return &extractEntities($dtd);
}

# $_[0] full text of XML document
# returns hash of internally defined entities and what they reference
# (the whole document is scanned, so <!ENTITY ...> text inside comments
# or CDATA sections is picked up too)
sub parseInternalDTD {
    return &extractEntities($_[0]);
}

# $_[0] text, possibly containing <!ENTITY> declarations
# returns entity hash of names and values
sub extractEntities {
    my ($text) = @_;
    my %entities;
    # Looking for <!ENTITY foo 'bar'> or <!ENTITY foo SYSTEM 'bar'>, with
    # single or double quotes. The /g match moves forward through the text,
    # so a declaration this pattern doesn't understand (parameter entities,
    # PUBLIC identifiers, NDATA) is skipped instead of looping forever.
    while ($text =~ /<!ENTITY\s+(\w+)\s+(SYSTEM\s+)?(?:"([^"]*)"|'([^']*)')\s*>/g) {
        my ($name, $isSystem) = ($1, $2);
        my $value = defined $3 ? $3 : $4;
        next if exists $entities{$name};    # XML: the first declaration is binding
        $entities{$name} = $isSystem ? &readFile($value) : $value;
    }
    return %entities;
}

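# @_ is a hash of entities and what they expand to
# works on global variable $xml searching for &foo; references
# returns true if it was able to make any replacements
sub expandEntities {
    my (%entities) = @_;
    my $count = 0;
    # One global pass. Only declared entities are replaced; anything else
    # (&amp;, &lt;, ...) is put back unchanged rather than deleted. Entity
    # values that contain further references are handled on the next pass.
    $xml =~ s/&(\w+);/exists $entities{$1} ? do { $count++; $entities{$1} } : "&$1;"/ge;
    return $count ? 1 : 0;
}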

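# works on global variable $xml, replacing <INCLUDE HREF="foo"/> with the
# contents of the file foo; returns true if it made any replacements
sub expandLinks {
    # We're looking for... <INCLUDE HREF="foo"/>
    # This is not a complete implementation! A real XML processor would
    # look for any type of link that's defined to have SHOW="EMBED" and ACTUATE="AUTO"
    # ...but that's too much work for what I'm after
    # One global pass, so this sub always finishes; includes inside an
    # included file are picked up when main() calls it again.
    my $count = ($xml =~ s/<INCLUDE\s+HREF=["']([^"']+)["']\s*\/>/&readFile($1)/ge);
    return $count ? 1 : 0;
}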
-------------- next part --------------

-- Andrew

   Andrew Bunner
   President, Founder Mass Quantities, Inc.
   Professional Supplements for the Perfect Physique
   http://www.massquantities.com 

