<html>
At 12:16 PM 09/22/99 -0400, Hunter, David wrote:<br>
<blockquote type=cite cite>So even if you<br>
compress the files, the attribute version will be able to compress to
50%<br>
smaller than the other file. Again, 2KB isn't a lot, but if we're
talking<br>
megabytes in size, 50% is a lot.</blockquote><br>
I wrote a quick perl script to take /usr/dict/words and turn it into an
XML file, with some artificially generated "attributes".
In the resulting file named attrib.xml, each <word> tag contains
the additional information as attributes. I did the same thing to
produce a file called child.xml, except that the additional information
is presented as a child element instead of as an attribute. Here
are the results:<br>
<br>
<tt>$ ./make.pl<br>
$ ls -l<br>
total 13004<br>
-rw-rw-r-- 1 mnutter mnutter 5811852 Sep 22
13:16 attrib.xml<br>
-rw-rw-r-- 1 mnutter mnutter 7445892 Sep 22
13:16 child.xml<br>
-rwxr-xr-x 1 mnutter
mnutter 976 Sep 22 13:16
make.pl<br>
$ gzip attrib.xml<br>
$ gzip child.xml<br>
$ ls -l<br>
total 1127<br>
-rw-rw-r-- 1 mnutter mnutter 671039
Sep 22 13:16 attrib.xml.gz<br>
-rw-rw-r-- 1 mnutter mnutter 472394
Sep 22 13:16 child.xml.gz<br>
-rwxr-xr-x 1 mnutter
mnutter 976 Sep 22 13:16
make.pl<br>
<br>
</tt>I used gzip as an example of off-the-shelf compression
technology. As you can see, even though the raw child.xml file is
larger, the compressed version is *smaller* than the corresponding
implementation with attributes.<br>
<br>
This may not be true in all cases, of course, but I expect it often will,
due to the way such compression algorithms work.<br>
<br>
For your reference, here is the Perl script I used to create the two
files:<br>
<br>
open WORDS, "</usr/dict/words" or die "Couldn't open
dictionary.\n";<br>
open ATTRIB, ">attrib.xml" or die "Couldn't open
attrib.xml\n";<br>
open CHILD, ">child.xml" or die "Couldn't open
child.xml\n";<br>
<br>
@twenty_strings = qw(one two three four five six seven eight nine
ten<br>
eleven twelve thirteen fourteen fifteen sixteen<br>
seventeen eighteen nineteen twenty);<br>
<br>
print ATTRIB "<attrib>\n";<br>
print CHILD "<child>\n";<br>
<br>
while($word = <WORDS>)<br>
{<br>
$time = time();<br>
$timestr = localtime($time);<br>
$twenty = rand % 20;<br>
$twentystr = $twenty_strings[$twenty];<br>
print ATTRIB <<EOM;<br>
<word time="$time" timestr="$timestr"
twenty="$twenty"<br>
twentystr="$twentystr">$word</word><br>
EOM<br>
print CHILD <<EOM;<br>
<word><br>
<time>$time</time><br>
<timestr>$timestr</timestr><br>
<twenty>$twenty</twenty><br>
<twentystr>$twentystr</twentystr><br>
</word><br>
EOM<br>
}<br>
<br>
print ATTRIB "</attrib>\n";<br>
print CHILD "</child>\n";<br>
<br>
close CHILD;<br>
close ATTRIB;<br>
close WORDS;<br>
<br>
<br>
<div>-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-</div>
<br>
<div>Mark Nutter, <mnutter@fore.com></div>
<div>Internet Applications Developer</div>
<div>FORE Systems</div>
<div>Some people are atheists 'til the day they die.</div>
</html>