RFC: Attributes and XML-RPC

Mark Nutter mnutter at fore.com
Wed Sep 22 19:25:13 BST 1999


At 12:16 PM 09/22/99 -0400, Hunter, David wrote:
>So even if you
>compress the files, the attribute version will be able to compress to 50%
>smaller than the other file.  Again, 2KB isn't a lot, but if we're talking
>megabytes in size, 50% is a lot.

I wrote a quick Perl script to take /usr/dict/words and turn it into an XML 
file, with some artificially generated "attributes".  In the resulting file 
named attrib.xml, each <word> tag contains the additional information as 
attributes.  I did the same thing to produce a file called child.xml, 
except that the additional information is presented as a child element 
instead of as an attribute.  Here are the results:

$ ./make.pl
$ ls -l
total 13004
-rw-rw-r--   1 mnutter  mnutter   5811852 Sep 22 13:16 attrib.xml
-rw-rw-r--   1 mnutter  mnutter   7445892 Sep 22 13:16 child.xml
-rwxr-xr-x   1 mnutter  mnutter       976 Sep 22 13:16 make.pl
$ gzip attrib.xml
$ gzip child.xml
$ ls -l
total 1127
-rw-rw-r--   1 mnutter  mnutter    671039 Sep 22 13:16 attrib.xml.gz
-rw-rw-r--   1 mnutter  mnutter    472394 Sep 22 13:16 child.xml.gz
-rwxr-xr-x   1 mnutter  mnutter       976 Sep 22 13:16 make.pl

I used gzip as an example of off-the-shelf compression technology.  As you 
can see, even though the raw child.xml file is larger, the compressed 
version is *smaller* than the corresponding implementation with attributes.
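
To put numbers on that, here is a small sketch that works out the ratios from the byte counts in the two listings above. The figures are copied from the listings; the variable names are just for illustration.

```perl
use strict;
use warnings;

# Byte counts copied from the two `ls -l` listings above.
my %raw = (attrib => 5_811_852, child => 7_445_892);
my %gz  = (attrib =>   671_039, child =>   472_394);

# Compressed size as a percentage of the raw size, per file.
my %ratio = map { $_ => 100 * $gz{$_} / $raw{$_} } keys %raw;

# How much smaller child.xml.gz is than attrib.xml.gz, in percent.
my $saving = 100 * (1 - $gz{child} / $gz{attrib});

for my $name (sort keys %raw) {
    printf "%s compresses to %.1f%% of its raw size\n", $name, $ratio{$name};
}
printf "child.xml.gz is %.1f%% smaller than attrib.xml.gz\n", $saving;

# Prints:
#   attrib compresses to 11.5% of its raw size
#   child compresses to 6.3% of its raw size
#   child.xml.gz is 29.6% smaller than attrib.xml.gz
```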

This may not be true in all cases, of course, but I expect it often will be: 
gzip replaces repeated strings with short back-references, so element tags 
that recur in every record should add little to the compressed size.
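
Anyone who wants to repeat the comparison on their own data without writing files could try something like the following minimal sketch. It assumes a Perl recent enough to ship IO::Compress::Gzip; the synthetic word list, the <n>, <name> and <text> element names and the gzipped_size helper are all hypothetical and not part of the posted script. Note that the sketch writes the word text into both layouts so they carry identical data, whereas the child template in the script below does not print $word. It makes no claim about which layout wins; that depends on the data.

```perl
use strict;
use warnings;
use IO::Compress::Gzip qw(gzip $GzipError);

# Hypothetical sample data standing in for /usr/dict/words.
my @words   = map { "word$_" } 1 .. 500;
my @numbers = qw(one two three four five);

srand(42);    # fixed seed so both documents are built from one known sequence

my $attrib = "<attrib>\n";
my $child  = "<child>\n";
for my $word (@words) {
    my $n    = int(rand(@numbers));    # random index into @numbers
    my $name = $numbers[$n];
    # Same data in both layouts: once as attributes, once as child elements.
    $attrib .= qq{  <word n="$n" name="$name">$word</word>\n};
    $child  .= "  <word><n>$n</n><name>$name</name><text>$word</text></word>\n";
}
$attrib .= "</attrib>\n";
$child  .= "</child>\n";

# Compress a string in memory and return the compressed length in bytes.
sub gzipped_size {
    my ($xml) = @_;
    my $compressed;
    gzip(\$xml => \$compressed) or die "gzip failed: $GzipError\n";
    return length($compressed);
}

my %size = (
    attrib => { raw => length($attrib), gz => gzipped_size($attrib) },
    child  => { raw => length($child),  gz => gzipped_size($child)  },
);

for my $layout (sort keys %size) {
    printf "%-6s raw=%d gzip=%d\n", $layout, @{ $size{$layout} }{qw(raw gz)};
}
```

Swapping @words for lines read from a real dictionary file gives a closer match to the experiment described here.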

For your reference, here is the Perl script I used to create the two files:

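# Three-argument open keeps the mode separate from the path; $! reports why a
# call failed.
open(WORDS,  '<', '/usr/dict/words') or die "Couldn't open dictionary: $!\n";
open(ATTRIB, '>', 'attrib.xml')      or die "Couldn't open attrib.xml: $!\n";
open(CHILD,  '>', 'child.xml')       or die "Couldn't open child.xml: $!\n";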

@twenty_strings = qw(one two three four five six seven eight nine ten
                      eleven twelve thirteen fourteen fifteen sixteen
                      seventeen eighteen nineteen twenty);

print ATTRIB "<attrib>\n";
print CHILD "<child>\n";

while($word = <WORDS>)
{
     $time = time();
     $timestr = localtime($time);
     $twenty = int(rand(20));    # random integer 0..19, an index into @twenty_strings
     $twentystr = $twenty_strings[$twenty];
     print ATTRIB <<EOM;
   <word time="$time" timestr="$timestr" twenty="$twenty"
         twentystr="$twentystr">$word</word>
EOM
     print CHILD <<EOM;
   <word>
     <time>$time</time>
     <timestr>$timestr</timestr>
     <twenty>$twenty</twenty>
     <twentystr>$twentystr</twentystr>
   </word>
EOM
}

print ATTRIB "</attrib>\n";
print CHILD "</child>\n";

close CHILD;
close ATTRIB;
close WORDS;


-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-

Mark Nutter, <mnutter at fore.com>
Internet Applications Developer
FORE Systems
Some people are atheists 'til the day they die.