RFC: Attributes and XML-RPC

Julian Reschke reschke at medicaldataservice.de
Thu Sep 23 18:59:33 BST 1999

I find this very surprising. Yes, it's clear that opening and closing tags
compress well, but I would still have expected the attribute version to be
smaller.

Julian F. Reschke (mailto:reschke at medicaldataservice.de)
MedicalData Service GmbH Münster, Germany

  -----Original Message-----
  From: owner-xml-dev at ic.ac.uk [mailto:owner-xml-dev at ic.ac.uk] On Behalf
Of Mark Nutter
  Sent: Wednesday, September 22, 1999 7:26 PM
  To: xml-dev at ic.ac.uk
  Subject: RE: RFC: Attributes and XML-RPC

  At 12:16 PM 09/22/99 -0400, Hunter, David wrote:

    So even if you
    compress the files, the attribute version will compress to a smaller size
    than the other file.  Again, 2KB isn't a lot, but if we're talking
    megabytes in size, 50% is a lot.

  I wrote a quick Perl script to take /usr/dict/words and turn it into an
XML file, with some artificially generated "attributes".  In the resulting
file named attrib.xml, each <word> tag contains the additional information
as attributes.  I did the same thing to produce a file called child.xml,
except that the additional information is presented as a child element
instead of as an attribute.  Here are the results:

  $ ./make.pl
  $ ls -l
  total 13004
  -rw-rw-r--   1 mnutter  mnutter   5811852 Sep 22 13:16 attrib.xml
  -rw-rw-r--   1 mnutter  mnutter   7445892 Sep 22 13:16 child.xml
  -rwxr-xr-x   1 mnutter  mnutter       976 Sep 22 13:16 make.pl
  $ gzip attrib.xml
  $ gzip child.xml
  $ ls -l
  total 1127
  -rw-rw-r--   1 mnutter  mnutter    671039 Sep 22 13:16 attrib.xml.gz
  -rw-rw-r--   1 mnutter  mnutter    472394 Sep 22 13:16 child.xml.gz
  -rwxr-xr-x   1 mnutter  mnutter       976 Sep 22 13:16 make.pl

  I used gzip as an example of off-the-shelf compression technology.  As you
can see, even though the raw child.xml file is larger, the compressed
version is *smaller* than the corresponding implementation with attributes.
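Expressed as ratios, the byte counts in the listing above show how the ordering flips once gzip is applied:

```latex
\begin{aligned}
\text{raw:}          \quad \frac{7\,445\,892}{5\,811\,852} &\approx 1.28  && \text{(child.xml is about 28\% larger)}\\
\text{gzipped:}      \quad \frac{671\,039}{472\,394}       &\approx 1.42  && \text{(attrib.xml.gz is about 42\% larger)}\\
\text{attrib.xml:}   \quad \frac{671\,039}{5\,811\,852}    &\approx 0.115 && \text{(compressed to about 11.5\%)}\\
\text{child.xml:}    \quad \frac{472\,394}{7\,445\,892}    &\approx 0.063 && \text{(compressed to about 6.3\%)}
\end{aligned}
```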

  This may not be true in all cases, of course, but I expect it often will,
due to the way such compression algorithms work.

  For your reference, here is the Perl script I used to create the two
files:

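  # make.pl: write the same word-list data twice, once as attributes
  # (attrib.xml) and once as child elements (child.xml).
  # NOTE: the here-document bodies were lost from the list archive; the
  # element layout below is reconstructed from the description above, so
  # output sizes will not match the listing exactly.
  use strict;
  use warnings;

  open(WORDS,  '<', '/usr/dict/words') or die "Couldn't open dictionary: $!\n";
  open(ATTRIB, '>', 'attrib.xml')      or die "Couldn't open attrib.xml: $!\n";
  open(CHILD,  '>', 'child.xml')       or die "Couldn't open child.xml: $!\n";

  my @twenty_strings = qw(one two three four five six seven eight nine ten
                          eleven twelve thirteen fourteen fifteen sixteen
                          seventeen eighteen nineteen twenty);

  print ATTRIB "<attrib>\n";
  print CHILD "<child>\n";

  while (my $word = <WORDS>) {
      chomp $word;                       # drop the trailing newline
      my $time = time();
      my $timestr = localtime($time);    # scalar context: human-readable date
      my $twenty = int(rand(20));        # random integer in 0..19
      my $twentystr = $twenty_strings[$twenty];
      # Attribute version: extra data carried on the <word> start tag.
      print ATTRIB <<EOM;
    <word time="$time" timestr="$timestr" twenty="$twenty"
          twentystr="$twentystr">$word</word>
EOM
      # Child version: the same data carried as child elements.
      print CHILD <<EOM;
    <word>$word
      <time>$time</time>
      <timestr>$timestr</timestr>
      <twenty>$twenty</twenty>
      <twentystr>$twentystr</twentystr>
    </word>
EOM
  }

  print ATTRIB "</attrib>\n";
  print CHILD "</child>\n";

  close CHILD;
  close ATTRIB;
  close WORDS;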
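
For readers without Perl or /usr/dict/words at hand, here is a rough Python 3 sketch of the same comparison, using the standard gzip module in place of the gzip command. It is an illustration under stated assumptions, not the script above: the child-element layout, the function names build_documents, compare and load_words, and the fallback word list are all hypothetical. Because the outcome depends on the data and on the layout chosen, it may or may not reproduce the ordering seen in the listing.

```python
import gzip
import random
import time
from xml.sax.saxutils import escape

TWENTY = ("one two three four five six seven eight nine ten eleven twelve "
          "thirteen fourteen fifteen sixteen seventeen eighteen nineteen "
          "twenty").split()


def build_documents(words, seed=0):
    """Return (attrib_xml, child_xml) as bytes carrying identical data."""
    rng = random.Random(seed)          # seeded so runs are repeatable
    attrib = ["<attrib>\n"]
    child = ["<child>\n"]
    for raw in words:
        word = escape(raw.strip())     # escape &, <, > in the word text
        t = int(time.time())
        tstr = time.ctime(t)           # same layout as Perl's scalar localtime
        n = rng.randrange(20)          # random index 0..19 into TWENTY
        nstr = TWENTY[n]
        # Extra data carried as attributes...
        attrib.append(f'  <word time="{t}" timestr="{tstr}" twenty="{n}" '
                      f'twentystr="{nstr}">{word}</word>\n')
        # ...or as child elements (hypothetical layout).
        child.append(f"  <word>{word}<time>{t}</time>"
                     f"<timestr>{tstr}</timestr><twenty>{n}</twenty>"
                     f"<twentystr>{nstr}</twentystr></word>\n")
    attrib.append("</attrib>\n")
    child.append("</child>\n")
    return "".join(attrib).encode("utf-8"), "".join(child).encode("utf-8")


def compare(words, seed=0):
    """Raw and gzip-compressed sizes, in bytes, for both layouts."""
    a, c = build_documents(words, seed)
    return {
        "attrib": len(a),
        "child": len(c),
        "attrib.gz": len(gzip.compress(a)),
        "child.gz": len(gzip.compress(c)),
    }


def load_words():
    """Read a system word list, or fall back to synthetic words."""
    for path in ("/usr/dict/words", "/usr/share/dict/words"):
        try:
            with open(path, encoding="utf-8", errors="replace") as f:
                return f.read().split()
        except OSError:
            continue
    # Hypothetical fallback so the sketch still runs without a dictionary.
    return [f"word{i}" for i in range(45000)]


if __name__ == "__main__":
    for name, size in compare(load_words()).items():
        print(f"{name:10} {size:>10}")
```

Keeping the generator and the measurement in separate functions makes it easy to try other layouts (for example, one child element per line, as in the Perl version) and see how sensitive the compressed sizes are to formatting.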


  Mark Nutter, <mnutter at fore.com>
  Internet Applications Developer
  FORE Systems
  Some people are atheists 'til the day they die.