<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">

<HTML><HEAD>

<META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=ISO-8859-1">


</HEAD>

<BODY>

<DIV><FONT size=2><FONT color=#0000ff><FONT face=Arial><SPAN 

class=136121718-22091999>These results are consistent with tests that I have run 

against actual XML files generated from databases.<SPAN 

class=673591718-22091999>&nbsp; After compression, there is little difference 

between different syntactic families.</SPAN></SPAN></FONT></FONT></FONT></DIV>

<BLOCKQUOTE style="MARGIN-RIGHT: 0px">

  <DIV align=left class=OutlookMessageHeader dir=ltr><FONT face=Tahoma 

  size=2>-----Original Message-----<BR><B>From:</B> Mark Nutter 

  [mailto:mnutter@fore.com]<BR><B>Sent:</B> Wednesday, September 22, 1999 10:26 

  AM<BR><B>To:</B> xml-dev@ic.ac.uk<BR><B>Subject:</B> RE: RFC: Attributes and 

  XML-RPC<BR><BR></FONT></DIV>At 12:16 PM 09/22/99 -0400, Hunter, David 

  wrote:<BR>

  <BLOCKQUOTE cite type="cite">So even if you compress the files, the 
    attribute version will be able to compress to 50% smaller than the other 
    file.&nbsp; Again, 2KB isn't a lot, but if we're talking megabytes in 
    size, 50% is a lot.</BLOCKQUOTE><BR>I wrote a quick Perl script to take 

  /usr/dict/words and turn it into an XML file, with some artificially generated 

  "attributes".&nbsp; In the resulting file named attrib.xml, each &lt;word&gt; 

  tag contains the additional information as attributes.&nbsp; I did the same 

  thing to produce a file called child.xml, except that the additional 

  information is presented as a child element instead of as an attribute.&nbsp; 

  Here are the results:<BR><BR><TT>$ ./make.pl<BR>$ ls -l<BR>total 

  13004<BR>-rw-rw-r--&nbsp;&nbsp; 1 mnutter&nbsp; mnutter&nbsp;&nbsp; 5811852 

  Sep 22 13:16 attrib.xml<BR>-rw-rw-r--&nbsp;&nbsp; 1 mnutter&nbsp; 

  mnutter&nbsp;&nbsp; 7445892 Sep 22 13:16 child.xml<BR>-rwxr-xr-x&nbsp;&nbsp; 1 

  mnutter&nbsp; mnutter&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 976 Sep 22 13:16 

  make.pl<BR>$ gzip attrib.xml<BR>$ gzip child.xml<BR>$ ls -l<BR>total 

  1127<BR>-rw-rw-r--&nbsp;&nbsp; 1 mnutter&nbsp; mnutter&nbsp;&nbsp;&nbsp; 

  671039 Sep 22 13:16 attrib.xml.gz<BR>-rw-rw-r--&nbsp;&nbsp; 1 mnutter&nbsp; 

  mnutter&nbsp;&nbsp;&nbsp; 472394 Sep 22 13:16 

  child.xml.gz<BR>-rwxr-xr-x&nbsp;&nbsp; 1 mnutter&nbsp; 

  mnutter&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 976 Sep 22 13:16 

  make.pl<BR><BR></TT>I used gzip as an example of off-the-shelf compression 

  technology.&nbsp; As you can see, even though the raw child.xml file is 

  larger, the compressed version is *smaller* than the corresponding 

  implementation with attributes.<BR><BR>This may not be true in all cases, of 
  course, but I expect it often will be: compressors such as gzip encode 
  repeated strings, including recurring tag names, very compactly, so verbose 
  markup costs little after 
  compression.<BR><BR>For your reference, here is the Perl script I used to create the 

  two files:<BR><BR>open WORDS, "&lt;/usr/dict/words" or die "Couldn't open 

  dictionary.\n";<BR>open ATTRIB, "&gt;attrib.xml" or die "Couldn't open 

  attrib.xml\n";<BR>open CHILD, "&gt;child.xml" or die "Couldn't open 

  child.xml\n";<BR><BR>@twenty_strings = qw(one two three four five six seven 

  eight nine 

  ten<BR>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 

  eleven twelve thirteen fourteen fifteen 

  sixteen<BR>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 

  seventeen eighteen nineteen twenty);<BR><BR>print ATTRIB 

  "&lt;attrib&gt;\n";<BR>print CHILD "&lt;child&gt;\n";<BR><BR>while($word = 

  &lt;WORDS&gt;)<BR>{<BR>&nbsp;&nbsp;&nbsp; $time = 

  time();<BR>&nbsp;&nbsp;&nbsp; $timestr = 

  localtime($time);<BR>&nbsp;&nbsp;&nbsp; $twenty = int(rand(20));&nbsp; # random index 0..19 into @twenty_strings<BR>&nbsp;&nbsp;&nbsp; $twentystr = 

  $twenty_strings[$twenty];<BR>&nbsp;&nbsp;&nbsp; print ATTRIB 

  &lt;&lt;EOM;<BR>&nbsp; &lt;word time="$time" timestr="$timestr" 

  twenty="$twenty"<BR>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 

  twentystr="$twentystr"&gt;$word&lt;/word&gt;<BR>EOM<BR>&nbsp;&nbsp;&nbsp; 

  print CHILD &lt;&lt;EOM;<BR>&nbsp; &lt;word&gt;<BR>&nbsp;&nbsp;&nbsp; 

  &lt;time&gt;$time&lt;/time&gt;<BR>&nbsp;&nbsp;&nbsp; 

  &lt;timestr&gt;$timestr&lt;/timestr&gt;<BR>&nbsp;&nbsp;&nbsp; 

  &lt;twenty&gt;$twenty&lt;/twenty&gt;<BR>&nbsp;&nbsp;&nbsp; 

  &lt;twentystr&gt;$twentystr&lt;/twentystr&gt;<BR>&nbsp; 

  &lt;/word&gt;<BR>EOM<BR>}<BR><BR>print ATTRIB "&lt;/attrib&gt;\n";<BR>print 

  CHILD "&lt;/child&gt;\n";<BR><BR>close CHILD;<BR>close ATTRIB;<BR>close 

  WORDS;<BR><BR><BR>
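  <DIV>The gzip run shown earlier replaces each .xml file with its .gz version. As a minimal sketch, assuming a Unix shell with gzip, wc, tr, and mktemp available, the same size comparison can be made with gzip -c, which leaves the inputs in place. The helper name report_sizes and the small generated stand-in files are hypothetical and not part of the original post; they are there only so the sketch runs on its own.</DIV>

```shell
#!/bin/sh
# report_sizes is a hypothetical helper: for one file it prints the raw
# size, the gzip size, and the gzip size as a percentage of the raw size.
report_sizes() {
    file=$1
    # tr strips the padding some wc implementations add around the count.
    raw=$(wc -c < "$file" | tr -d ' ')
    # gzip -c writes to stdout, so the input file is left untouched.
    packed=$(gzip -c "$file" | wc -c | tr -d ' ')
    echo "$file raw=$raw gzip=$packed ratio=$((packed * 100 / raw))%"
}

# Stand-in data so the sketch is self-contained (not the original word list).
workdir=$(mktemp -d)
i=0
while [ "$i" -lt 200 ]; do
    # Attribute style: the extra value rides on the <word> tag.
    printf '  <word n="%d">word%d</word>\n' "$i" "$i" >> "$workdir/attrib.xml"
    # Child style: the extra value gets its own element.
    printf '  <word>\n    <n>%d</n>\n    word%d\n  </word>\n' "$i" "$i" >> "$workdir/child.xml"
    i=$((i + 1))
done

report_sizes "$workdir/attrib.xml"
report_sizes "$workdir/child.xml"
rm -rf "$workdir"
```

  <DIV>To repeat the measurement on the real data, call report_sizes on attrib.xml and child.xml before running gzip on them. Which style ends up smaller will depend on the data, so the sketch prints the numbers rather than assuming a winner.</DIV><BR>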

  <DIV>-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-</DIV><BR>

  <DIV>Mark Nutter, &lt;mnutter@fore.com&gt;</DIV>

  <DIV>Internet Applications Developer</DIV>

  <DIV>FORE Systems</DIV>

  <DIV>Some people are atheists 'til the day they 

die.</DIV></BLOCKQUOTE></BODY></HTML>