Bush Donor Lists in XML

Elliotte Rusty Harold elharo at metalab.unc.edu
Sun Sep 12 23:50:54 BST 1999

On Friday, Governor George W. Bush of Texas posted complete records
of his campaign contributions on his web site. However, he
deliberately posted them in PDF format so they couldn't be
imported into a database or a spreadsheet, and consequently
reporters and voters couldn't find out just how much of his
money was coming from whom. Or at least that's what he thought. :-)

I am pleased to announce that, after a few hours of intense
hacking, I have succeeded in extracting the crucial information
from the PDF files and have posted it online in XML and tab-delimited
formats for anybody who wants it. Accountants,
start your spreadsheets!  You'll find the files at


I've written a very simple DTD for the XML version,
<http://metalab.unc.edu/javafaq/bush/donations.dtd>. Based on
this DTD, the results do appear to be well-formed and valid
(though I've been burned by misbehaving validators before). The
first two validators I tried gave up trying to parse such a
large document (more than eight megabytes). Interestingly, the
initial conversion to XML did turn up some bugs in my
PDF-to-text converter program, but validating the XML did
not find any additional problems. I can see where a schema
language would be very useful for this sort of reverse-engineering
work, though.
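For anyone whose tools also give up on a file this size, one workaround is a streaming (SAX-style) parser, which reads the document incrementally instead of building the whole tree in memory. The sketch below is only an illustration, not the converter or validator used for these files. It uses Python's standard xml.sax module, and it checks well-formedness only: it does NOT validate against the DTD. The element names in the usage line ("donations", "donation") are hypothetical placeholders, not taken from donations.dtd.

```python
import io
import xml.sax
from collections import Counter


class ElementCounter(xml.sax.ContentHandler):
    """Counts start tags by element name as the parser streams through the input."""

    def __init__(self):
        super().__init__()
        self.counts = Counter()

    def startElement(self, name, attrs):
        self.counts[name] += 1


def count_elements(source):
    """Parse `source` (a filename or a binary file object) incrementally.

    Raises xml.sax.SAXParseException if the document is not well-formed.
    This is a well-formedness check only; no DTD validation is performed.
    """
    handler = ElementCounter()
    xml.sax.parse(source, handler)
    return handler.counts


if __name__ == "__main__":
    # Hypothetical element names, used only to exercise the parser.
    sample = b"<donations><donation/><donation/></donations>"
    print(count_elements(io.BytesIO(sample)))
```

Because the handler keeps only a running tally, memory use stays roughly constant no matter how many records the file contains.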

Eventually I may try to cook up a more serious DTD that more closely
matches the FEC's actual required format for filing electronic copies of
donor lists. I'm also going to try to add a simple XSL stylesheet to these
files in the near future, but they're so large that they really challenge anyone
trying to browse them.

