carl at chage.com
Sat Jul 18 23:36:01 BST 1998
> From: chris m hinds <chris.m.hinds at mail.sprint.com>
> ...In this article, it suggests that XML was
> "concocted somewhat as a replacement for EDI." Is this true?
[Oops. Sorry, this email ended up being a long manifesto. Note: there are some
comments below related specifically to what's important about XML and what's
needed in XSC.]
The act of exchanging XML files is EDI by definition, according to the
original grandiose acronym and project definition. EDI is essentially an
attempt to solve the identical problem as XML, though the focus is business
transactions. But some people use the horrid X12 format to try to exchange
documents, i.e. what SGML tries to do.
What is typically meant by "EDI" is the X12/EDIFACT set of "standards" and
proprietary networks used to transmit these protocols. (EDIFACT is the
internationalized version of X12). X12 has 2 parts:
1. A record formatting standard
2. Definitions for data elements, i.e. a DTD of sorts
In my opinion X12 has severe problems that have limited its usefulness to
only a small number of large corporations (or small companies trying to do
business with a giant).
One problem is the record formatting structure-- a punch-card era structure
that is unimaginable to a modern C/Java programmer. Instead of simple
parsers, $2-10K software packages are required. It's based on fixed length
fields, and fixed nesting structures-- completely the opposite of xml.
Normally, companies need to hire EDI consultants to help untangle the mess.
XML can be used as a record format to escape from the limitations and
complexity of the X12 method of packaging data elements. It's not difficult
to convert a parsed X12 message into xml. However, it's much more useful to
also untangle the X12 structure. For example, instead of one data element like
<phone>, X12 might use 2-5 elements-- extra elements are used for qualifiers,
if the fixed lengths are exceeded, or for repeated values. A good X12-XML
translator would do more than a simplistic mapping of raw X12 elements.
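As a rough sketch of the difference between a raw mapping and an untangled
one, the Python below folds qualifier/value pairs into a single <phone>
element per number. The input layout, the "TE" and "EX" qualifier codes and
the output element names are assumptions made for illustration only; this is
not a real X12 parser.

```python
import xml.etree.ElementTree as ET


def contact_to_xml(name, pairs):
    """Fold (qualifier, value) pairs into one <phone> element per number.

    `pairs` is a hypothetical, already-parsed view of the raw elements:
    "TE" marks a telephone number, "EX" marks an extension for the
    number that precedes it.
    """
    contact = ET.Element("contact")
    ET.SubElement(contact, "name").text = name
    phone = None
    for qualifier, value in pairs:
        if qualifier == "TE":
            # Each telephone number becomes its own <phone> element.
            phone = ET.SubElement(contact, "phone")
            phone.text = value
        elif qualifier == "EX" and phone is not None:
            # An extension qualifies the preceding number rather than
            # becoming a separate element.
            phone.set("extension", value)
    return contact


raw_pairs = [("TE", "408-555-0100"), ("EX", "12"), ("TE", "408-555-0199")]
xml_text = ET.tostring(contact_to_xml("J. Smith", raw_pairs), encoding="unicode")
print(xml_text)
```

A simplistic translator would instead emit one XML element per raw pair and
leave every receiver to reassemble the phone numbers.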
The other problem with X12 is that the equivalent of the DTD isn't really a
"standard", it's more like a vague suggestion. Potentially XML/SGML could
have the same problem if one writes a DTD without documentation. Each company
interprets the specification in its own way. Instead of having a
paragraph to precisely define the semantics and syntax, there are typically
only a few words. The result is that to implement EDI, you typically hire an EDI
consultant who uses specialized data conversion software to translate the
formatting conventions used by each trading partner into a useful form. In
other words, instead of having everyone exchange data formatted according to
a precisely defined standard, you write a translator for each trading
partner. Thus the setup time for adding a new customer/vendor is measured in
months-- almost useless for internet commerce.
I think XML, and in particular XSC, can potentially solve this problem. I've
been overloaded and haven't had time to comment before, but I'd like to
offer some suggestions that I think are important.
There seems to be lots of hype on data formatting, as if the main problem to
be solved is parsing records into data elements. From my experience in
exchanging data, a competent perl hacker can easily encode or decode most
record formats. The top 2 problems are:
1. The fields (elements) are undefined or vague.
2. The code values for enumerated types are inaccessible and/or undocumented.
I routinely deal with databases where the maintainers of the data have no
idea what certain fields mean. The usual way to find definitions is to look
at data values and figure out how they're used.
The other big problem is that in many cases the enumerated types are
undefined. There may be a vague reference, or sometimes you need to order an
expensive paper document from Switzerland.
XML has a serious problem (inherited from SGML) in that the DTD has syntax
only, and does not have a means (other than little-used comments) to define
semantics. Dealing with the syntax of data records is insignificant and
trivial in comparison to dealing with the semantics.
The most important part of XSC is the ability to add the documentation
sections to the DTD elements. The biggest need is simple access to precise
definitions of data elements. That's something that XSC can do that cannot
be done with SGML. I will make some suggestions for the XSC spec in this
area in other emails.
Another crucial part is the ability to make external references to code
values and their documentation. Very often element values reference a
separate standard or some implied standard, and are otherwise undocumented.
For example, the Spatial Data Transfer Standard is broken. The coordinate
coding system is defined using an enumerated value, and in the case of
state-plane, it references a paper-based NIST standard which has been
withdrawn, with the mailing address of NIST given as the reference. I'm sure
there are data files floating around somewhere on the net, but they're very
difficult to find. If there were an XSC for SDTS, it would be crucial that
enumerated external types reference network-accessible definitions.
Take "city, state, zip" (zip=US postal code). It's easy to find a list of
states, and if there were an external XML reference, then a general purpose
XML parser/checker could validate a <state> element. However, the USGS,
Census, USPS, and NIST databases of city names all disagree slightly. The
NIST database is the best one, and I sent back a page full of errors just for
the city names in California. USPS changes zip codes regularly, without
publicly accessible definitions. (They sell copyrighted databases to junk
mailers. You download more data in graphics off their website to find out you
need to buy a lesser amount of data which you may not be able to make
public.) Zip changes, of course, invalidate Census, USGS, and NIST databases,
as well as lots of others that reference city or zip codes. With the contents
of distributed databases linked by the net, it's crucial to link definitions
and documentation. XSC provides that.
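To make the <state> case concrete, here is a minimal Python sketch of a
general purpose checker that validates element values against an external
code list. The code list format (a <codelist> of <code value="..."> entries)
is invented for this example and is not taken from the XSC draft; in practice
the list would be fetched from a network-accessible reference rather than
embedded as a string.

```python
import xml.etree.ElementTree as ET

# Hypothetical code list document. Each code carries its own
# documentation text, so the definition travels with the value.
CODE_LIST_XML = """
<codelist name="us-state">
  <code value="CA">California</code>
  <code value="NY">New York</code>
  <code value="TX">Texas</code>
</codelist>
"""


def load_codes(codelist_xml):
    """Return {code value: documentation text} from a code list document."""
    root = ET.fromstring(codelist_xml)
    return {c.get("value"): (c.text or "").strip() for c in root.iter("code")}


def invalid_values(document_xml, tag, codes):
    """Return the text of every <tag> element whose value is not in codes."""
    root = ET.fromstring(document_xml)
    return [e.text for e in root.iter(tag) if (e.text or "").strip() not in codes]


codes = load_codes(CODE_LIST_XML)
doc = (
    "<addresses>"
    "<address><city>Sunnyvale</city><state>CA</state></address>"
    "<address><city>Springfield</city><state>ZZ</state></address>"
    "</addresses>"
)
print(invalid_values(doc, "state", codes))
```

The same checker works for any enumerated element once its code list is
reachable, which is the point: the hard part is publishing the list, not
parsing the record.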
One interesting aspect is that X12 is designed for mainframe-mainframe
communication only, i.e. the message formats are undecodable except by
experts (they look just like modem noise). However, most messages are
prepared and/or processed manually, e.g. an EDI translator is used just to
format a message or print a message. Under the EDI model there is no
possibility of human-machine interaction-- you either fax paper documents and
retype or use fully automated coded messaging. There's no concept like an
ordinary email message that's both human and machine readable.
The X12 structure was developed for use with networks which charged by the KB,
e.g. a company might pay $12K/yr for less data than $6/mo email service.
Though xml or a pretty-printed format (e.g. like RFC822 headers) is larger
than X12 coding, when xml is compressed with gzip, it's probably smaller.
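The size question is easy to measure rather than guess. The Python sketch
below builds invented sample data (200 similar order lines), writes it once
as XML and once in a terse delimited form that merely stands in for a compact
coding (it is not real X12), and prints the byte counts alongside the gzipped
XML. The actual numbers depend entirely on the data.

```python
import gzip

# Invented sample rows: (line number, item code, quantity).
rows = [(i, "WIDGET-%03d" % (i % 7), 10 + i % 5) for i in range(200)]

# The same data as verbose XML...
xml_text = "<orders>" + "".join(
    "<line><number>%d</number><item>%s</item><quantity>%d</quantity></line>" % r
    for r in rows
) + "</orders>"

# ...and as a terse delimited stand-in for a compact coding (not real X12).
terse_text = "~".join("%d*%s*%d" % r for r in rows)

xml_bytes = xml_text.encode("ascii")
terse_bytes = terse_text.encode("ascii")
xml_gz = gzip.compress(xml_bytes)

print("xml:", len(xml_bytes), "terse:", len(terse_bytes), "xml+gzip:", len(xml_gz))
```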
Carl Hage C. Hage Associates
<mailto:carl at chage.com> Voice/Fax: 1-408-244-8410 1180 Reed Ave #51
<http://www.chage.com/chage/> Sunnyvale, CA 94086
xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev at ic.ac.uk
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/
List coordinator, Henry Rzepa (mailto:rzepa at ic.ac.uk)