EDI

Sat Jul 18 23:36:01 BST 1998

> From:          chris m hinds <chris.m.hinds at mail.sprint.com>

> ...In this article, it suggests that XML was
> "concocted somewhat as a replacement for EDI."  Is this true?

[Oops. Sorry, this email ended up being a long manifesto. Note: there are
some comments below related specifically to what's important about XML and
what's needed in XSC.]

The act of exchanging XML files is EDI (Electronic Data Interchange) by
definition, according to the original grandiose acronym and project
definition. EDI is essentially an attempt to solve the same problem as XML,
though the focus is business transactions. But some people use the horrid
X12 format to try to exchange documents, i.e. what SGML tries to do.

What is typically meant by "EDI" is the X12/EDIFACT set of "standards" and 
proprietary networks used to transmit these protocols. (EDIFACT is the 
internationalized version of X12). X12 has 2 parts:
1. A record formatting standard
2. Definitions for data elements, i.e. a DTD of sorts

In my opinion X12 has severe problems that have limited its usefulness to 
only a small number of large corporations (or small companies trying to do 
business with a giant).

One problem is the record formatting structure-- a punch-card era structure 
that is unimaginable to a modern C/Java programmer. Instead of simple 
parsers, $2-10K software packages are required. It's based on fixed length 
fields, and fixed nesting structures-- completely the opposite of XML. 
Normally, companies need to hire EDI consultants to help untangle the mess.

XML can be used as a record format to escape from the limitations and 
complexity of the X12 method of packaging data elements. It's not difficult 
to convert a parsed X12 message into XML. However, it's very useful to go 
further and untangle the X12 structure. For example, instead of one data 
element like <phone>, X12 might use 2-5 elements-- extra elements are used 
for qualifiers, for overflow when the fixed lengths are exceeded, or for 
repeated values. A good X12-XML translator would do more than a simplistic 
mapping of raw X12 elements.
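
As a rough sketch of the difference between a simplistic mapping and an 
untangled one, the Python below starts from an already-parsed record. The 
element ids, the qualifier codes "TE" and "FX", the code table, and the 
function names are all made up for illustration; they are not taken from 
the X12 specification or from any real translator.

```python
import xml.etree.ElementTree as ET

# Hypothetical parsed record: (raw element id, value) pairs as a parser
# might emit them. Ids and qualifier codes are illustrative only.
PARSED = [
    ("name", "Pat Example"),
    ("qual1", "TE"), ("num1", "408-555-0100"),
    ("qual2", "FX"), ("num2", "408-555-0101"),
]

QUALIFIER_TO_TAG = {"TE": "phone", "FX": "fax"}  # assumed code table


def simplistic(parsed):
    """One XML element per raw element: the qualifiers leak into the output."""
    root = ET.Element("contact")
    for key, value in parsed:
        ET.SubElement(root, key).text = value
    return root


def untangled(parsed):
    """Fold each qualifier/value pair into one meaningfully named element."""
    root = ET.Element("contact")
    pending = None  # qualifier code waiting for its value
    for key, value in parsed:
        if key.startswith("qual"):
            pending = value
        elif pending is not None:
            tag = QUALIFIER_TO_TAG.get(pending, "unknown")
            ET.SubElement(root, tag).text = value
            pending = None
        else:
            ET.SubElement(root, key).text = value
    return root


print(ET.tostring(simplistic(PARSED), encoding="unicode"))
print(ET.tostring(untangled(PARSED), encoding="unicode"))
```

The first function produces five opaque elements; the second produces 
<name>, <phone> and <fax>, which is closer to what a receiving program 
actually wants.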

The other problem with X12 is that the equivalent of the DTD isn't really a 
"standard", it's more like a vague suggestion. Potentially XML/SGML could 
have the same problem if one writes a DTD without documentation. With X12, 
each company interprets the specification in its own way. Instead of having 
a paragraph to precisely define the semantics and syntax, there are 
typically only a few words. The result is that to implement EDI, you 
typically hire an EDI 
consultant who uses specialized data conversion software to translate the 
formatting conventions used by each trading partner into a useful form. In 
other words, instead of having everyone exchange data formatted according to 
a precisely defined standard, you write a translator for each trading 
partner. Thus the setup time for adding a new customer/vendor is measured in 
months-- almost useless for internet commerce.

I think XML and in particular XSC can potentially solve this problem. I've 
been overloaded and haven't had time to comment before, but I'd like to 
offer some suggestions that I think are important.

There seems to be lots of hype on data formatting, as if the main problem to 
be solved is parsing records into data elements. From my experience in 
exchanging data, a competent Perl hacker can easily encode or decode most 
record formats. The top 2 problems are:
1. The fields (elements) are undefined or vague.
2. The code values for enumerated types are inaccessible and/or undocumented.

I routinely deal with databases where the maintainers of the data have no 
idea what certain fields mean. The usual way to find definitions is to look 
at data values and figure out how they're used.

The other big problem is that in many cases the enumerated types are 
undefined. There may be a vague reference, or sometimes you need to order an 
expensive paper document from Switzerland.

XML has a serious problem (inherited from SGML) in that the DTD has syntax 
only, and does not have a means (other than little-used comments) to define 
semantics. Dealing with the syntax of data records is insignificant and 
trivial in comparison to dealing with the semantics.

The most important part of XSC is the ability to add the documentation 
sections to the DTD elements. The biggest need is simple access to precise 
definitions of data elements. That's something that XSC can do that cannot 
be done with SGML. I will make some suggestions for the XSC spec in this 
area in other emails.
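
To make the idea concrete, here is a minimal Python sketch of element 
definitions that carry a documentation section next to the syntax. This is 
not XSC syntax; the ElementDef class, the sample elements, and the 
four-word threshold for "only a few words" are assumptions made for 
illustration.

```python
from dataclasses import dataclass


@dataclass
class ElementDef:
    """Hypothetical element definition: syntax plus the docs a DTD lacks."""
    name: str
    content: str   # syntax, as a DTD content model would give it
    doc: str = ""  # precise human-readable definition (the semantics)


SCHEMA = [
    ElementDef("zip", "#PCDATA",
               "US postal code of the delivery address, 5 or 9 digits."),
    ElementDef("state", "#PCDATA", "Two-letter US state abbreviation."),
    ElementDef("status", "#PCDATA"),  # syntax only: nobody knows the meaning
]


def undocumented(schema, min_words=4):
    """Return names of elements whose definition is missing or too terse."""
    return [e.name for e in schema if len(e.doc.split()) < min_words]


def describe(schema, name):
    """Look up the definition of an element by name."""
    for e in schema:
        if e.name == name:
            return e.doc or "(undocumented)"
    raise KeyError(name)


print(undocumented(SCHEMA))
print(describe(SCHEMA, "zip"))
```

The point is only that once the definition travels with the element, a 
trivial tool can both answer "what does this field mean?" and flag the 
fields nobody has defined.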

Another crucial part is the ability to make external references to code 
values and their documentation. Very often element values reference a 
separate standard or some implied standard, and are otherwise undocumented.
For example, the Spatial Data Transfer Standard (SDTS) is broken. The 
coordinate coding system is defined using an enumerated value, and in the 
case of state-plane, it references a paper-based NIST standard which has 
been withdrawn, with the mailing address of NIST given as the reference. I'm 
sure there are data files floating around somewhere on the net, but they're 
very difficult to find. If there were an XSC for SDTS, it would be crucial 
that enumerated external types reference network-accessible definitions.

Take "city, state, zip" (zip=US postal code). It's easy to find a list of 
states, and if there were an external XML reference, then a general purpose 
XML parser/checker could validate a <state> element. However, the USGS, 
Census, USPS, and NIST databases of city names all disagree slightly. The 
NIST database is the best one, and I sent back a page full of errors just for 
the city names in California. USPS changes zip codes regularly, without 
publicly accessible definitions. (They sell copyrighted databases to junk 
mailers. You download more data in graphics off their website to find out you 
need to buy a lesser amount of data which you may not be able to make 
public.) Zip changes, of course, invalidate Census, USGS, and NIST databases, 
as well as lots of others that reference city or zip codes. With the contents 
of distributed databases linked by the net, it's crucial to link definitions 
and documentation. XSC provides that.
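
A general purpose checker of the kind described above might look roughly 
like the following Python sketch. The code-list URL, the partial list of 
states, and the element-to-list mapping are placeholders; a real checker 
would fetch the list from wherever the schema points instead of using an 
in-memory table.

```python
import xml.etree.ElementTree as ET

# Stand-in for a network-accessible code list. The URL and the (partial)
# list of values are placeholders for illustration.
CODE_LISTS = {
    "http://example.org/codes/us-states": {"CA", "NY", "TX", "WA"},
}

# Hypothetical mapping from element name to the code list that defines it.
ELEMENT_CODE_LIST = {"state": "http://example.org/codes/us-states"}


def check_codes(xml_text):
    """Return (tag, value) pairs whose value is not in the referenced list."""
    errors = []
    for elem in ET.fromstring(xml_text).iter():
        ref = ELEMENT_CODE_LIST.get(elem.tag)
        if ref is None:
            continue  # no enumerated type declared for this element
        value = (elem.text or "").strip()
        if value not in CODE_LISTS[ref]:
            errors.append((elem.tag, value))
    return errors


doc = (
    "<addresses>"
    "<addr><city>Sunnyvale</city><state>CA</state></addr>"
    "<addr><city>Springfield</city><state>ZZ</state></addr>"
    "</addresses>"
)
print(check_codes(doc))
```

Note that <city> is deliberately left unchecked here: as described above, 
there is no single agreed list to check it against.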

One interesting aspect is that X12 is designed for mainframe-mainframe 
communication only, i.e. the message formats are undecodable except by 
experts (looks just like modem noise.) However, most messages are 
prepared and/or processed manually, e.g. an EDI translator is used just to 
format a message or print a message. Under the EDI model there is no 
possibility of human-machine interaction-- you either fax paper documents and 
retype or use fully automated coded messaging. There's no concept like an 
ordinary email message that's both human and machine readable.

The X12 structure was developed for use with networks which charged by the KB,
e.g. a company might pay $12K/yr to send less data than a $6/mo email 
service would carry. Though XML or a pretty-printed format (e.g. like RFC822 
headers) is larger than X12 coding, when XML is compressed with gzip, it's 
probably smaller.
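
The size claim is easy to try out. The Python sketch below builds the same 
made-up order lines in an XML form and in a terse delimited form and prints 
raw and gzipped sizes. The data, the field names, and the compact coding are 
invented for the test (the compact form is NOT real X12 syntax), and the 
numbers will depend heavily on the data used.

```python
import gzip

# Made-up order lines: (order number, item code, quantity).
ORDERS = [(1000 + i, "WIDGET-%03d" % (i % 7), 1 + i % 5) for i in range(200)]

# Verbose, tagged form of each line.
xml_form = "".join(
    "<line><order>%d</order><item>%s</item><qty>%d</qty></line>\n" % row
    for row in ORDERS
).encode("ascii")

# Terse, delimiter-separated form of the same data (illustrative only).
compact_form = "".join("%d*%s*%d~" % row for row in ORDERS).encode("ascii")

for label, data in [("xml", xml_form), ("compact", compact_form)]:
    print(label, "raw:", len(data), "gzip:", len(gzip.compress(data)))
```

The comparison that matters for the argument above is the gzipped XML size 
against the raw compact size.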

--------------------------------------------------------------------------
Carl Hage                                              C. Hage Associates
<mailto:carl at chage.com> Voice/Fax: 1-408-244-8410      1180 Reed Ave #51
<http://www.chage.com/chage/>                          Sunnyvale, CA 94086

xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev at ic.ac.uk
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/