words (RE: extensibility in XSchema?)

Mon Jun 22 20:25:13 BST 1998

At 03:16 PM 6/22/98 +0000, Trevor Turton wrote:
>On 98/06/22 at 14:15:36 Rick Jelliffe wrote:
>
>>My theory is that industry people use "semantics" as jargon.
>......
>>This is why the particular word "semantics" is worthless.
>
>This sounds accurate to me -
>anyone else have a sharp definition of sentacs versus symantix?

I think the distinction between syntax and semantics can be made very
clearly: syntax defines the rules for construction of a character string
(or binary encoding if you want to include that as being syntactic).  The
syntactic constructs thus become things that can be *mechanically*
identified and reported.

Semantics is the interpretation (whatever that might mean) of the syntactic
constructs, whatever they might be.

There are, of course, different levels of semantics in something as complex
as an XML document, with the semantics of the base syntactic constructs
implicitly or explicitly defined by the definition of the syntax itself
(e.g., "abstract thing of type 'x' is represented syntactically by the
character string 'y'").  Because they come directly from the parsing of the
syntax, these things can be defined as completely as necessary to ensure
consistent interpretation (e.g., we all pretty much agree on what an XML
element is).  Thus we can agree on things like the SGML property set, the
DOM, and other abstract or functional interpretations of the syntax of XML
documents.
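
As a rough illustration of "mechanically identified and reported", here is a minimal sketch using Python's standard xml.etree.ElementTree module. The function name report_constructs and the sample memo document are made up for this example and are not part of the original discussion; the point is only that a parser can list elements, attributes, and character data without knowing what any of them mean.

```python
import xml.etree.ElementTree as ET

def report_constructs(text):
    """Return (tag, attributes, stripped text) for each element, in document order.

    Hypothetical helper: it reports only what the parser can identify
    mechanically and says nothing about what the elements mean.
    """
    root = ET.fromstring(text)
    return [(el.tag, dict(el.attrib), (el.text or "").strip())
            for el in root.iter()]

# Invented sample document for illustration.
sample = "<memo id='m1'><to>Trevor</to><body>Hi</body></memo>"
constructs = report_constructs(sample)
for tag, attrs, chars in constructs:
    print(tag, attrs, repr(chars))
```

Every conforming parser should report the same three elements here, which is the sense in which the base constructs can be agreed on.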

Syntactic validation is always easy and uncontroversial (to the degree that
the syntactic rules are understood, which is another issue entirely):
either the string matches the rules or it doesn't.  This is the *only* type
of validation that XML and SGML can provide (by which I mean, validation
against the base rules and any further rules provided by a set of DTD
declarations, which are, of course, also syntactic rules).
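
The "matches the rules or it doesn't" point can be sketched in a few lines of Python. This is an assumption-laden example: it checks well-formedness only (xml.etree.ElementTree does not validate against DTD declarations), and the price element and is_well_formed helper are invented for illustration.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text):
    """True if the string parses as XML, False otherwise.

    Checks well-formedness only; no DTD validation is performed.
    """
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True

# Syntactically fine, although a negative price may break a business rule
# that the parser knows nothing about.
good = "<price currency='USD'>-5</price>"

# Mismatched end tag: rejected mechanically, no interpretation needed.
bad = "<price currency='USD'>-5</cost>"

print(is_well_formed(good), is_well_formed(bad))
```

The first string passes even though "-5" is probably not a sensible price; catching that requires a rule outside the syntax.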

Architectural validation gets you slightly closer to semantic validation
only because the syntactic rules defined by an architectural DTD are
controlled independently of the document that claims to conform to them, so that
the author of a document cannot simply change the declarations to make
their instance conform. Failure to meet the *syntactic* requirements of an
architecture probably means that some semantic rule has also been violated
(but not necessarily--the syntactic constraint that was violated may be
arbitrary or accidental and have nothing to do with any semantics of the
architecture).

It is at the next level up the processing chain (from the parser to the
"application-specific processor") that things become much fuzzier because
you have transcended pure syntax and moved into abstraction, at which point
you are working almost purely in the "semantic" realm, where what happens
and its correctness are largely matters of opinion.  It is at this level
that things like architectures (e.g., DocBook, HyTime, HTML, RDF) operate
semantically (but not syntactically--their syntactic influence ended once
the document was parsed into some in-memory representation). Validation is
much more difficult because it is validation of rules and policies that may
require the application of complex heuristics (possibly including aesthetic
judgement) to properly evaluate.  That is why at the end of the day the
only possibly complete and reliable form of semantic definition is prose
and the only reliable semantic validators are other humans.  And even then,
confusion and error are guaranteed because humans are, well, human.

Thus "semantics" can be usefully defined to mean "that which is not pure
syntax", which is not necessarily a useful definition but it does satisfy
the question as asked.

It's probably more useful to define clearer categories of semantics, such as:
- Presentation semantics (how should the data look when presented in a
particular context for a particular purpose?)
- Rhetorical semantics (how should the information be interpreted by a
human (or other sentient) observer?)
- Transformation or extraction semantics (how should the data be mapped to
other representations?)
- Processing semantics (how does the information relate to the processes that
are applied to it (and by "process" I mean business processes, not just
processing programs)?)

The ultimate problem, of course, is that Humpty Dumpty rules in XML land: any
element can mean exactly what you want it to mean, regardless of what the
original author intended it to mean.  That is the beauty of generalized
markup and the curse of generalized markup.
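
A small sketch of the Humpty Dumpty effect, again in Python with the standard library. The document and both interpreter functions (as_heading, as_honorific) are hypothetical: each one reads the same title element and attaches a different meaning to it, and nothing in the syntax says which reading is right.

```python
import xml.etree.ElementTree as ET

# Invented sample: is <title> a document heading or a personal honorific?
doc = ET.fromstring("<doc><title>Dr</title><name>Kimber</name></doc>")

def as_heading(root):
    """One application treats <title> as a heading to be displayed in capitals."""
    return root.findtext("title").upper()

def as_honorific(root):
    """Another treats <title> as a form of address that precedes <name>."""
    return f"{root.findtext('title')}. {root.findtext('name')}"

print(as_heading(doc))
print(as_honorific(doc))
```

Both functions are equally valid consumers of the parsed document; only prose agreement between author and processor can say which one matches the author's intent.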

Embrace the chaos.

Cheers,

E.
--
<Address HyTime=bibloc>
W. Eliot Kimber, Senior Consulting SGML Engineer
ISOGEN International Corp.
2200 N. Lamar St., Suite 230, Dallas, TX 95202.  214.953.0004
www.isogen.com
</Address>
