Internal subset equivalent in new schema proposals?

Wed Dec 2 07:53:01 GMT 1998

Joel Bender <joel at spooky.emcs.cornell.edu> writes:

> I was thinking along similar lines.  I've been adding something like this
> to my XML documents:

> 	<prop name="state" xml:regexp="[A-Z]+">NY</prop>

It's a neat way of doing it, since checking is optional and
transparent to non-checking applications.

> So the parser can verify that the CDATA matches the regular expression.
> Works OK for content, but I don't see how I can add this meta-meta-data for
> attributes.

The dividing line between attributes and elements is a fine one,
anyway.  Is it a real restriction to have the user embed constrained
information content in elements and not attributes?  E.g.

	<prop>
	  <name xml:regexp="(state|county|city)">state</name>
	  <prop-content...> </..>
        </prop>

or perhaps rather

	<prop>
	   <!-- one of state, county, city -->
	   <state xml:regexp="[A-Z]+">NY</state>
	<prop>

>  That is to say, how can I tell the parser that the 'name'
> attribute value for the 'prop' entity must be of the form
> "[a-zA-Z_][0-9a-zA-Z_]*"?

Not to mention the form of the xml:regexp attribute, eh? :-)

Actually, that *is* a problem, since as a DTD designer, I want to
express the lexical data formats my applications handle, I wouldn't
want to leave this to document authors, who probably know more about
technical writing, and less about the technical limitations of the
application software.

By the way, you *can* check attributes by doing

	<prop name="state" name.regexp=""[a-zA-Z_][0-9a-zA-Z_]*"...>

or something, can't you?

> Of course this also brings up the murky waters of grep syntax, which I've
> been avoiding.

Well, looking back, I realize I consider regular expressions a simple
solution.  Looking further back, I realize that this is because of a
long and shady past of juggling Unix shell scripts.

On the other hand, regular expressions are very powerful, and you
don't really need to know all the ins and outs to write simple ones,
like "[A-Z]" or "(one|two|three)".  And many of the special characters 
are used in DTDs already (+*?).

~kzm
-- 
If I haven't seen further, it is by standing in the footprints of giants

xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev at ic.ac.uk
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/
To (un)subscribe, mailto:majordomo at ic.ac.uk the following message;
(un)subscribe xml-dev
To subscribe to the digests, mailto:majordomo at ic.ac.uk the following message;
subscribe xml-dev-digest
List coordinator, Henry Rzepa (mailto:rzepa at ic.ac.uk)