Questions on DCD

Dean Roddey roddey at us.ibm.com
Fri Oct 2 20:31:42 BST 1998


>>6) I know this is not going to make me popular, but I think that there
>>are too many datatypes
>
>I do too, and I've warned my co-editors to expect massive amputations
>in the committee process, if DCD ever gets taken up.
>

I think everyone agrees that this is the case. The problem is that if you have
too few, then many applications will require ad hoc and inconsistent extensions
of the mechanism. Too many, and you bog down every processor with lots of types
that most people don't use (and you still don't cover all the bases).

I made a proposal related to the whole type validation mechanism, which really
extends beyond DCD but would address a lot of its issues as well as allow for
much more extensibility. I'd like to post it here for comment (with some
internal information removed) in order to perhaps hash it out as a general type
validation mechanism. However, right now it's in a Notes database, and if I post
it here it's going to look so horrible that it will probably be unreadable.

But, just in case, here it is. I'd appreciate any comments on it (if you can
read it). Posting from Notes to this mailing list generally seems to destroy
the formatting completely.

Overview
This document is related to the DCD proposal made to the W3C, and more
specifically the DCD 'constraint mechanism'. DCD provides a means for indicating
constraints on the values of elements and attributes. This mechanism is
provided via the Min/MinExclusive and Max/MaxExclusive properties, and has
'falls within a single range' semantics. In other words, each element or
attribute definition can express a value range (inclusive or exclusive) within
which the value of each instance of that element or attribute in the target XML
file must fit in order to meet the constraints.
The purpose of this document is to propose an alternative constraint mechanism
which we feel is no more complex, and far more flexible, than the one currently
proposed.
As a refresher on the existing DCD constraint proposal, here is an example
snippet from a DCD which defines an attribute of type int with a range of 1 to
10, inclusive.
<AttributeDef Name="Foo" Datatype="int" Min="1" Max="10"/>
The same mechanism applies to element definitions as well. The content of the
min/max values must of course make sense for the declared data type of the
element or attribute.
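To make the 'single range' semantics concrete, here is a minimal Java sketch of how a processor might check the existing Min/MinExclusive/Max/MaxExclusive properties. The class name RangeConstraint is hypothetical, and the sketch assumes that a null bound means that side of the range is unconstrained.

```java
// Hypothetical model of the existing DCD single-range constraint: at most
// one lower bound (Min or MinExclusive) and one upper bound (Max or
// MaxExclusive). A null bound is assumed to mean "unconstrained".
final class RangeConstraint {
    private final Double min;
    private final boolean minExclusive;
    private final Double max;
    private final boolean maxExclusive;

    RangeConstraint(Double min, boolean minExclusive, Double max, boolean maxExclusive) {
        this.min = min;
        this.minExclusive = minExclusive;
        this.max = max;
        this.maxExclusive = maxExclusive;
    }

    // True if the value falls within the one declared range.
    boolean accepts(double value) {
        if (min != null && (minExclusive ? value <= min : value < min)) {
            return false;
        }
        if (max != null && (maxExclusive ? value >= max : value > max)) {
            return false;
        }
        return true;
    }
}
```

The attribute above corresponds to new RangeConstraint(1.0, false, 10.0, false). Note what a single range cannot say: "between 1 and 90, but not a multiple of 5" has no Min/Max encoding.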
After writing a quick and dirty demonstration program of DCD (based on the
existing functionality in the XML4J parser), the XML Team at JTC-SV would like
to put forward a proposal which we feel makes the constraint mechanism more
useful and extensible, without placing an undue burden on the common case of a
simple document with simple validation needs (i.e. no constraints required,
just structural validation).
The overall goals of this constraint mechanism are:
- Minimal code support requirements in the core parser architecture (i.e.
  minimal cost for those who don't use it)
- Reasonable implementation size and complexity
- Open-endedness and flexibility for the uncommon case and user
- Simplicity of understanding and use for the common case and user
We believe that a constraint mechanism meeting these requirements is
achievable. As the likely implementers of such a proposal, we obviously do not
want to propose something which cannot be implemented and maintained with
reasonable effort. We certainly hope not to contribute to the growing
perception that 'deep thoughts' in the XML world are out of hand, and that
real-world implementation is suffering for it.
Driving Forces
The primary driving force of this proposal is a belief that the constraint
mechanism currently expressed in the DCD proposal is insufficient to meet more
than a small fraction of the needs of its potentially very wide target audience.
We understand the reasoning behind this initial proposal, i.e. to maintain a
level of simplicity that would increase the likelihood of acceptance and
implementation; however, we feel that the current mechanism is sufficiently
limited that its implementation might be counterproductive. The reasoning is
that almost any real world application of the technology would require some
amount of manual extension. Such extensions are not possible within the
existing specification, and hence would almost certainly be implemented in a
haphazard way, hindering interoperability of implementations.
Also, since any such haphazard extensions could become de facto standards, we
would like to avoid having such 'design by aggregation' imposed upon us by the
marketplace. By providing a more extensible mechanism up front, we hope to
avoid this scenario, since any reasonable extension of the mechanism could be
made without stepping outside the system provided.
Thirdly, though obviously useful, the limited constraints expressible in the
existing system do not seem sufficient to warrant the effort of implementing a
constraint mechanism in the parser. Such a mechanism is non-trivial and imposes
some minimum of unavoidable overhead on the parser. For such an effort to be
made and such a performance burden to be accepted, we would very much prefer to
get more powerful constraint checking for our buck.
The Basic Concept
Our concept is based loosely upon the existing experience of spreadsheets,
which are probably the prototypical example of simple 'application development'
for the end user. In particular, it borrows the spreadsheet's 'function'
concept, which provides an easy to understand mechanism for doing simple
arithmetic and logic operations. Our functions take the form of a simple
function call which evaluates its parameters and returns a boolean pass/fail
result.
So, at its simplest level, a constraint expression would look something like
this:
<AttributeDef Name="Foo" Datatype="int" Constraint="IsInRange(1,10)" />
In this scenario, a "Constraint" property is introduced. Its value is a string
which expresses some constraint by way of a 'function syntax' expression. In
this case the function is "IsInRange" and it takes two values, the minimum and
maximum values of the range. All constraints will be of this form.
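The function syntax is simple enough to parse by hand. As a minimal sketch (the class ConstraintCall and its splitting rules are assumptions, not part of DCD), a processor might split an expression into an optional '!' prefix, a function name, and its comma-separated parameters, ignoring commas nested inside inner calls or single-quoted strings:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical parsed form of a constraint such as "!IsInRange(1,10)".
final class ConstraintCall {
    final boolean negated;
    final String name;
    final List<String> params;

    private ConstraintCall(boolean negated, String name, List<String> params) {
        this.negated = negated;
        this.name = name;
        this.params = params;
    }

    // Splits "Name(p1, p2, ...)" into its parts. Commas inside nested
    // parentheses or single quotes do not separate parameters, so
    // "And(IsInRange(1, 90), !IsMultipleOf(5))" yields two parameters.
    static ConstraintCall parse(String expr) {
        String s = expr.trim();
        boolean negated = s.startsWith("!");
        if (negated) {
            s = s.substring(1).trim();
        }
        int open = s.indexOf('(');
        if (open <= 0 || !s.endsWith(")")) {
            throw new IllegalArgumentException("Not a function call: " + expr);
        }
        String name = s.substring(0, open).trim();
        String body = s.substring(open + 1, s.length() - 1);

        List<String> params = new ArrayList<>();
        StringBuilder cur = new StringBuilder();
        int depth = 0;
        boolean inQuote = false;
        for (char c : body.toCharArray()) {
            if (c == '\'') {
                inQuote = !inQuote;
            } else if (!inQuote && c == '(') {
                depth++;
            } else if (!inQuote && c == ')') {
                depth--;
            } else if (!inQuote && depth == 0 && c == ',') {
                // Top-level comma: finish the current parameter.
                params.add(cur.toString().trim());
                cur.setLength(0);
                continue;
            }
            cur.append(c);
        }
        String last = cur.toString().trim();
        // "IsTrue()" has an empty body and therefore no parameters.
        if (!last.isEmpty() || !params.isEmpty()) {
            params.add(last);
        }
        return new ConstraintCall(negated, name, params);
    }
}
```

Quoted parameters keep their quotes in this sketch; whether quotes are stripped before a function sees them is left open.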
High Level Implementation
The implementation of this proposed validation scheme is relatively
straightforward. It can be delivered in three conceptual layers, each of which
provides increasing levels of sophistication for increasing levels of effort
and coding skill. These layers will be discussed here in detail, as well as how
those layers can be fit together and 'delivered'.
Intrinsic Functions
At the core of the validation system there will be a set of intrinsic
functions, which are provided with the parser implementation, and which should
be required of any DCD implementation by the specification. This will ensure
interoperability of core validation services. These functions will be selected
for their high 'bang for the overhead buck' appeal, i.e. they will hopefully
meet 90% of the common case needs with minimal overhead (since they will be
packaged with the parser core).
A likely set of core functions would be:

Name                    Example
IsEqualTo               Constraint="IsEqualTo(5.0)"
IsGreaterThan           Constraint="IsGreaterThan(&BaseLevel;)"
IsLessThan              Constraint="IsLessThan(25)"
IsInRange               Constraint="IsInRange(&ValidRange;)"
IsOneOf                 Constraint="IsOneOf(Blue, Red, Pink)"
IsTrue                  Constraint="IsTrue()"
IsFalse                 Constraint="IsFalse()"
IsEven, IsOdd           Constraint="IsEven()"
IsMultipleOf            Constraint="IsMultipleOf(255)"
IsInMultiRange          Constraint="IsInMultiRange(1-10, 90-100)"
IsStrEqualTo            Constraint="!IsStrEqualTo('We the People')"
IsDigit, IsChar, etc.   Constraint="IsHexDigit()"
And, Or, Xor            Constraint="And(IsInRange(1, 90), !IsMultipleOf(5))"

This set of functions should meet the needs of quite a wide range of
applications, though there might be a couple more fundamental ones that could
or should be added. Though the semantics of these are fairly obvious, a few
finer points are worth discussing before we move on.
First of all, notice how these functions leverage the power of general
entities, by allowing flexible replacement of function parameters. This
capability will provide a lot of power to modify the validation over time
without changing the DCD itself. This is not in and of itself an improvement
over the existing validation scheme, since entity replacement is inherent to
XML; however, the more expressive the validation mechanism, the more leverage
is gained.
Secondly, note the second-to-last line, which describes the 'character
type' functions. These can be mapped fairly directly onto the language support
for such things, and will provide a convenient way to check many characteristics
of single-character fields. There are language and locale issues involved here,
which will be discussed at the end of this document.
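As an illustration of that direct mapping, here is a minimal Java sketch in which each character-type function is one call into java.lang.Character. The class and method names are hypothetical, and the sketch assumes a passing value must be exactly one UTF-16 character. It also inherits Java's Unicode rules (Character.isDigit accepts non-ASCII decimal digits, for example), which is exactly the kind of language and locale issue mentioned above.

```java
// Hypothetical intrinsic character-type functions, each mapped onto a
// java.lang.Character method. A value passes only if it is exactly one
// character long and that character satisfies the test.
final class CharTypeFunctions {
    private CharTypeFunctions() {}

    // A test on a single char, so Character::isDigit etc. bind to the
    // char overloads.
    interface CharTest {
        boolean test(char c);
    }

    // Applies a single-character test to a whole attribute value.
    static boolean check(String value, CharTest test) {
        return value.length() == 1 && test.test(value.charAt(0));
    }

    static boolean isDigit(String value) {
        return check(value, Character::isDigit);
    }

    static boolean isLetter(String value) {
        return check(value, Character::isLetter);
    }

    // Character.digit returns -1 when the char is not a digit in that radix.
    static boolean isHexDigit(String value) {
        return check(value, c -> Character.digit(c, 16) != -1);
    }
}
```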
Also, note the last line, which defines some boolean logic functions. These can
be handled intrinsically by the processor itself, and will support much more
complex constraints built from more basic ones. As long as we limit the nesting
to something reasonable, such as a single level, the complexity of these
functions will be quite small. They will merely be recursive containers and
invokers of other functions, with a little evaluation of the boolean results of
each one. Though the example shows two parameters, there is no reason why they
cannot allow an open-ended number of subexpression parameters.
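A minimal Java sketch of such containers, using java.util.function.Predicate to stand in for an already-built function. The class name LogicFunctions is hypothetical, and so is the n-ary meaning of Xor: here it passes when an odd number of subexpressions pass (chained exclusive-or), which agrees with ordinary Xor for two parameters.

```java
import java.util.List;
import java.util.function.Predicate;

// Hypothetical And/Or/Xor containers: each holds any number of
// subexpressions and combines their boolean results.
final class LogicFunctions {
    private LogicFunctions() {}

    // Passes only if every subexpression passes (stops at the first failure).
    static Predicate<String> and(List<Predicate<String>> subs) {
        return value -> subs.stream().allMatch(p -> p.test(value));
    }

    // Passes if at least one subexpression passes (stops at the first success).
    static Predicate<String> or(List<Predicate<String>> subs) {
        return value -> subs.stream().anyMatch(p -> p.test(value));
    }

    // Passes if an odd number of subexpressions pass (chained exclusive-or).
    static Predicate<String> xor(List<Predicate<String>> subs) {
        return value -> subs.stream().filter(p -> p.test(value)).count() % 2 == 1;
    }
}
```

With inRange and multipleOf5 as the two subexpressions, and(Arrays.asList(inRange, multipleOf5.negate())) mirrors And(IsInRange(1, 90), !IsMultipleOf(5)).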
Negation is implemented by the '!' prefix before a function, as in the last
line, where the function checks that the value is both in the range 1..90
inclusive and not a multiple of 5. This provides a lot of flexibility, avoids
the need for explicit Not versions of functions, and is trivial to implement.
Likewise, the IsStrEqualTo() example checks that the value is not equal to
"We the People".
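To show how little the '!' prefix costs, here is a minimal Java sketch: strip the prefix, build the underlying function as usual, and wrap it with Predicate.negate(). The Negation class and its Builder interface are hypothetical stand-ins for however a processor constructs functions.

```java
import java.util.function.Predicate;

// Hypothetical handling of the '!' prefix. The prefix is stripped, the
// underlying function is built as usual, and the result is wrapped with
// Predicate.negate(), so no "Not" variants of functions are needed.
final class Negation {
    private Negation() {}

    // Builds the predicate for an un-negated call such as "IsMultipleOf(5)".
    interface Builder {
        Predicate<String> build(String call);
    }

    static Predicate<String> resolve(String call, Builder builder) {
        String s = call.trim();
        if (s.startsWith("!")) {
            // Recursing means "!!F()" simply cancels out.
            return resolve(s.substring(1), builder).negate();
        }
        return builder.build(s);
    }
}
```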
The amount of code needed to implement these intrinsic functions, above and
beyond the basic infrastructure required to support constraint checking at all,
is trivial. Most of them will resolve to single lines of evaluation code.
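A minimal Java sketch of a few intrinsics from the table, to illustrate the "single line of evaluation code" claim. The Intrinsics class, its factory methods, and the choice of double versus long arithmetic are all assumptions; each factory takes the already-split parameter strings, and error handling for non-numeric values is omitted (Double.parseDouble and Long.parseLong throw NumberFormatException).

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
import java.util.function.Predicate;

// Hypothetical intrinsic functions. Parameters are parsed once, in the
// factory; the returned predicate is the one-line evaluation.
final class Intrinsics {
    private Intrinsics() {}

    static Predicate<String> isLessThan(String max) {
        double limit = Double.parseDouble(max);
        return v -> Double.parseDouble(v) < limit;
    }

    static Predicate<String> isInRange(String min, String max) {
        double lo = Double.parseDouble(min);
        double hi = Double.parseDouble(max);
        return v -> { double d = Double.parseDouble(v); return d >= lo && d <= hi; };
    }

    static Predicate<String> isEven() {
        return v -> Long.parseLong(v) % 2 == 0;
    }

    static Predicate<String> isMultipleOf(String factor) {
        long f = Long.parseLong(factor);
        return v -> Long.parseLong(v) % f == 0;
    }

    static Predicate<String> isOneOf(String... choices) {
        Set<String> allowed = new HashSet<>(Arrays.asList(choices));
        return allowed::contains;
    }
}
```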
To ensure openness, the function mechanism will probably be based on the
namespace proposal as well. So, in reality, the above functions would actually
be part of an "http://W3C.Constraints/DCDStd" namespace, for instance. This will
allow a convenient partitioning of the function namespace, as well as a very
flexible way of providing alternative processing: just map the namespace
prefix to another URI that maps to a different set of functions!
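A minimal Java sketch of that prefix-to-URI indirection. FunctionRegistry and its methods are hypothetical, and the "prefix:Name" form of a qualified function name is an assumption borrowed from namespace-qualified element names.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;
import java.util.function.Predicate;

// Hypothetical registry: function sets are keyed by namespace URI, and a
// document prefix is bound to a URI. Rebinding the prefix to another URI
// swaps in a different implementation without touching the constraint text.
final class FunctionRegistry {
    // URI -> (local function name -> factory taking the parameters)
    private final Map<String, Map<String, Function<String[], Predicate<String>>>> bundles = new HashMap<>();
    // prefix -> URI, as declared in the document
    private final Map<String, String> prefixes = new HashMap<>();

    void register(String uri, String name, Function<String[], Predicate<String>> factory) {
        bundles.computeIfAbsent(uri, u -> new HashMap<>()).put(name, factory);
    }

    void bindPrefix(String prefix, String uri) {
        prefixes.put(prefix, uri);
    }

    // Resolves a qualified name such as "std:IsLessThan" and builds it.
    Predicate<String> create(String qualifiedName, String[] params) {
        int colon = qualifiedName.indexOf(':');
        if (colon < 0) {
            throw new IllegalArgumentException("Unqualified: " + qualifiedName);
        }
        String uri = prefixes.get(qualifiedName.substring(0, colon));
        Map<String, Function<String[], Predicate<String>>> bundle =
            uri == null ? null : bundles.get(uri);
        Function<String[], Predicate<String>> factory =
            bundle == null ? null : bundle.get(qualifiedName.substring(colon + 1));
        if (factory == null) {
            throw new IllegalArgumentException("Unknown function: " + qualifiedName);
        }
        return factory.apply(params);
    }
}
```

Two bundles can register the same local name (say, IsLessThan) under different URIs; which one a document uses depends only on what its prefix is bound to.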
Third Party Functions
The next level of support would be the ability to plug in third party
validation functions. This would open up the system considerably by providing a
well defined delivery mechanism for functions, to which third parties could
write. As long as these functions can be expressed with the simple function
syntax described above, they can be as complex as the developer wishes them to
be and the user is willing to deal with.
Support for third party functions requires a well defined interface against
which they can be developed. This interface is actually quite simple and
convenient, and places very few semantic demands on the function. These very
simple semantics ensure that open-endedness is not compromised by the interface.
A proposed interface is described below.
Custom Functions
At the upper end of the spectrum are custom applications which would provide
their own functions for doing very domain-specific constraint validation. These
could include PIN validation, database lookup of names, IDs or social security
numbers, and so on. Our proposal provides a flexible back door for the
validation mechanism to accommodate the most complex imaginable validation,
without increasing the overhead of the common case by a single CPU cycle.
Though there is no limit to the complexity or sophistication that these custom
functions could achieve, there are no implementation issues here which go
beyond those of the third party function development scenario discussed above,
at least from our perspective as the parser provider.
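As an illustration of that back door, here is a minimal Java sketch of an application-supplied function. The name IsKnownEmployeeId and the in-memory Set are hypothetical stand-ins for a real database lookup, and Predicate<String> stands in for whatever function interface the processor defines. The point is only that the processor sees the same pass/fail call as for an intrinsic, so documents that never reference the function pay nothing for it.

```java
import java.util.Locale;
import java.util.Set;
import java.util.function.Predicate;

// Hypothetical custom, domain-specific function: accepts only employee IDs
// that exist in the application's records. The Set stands in for what
// would really be a database or directory query.
final class IsKnownEmployeeId implements Predicate<String> {
    private final Set<String> knownIds;

    IsKnownEmployeeId(Set<String> knownIds) {
        this.knownIds = knownIds;
    }

    @Override
    public boolean test(String value) {
        // Normalize whitespace and case before the lookup; Locale.ROOT
        // avoids locale-sensitive case mapping.
        return knownIds.contains(value.trim().toUpperCase(Locale.ROOT));
    }
}
```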
Implementation Details
This section puts forward a specific example implementation that we believe
will meet all of the requirements and fulfill all of the promise of the
proposed system. Example Java implementations are presented, but the
implementation would be easily done in C++ or any other quality object oriented
language.
The Function Interface
A function is represented in the implementation as a simple abstract interface
class. The interface is extremely simple, but allows the system to manage
functions and invoke them generically and reasonably efficiently. For this
discussion, the interface is called ValFunction.
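The interface itself is not spelled out at this point, but its shape can be inferred from the methods the ValidSalary example implements. The following is a sketch under that assumption; only the three method names and signatures come from the example, and the comments describe their roles as the surrounding text explains them.

```java
// Sketch of ValFunction, inferred from the methods ValidSalary implements.
interface ValFunction {
    // Called once with the comma-separated parameters from the constraint,
    // e.g. {"1", "10"} for IsInRange(1,10). Should throw if they are invalid.
    void Parse(String[] astrParams);

    // Called for each element or attribute value; true means it passes.
    boolean bEvaluate(String strValue);

    // Human-readable description of the constraint, for error reporting.
    String strConstrainDesc();
}
```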
A concrete implementation of it would look something like this in Java. Of
course, this is not a very complex example, and the same check could be
achieved with the intrinsic IsLessThan() function, but it shows how one would
implement a simple function class.
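// Checks that a salary value is below a maximum given as the single
// constraint parameter
class ValidSalary implements ValFunction
{
    // Maximum salary parsed from the constraint parameter
    private double fMaxSal = 0;

    // Description of the constraint, used when reporting errors
    private String strDesc = "";

    // Default ctor only because they are factory created
    public ValidSalary()
    {
    }

    // 'Parsing' method, called once with the constraint's parameters
    public void Parse(String[] astrParams)
    {
        // We only take one function param, the maximum salary
        if (astrParams.length != 1)
            throw new IllegalArgumentException(
                "ValidSalary expects 1 parameter, got " + astrParams.length);

        // Convert to our max salary member (throws NumberFormatException
        // if the parameter is not a number)
        fMaxSal = Double.parseDouble(astrParams[0].trim());

        // Format our constraint into the description string
        strDesc = "< " + fMaxSal;
    }

    // Evaluation method, called for each value to be checked
    public boolean bEvaluate(String strValue)
    {
        // Convert the string to a double and compare to max; a value
        // that is not a number fails the constraint
        try
        {
            return Double.parseDouble(strValue.trim()) < fMaxSal;
        }
        catch (NumberFormatException e)
        {
            return false;
        }
    }

    // Reporting method for errors
    public String strConstrainDesc()
    {
        return strDesc;
    }
}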

The constructor takes no arguments since functions will generally be 'factory
created'. However, the factory can certainly construct them with particular
parameters. More on this below in the "Function Bundle Interface" section.
The Parse() method is called once, when the definition of an element or
attribute which declares a constraint using the function is processed. The
contents of the function call (everything after the function name, i.e. inside
the function's parentheses) are passed to the Parse() method as an array of
strings, one per comma-separated function parameter. The function will evaluate
these parameters, which represent the validity constraints set up for the
element or attribute, and store that information in some (hopefully) optimal
internal format. In the example above, which validates maximum salaries, it
converts the single parameter to a double and stores it for later use.
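The resulting lifecycle (parse once per declaration, evaluate once per instance value) can be sketched in Java as follows. DeclaredConstraint is a hypothetical driver, the naive comma split ignores nested calls and quoted commas, and ValFunction is repeated with the same three methods only so the sketch compiles on its own.

```java
import java.util.Optional;

// Same three methods as ValFunction; repeated only so that this sketch
// compiles on its own.
interface ValFunction {
    void Parse(String[] astrParams);
    boolean bEvaluate(String strValue);
    String strConstrainDesc();
}

// Hypothetical binding of one function to one element or attribute
// definition: Parse() runs once when the definition is processed, and
// bEvaluate() runs for every instance value found in the document.
final class DeclaredConstraint {
    private final String target;
    private final ValFunction function;

    DeclaredConstraint(String target, ValFunction function, String paramText) {
        this.target = target;
        this.function = function;
        // Naive split on commas; nested calls or quoted commas would need
        // a real tokenizer.
        String trimmed = paramText.trim();
        String[] params = trimmed.isEmpty() ? new String[0] : trimmed.split("\\s*,\\s*");
        function.Parse(params);
    }

    // Empty if the value passes; otherwise an error message built from the
    // function's own description of its constraint.
    Optional<String> check(String value) {
        if (function.bEvaluate(value)) {
            return Optional.empty();
        }
        return Optional.of(target + ": value '" + value
            + "' violates constraint " + function.strConstrainDesc());
    }
}
```

With a ValidSalary-style function and the parameter text "50000", the value "60000" would be reported as violating "< 50000.0", while "42000" would pass silently.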


