new msxml behaviour

Tue Nov 4 23:43:22 GMT 1997

the new msxml contains this code within getText of ElementImpl which
changes the behaviour of entity expansions from the previous version:

for (Enumeration en = children.elements(); en.hasMoreElements(); )
{
    if (sb.length() > 0)
        sb.append(' ');
    sb.append(((Element)en.nextElement()).getText());
}

return sb.toString();

notice the appending of a space. is this appropriate? it means constructs
like 'abc&SOME.ENTITY;def' now expand to 'abc SOME.ENTITY.CONTENTS def' rather
than 'abcSOME.ENTITY.CONTENTSdef' as they used to, which really stuffs my
application.
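
to make the difference concrete, here is a minimal sketch of the two joining behaviours. TextNode and GetTextDemo are made-up stand-ins, not msxml's ElementImpl, and i'm assuming the parser stores 'abc&e;def' as three child nodes with "XYZ" as the entity's replacement text.

```java
import java.util.ArrayList;
import java.util.List;

// hypothetical stand-in for a parsed node; not msxml's ElementImpl
class TextNode {
    final String text;                                  // own text, used by leaf nodes
    final List<TextNode> children = new ArrayList<>();

    TextNode(String text) { this.text = text; }

    TextNode add(TextNode child) { children.add(child); return this; }

    // concatenates the children's text, putting `separator` between them
    String getText(String separator) {
        if (children.isEmpty()) return text;
        StringBuilder sb = new StringBuilder();
        for (TextNode child : children) {
            if (sb.length() > 0) sb.append(separator);
            sb.append(child.getText(separator));
        }
        return sb.toString();
    }
}

class GetTextDemo {
    // content 'abc&e;def', assumed to be held as three child nodes
    static TextNode sample() {
        return new TextNode("")
            .add(new TextNode("abc"))
            .add(new TextNode("XYZ"))   // replacement text of the entity
            .add(new TextNode("def"));
    }

    public static void main(String[] args) {
        System.out.println(sample().getText(" "));  // joins with a space, like the new code
        System.out.println(sample().getText(""));   // joins with nothing, like the old output
    }
}
```

the first call gives 'abc XYZ def' and the second 'abcXYZdef', which is the difference described above.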

the last time i tried to 'improve' msxml the damn thing proved incredibly
difficult to recompile due to some ridiculous circular dependencies among
the files - god knows how ms compiled it in the first place - and i now
see this awful dll rubbish in there. so before i attempt to write a makefile
(*please* supply one next time ms :) ): is this in fact a problem? or should
i change my approach?

i was also wondering whether defaults for attributes should appear to the
application if the attribute isn't explicitly given in the markup. right
now i've added a function to traverse the tree and insert all attribute
defaults (if needed) before i start processing the document - what do you
think of that?
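
roughly, the traversal looks like the sketch below. Elem and DefaultFiller are hypothetical names (this is not the msxml api), and i'm assuming the defaults have already been collected from the DTD into a map keyed by element tag.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// hypothetical element type; msxml's real classes are not used here
class Elem {
    final String tag;
    final Map<String, String> attributes = new LinkedHashMap<>();
    final List<Elem> children = new ArrayList<>();

    Elem(String tag) { this.tag = tag; }
}

class DefaultFiller {
    // defaults: element tag -> (attribute name -> default value),
    // assumed to have been gathered from the DTD beforehand
    static void applyDefaults(Elem e, Map<String, Map<String, String>> defaults) {
        Map<String, String> forTag =
            defaults.getOrDefault(e.tag, Collections.<String, String>emptyMap());
        for (Map.Entry<String, String> d : forTag.entrySet()) {
            // only fill in attributes the markup did not give explicitly
            e.attributes.putIfAbsent(d.getKey(), d.getValue());
        }
        for (Elem child : e.children) {
            applyDefaults(child, defaults);   // recurse over the whole tree
        }
    }

    // small example tree: the outer <abc> has no xyz, the inner <abc> sets it
    static Elem demo() {
        Elem root = new Elem("abc");
        Elem child = new Elem("abc");
        child.attributes.put("xyz", "explicit");
        root.children.add(child);

        Map<String, String> abcDefaults = new HashMap<>();
        abcDefaults.put("xyz", "fallback");
        Map<String, Map<String, String>> defaults = new HashMap<>();
        defaults.put("abc", abcDefaults);

        applyDefaults(root, defaults);
        return root;
    }
}
```

after demo() the outer element carries xyz="fallback" while the inner one keeps its explicit value. the cost of this approach is that the application can no longer tell a defaulted attribute from a given one unless that is recorded separately.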

the msxml api was awful for getting schema information such as default
values. now it has a 'toSchema' function which returns an element with
child elements for each attribute. the child element's tag is 'ATTRIBUTE'
and it contains attributes such as 'XML:ID' containing the attribute name
and 'XML:DEFAULT' containing the default, for instance. this is an
incredibly convoluted method for accessing such information. are there any
other xml parsers out there that attach schema information to the markup
element itself, like element.getAttribute("xyz").getDefaultValue(), rather
than
document.getElementDecl("abc").getChild("xyz").getAttribute("XML:DEFAULT")?
finally (i've been saving up questions) i'd like this construct to be
parsed as a <bar> element...

<!ENTITY foo '<![CDATA[ 
  <bar>blah blah</bar>
]]>'>

...

&foo;

but instead, &foo; is processed as PCDATA (by msxml). is this correct
behaviour? section 4.4 of the xml ref contains the following: '6. For an
internal (text) entity, the processor must include the entity; that is,
retrieve its replacement text and process it as a part of the document
(i.e. as content or AttValue, whichever was being processed when the
reference was recognized), passing the result to the application in place
of the reference. The replacement text may contain both text and markup,
which must be recognized in the usual way...'
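
one way to cross-check this against a different parser is the small test below. it uses the standard javax.xml.parsers DOM api from a present-day JDK (not msxml), and CdataEntityDemo, countBar and wrap are names made up for this sketch. it parses the same document twice, once with the CDATA wrapper in the entity value and once without, and counts the <bar> elements that come out.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class CdataEntityDemo {
    // parses the document and counts the <bar> elements in the resulting tree
    static int countBar(String xml) {
        try {
            DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(xml)));
            return doc.getElementsByTagName("bar").getLength();
        } catch (Exception e) {
            throw new RuntimeException(e);   // keep the sketch free of checked exceptions
        }
    }

    // declares &foo; in the internal subset and references it once in content
    static String wrap(String entityValue) {
        return "<!DOCTYPE doc [ <!ENTITY foo '" + entityValue + "'> ]>"
             + "<doc>&foo;</doc>";
    }

    public static void main(String[] args) {
        // entity value wrapped in a CDATA section, as in the construct above
        System.out.println(countBar(wrap("<![CDATA[ <bar>blah blah</bar> ]]>")));
        // same entity value without the CDATA wrapper
        System.out.println(countBar(wrap("<bar>blah blah</bar>")));
    }
}
```

as far as i can tell from the spec, the replacement text is recognised 'in the usual way', so a CDATA section inside it is still a CDATA section and its contents are character data: the first call should report no <bar> elements and the second one. if that reading is right, dropping the CDATA wrapper from the entity value is what gets a real <bar> element.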

well i haven't received any messages from the list today (maybe you're all
in bed on US time) so how about chewing on that for me, 'cos i must say it's
a pain to rewrite your code when you had to hack it to work in the first
place.

p.s. i'm using xml to define the syntax and byte data of a peer-to-peer
network interaction over pacnet and there aren't any PIs. if anyone would
like to check out what i've done i'd greatly appreciate any opinions.
---------------------------------------------------------------------
Iguana Information Services 		Ph    +64 4 499 9782
PO Box 10 609				Fax   +64 4 499 4439
Wellington				Email scott at iguana.co.nz
New Zealand				HTTP  http://www.iguana.co.nz

xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev at ic.ac.uk
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/
List coordinator, Henry Rzepa (mailto:rzepa at ic.ac.uk)