bibliography dtd?

Len Bullard cbullard at hiwaay.net
Wed Nov 10 02:10:30 GMT 1999


Gorden-Ozgul, Patricia E wrote:
> 
> This may seem like a dumb question, but I am new to this 'document
> processing' field.

Len writes to Warren?

> . How should an XML DTD designer signify that this DTD is a
> . conformant subset or a variant of a another DTD?  Are
> . these one, two, or three namespaces IF a namespace identifier
> . resolves to a schema?  This matters if the FPI is ROA for the
> . DTD and a label for the namespace identifier.

Is the FormalPublicIdentifier the legal name of the namespace? 
Outermost parentheses in the oldTongue?  ROA:  record of 
authority. dominant namespace for aggregate.  If in contract, 
I must cite the record of authority for the defined item, 
I require it to be a singleton.  No Bifurcation at Root.

Sorry, mad'm, if the thread avoided your question.  
The crazies say, "Welcome!"

XML: Eight Noble Concepts (it's all about names)

1.  Markup:    Trees of names.
2.  Hyperness: Locations are points in space or time.
3.  Identity:  Locations bound to names enable persistence and
uniqueness.
4.  Systems:   Identities bound into a namespace.
5.  Schemas:   Systems to bind identities
6.  Mapping:   schemas whose trees are related by n-dimensional bindings
7.  N-dimensional binding:  a named vector of the schemas that produces
an intersection space 
8.  Facts:     the named values within the intersections

I won't go into that. :-)

Your problem:

> I have an industry-provided bibliographic DTD to which I need to apply data
> from Word documents.  Other than a manual solution (clerical cut/paste from
> Word doc to DTD doc) how would one create the DTD ASCII file for the data
> exchange.

You want to map the names 

o  in a source namespace (RTF) 

o  to a target namespace (DTD).

The source is the collection of instances, or documentation of instances 
to be transformed to the target namespace.  Create a table where 
the names in the source definition are mapped to the target.
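
A rough sketch of such a table in Python, in case it helps.  Every name
in it is made up for illustration (the style names, the element names,
and the helpers map_name and unmapped); substitute the ones from your
own documents and your own DTD.

```python
# Hypothetical mapping table: source names (Word/RTF style names)
# mapped to target names (element names in the bibliographic DTD).
# None of these names come from a real DTD.
STYLE_TO_ELEMENT = {
    "Heading 1": "title",
    "Author": "author",
    "Normal": "p",
}


def map_name(source_name, table=STYLE_TO_ELEMENT):
    """Return the target name for a source name, or None if unmapped."""
    return table.get(source_name)


def unmapped(source_names, table=STYLE_TO_ELEMENT):
    """Return the distinct source names that still need a decision."""
    return sorted(name for name in set(source_names) if name not in table)


if __name__ == "__main__":
    print(map_name("Author"))
    print(unmapped(["Normal", "Caption", "Caption"]))
```

The unmapped list is the useful part: it tells you which source names
you have not yet decided how to handle.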

DTD = target namespace.  You are transforming the data to this target.
The DTD describes a tree of names.  It's just a tree of named things.
Look at the TreeView object you use every day in many
applications, and that is a good geometric model for what
XML elements/attributes (trees of names) model.
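
To see the tree of names directly, here is a small sketch using
Python's standard xml.etree.ElementTree.  The sample document and its
element names are invented for illustration, not taken from any real
bibliographic DTD.

```python
import xml.etree.ElementTree as ET

# Invented sample instance; the names are placeholders.
SAMPLE = (
    '<biblio><entry id="b1">'
    "<author>A. Writer</author><title>A Title</title>"
    "</entry></biblio>"
)


def name_tree(element, depth=0):
    """Return indented lines naming each element and attribute."""
    lines = ["  " * depth + element.tag]
    for attr in sorted(element.attrib):
        # Attributes are shown one level below their element, with "@".
        lines.append("  " * (depth + 1) + "@" + attr)
    for child in element:
        lines.extend(name_tree(child, depth + 1))
    return lines


if __name__ == "__main__":
    print("\n".join(name_tree(ET.fromstring(SAMPLE))))
```

The indented output is the same shape a TreeView control would draw.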

DOC = source collection.  Word does not have a DTD, so you
must use an export format and figure out which one you
want to work with.  Some choices here are the HTML SaveAs or RTF
(Rich Text Format, doc's native format for all practical purposes).

o  If you do not have it, download the RTF spec

The export format with the most information is also the one that is
hardest to use:  RTF.   The RTF namespace is complex, but it is
documented and reasonably regular.  The problem is working out what in
that namespace matches the names in the DTD.

Eg, how do you get \par to become <p>?   If you
don't have an RTF book, use the RTF SaveAs, open that file as plain
ASCII text, then use the replace command of an editor like Programmer's
File Editor (PFE) to substitute a newline plus \par for each \par.  RTF
groups its strings inside braces, so also substitute a newline plus the
brace for each opening brace to separate sections.
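
If you would rather script that search and replace than do it in an
editor, something like this sketch should do.  It assumes the RTF has
been read in as plain text; the function name break_up_rtf is mine.
It does not handle every corner of RTF (an escaped backslash followed
by the letters "par" would fool it, for example).

```python
import re


def break_up_rtf(rtf_text):
    """Put each \\par and each opening brace at the start of a line."""
    # Newline before \par, but only when \par is the whole control
    # word (so \pard is left alone) and it is not already at line start.
    text = re.sub(r"(?<=[^\n])\\par(?![a-z])", r"\n\\par", rtf_text)
    # Newline before each opening brace, skipping escaped braces (\{)
    # and braces already at the start of a line.
    text = re.sub(r"(?<=[^\\\n])\{", "\n{", text)
    return text


if __name__ == "__main__":
    sample = r"{\rtf1{\b Title}\par Body text\par}"
    print(break_up_rtf(sample))
```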

Do that with all of the names provided in the RTF namespace.  Looking
them up in the RTF spec, you will find these are the attributes that
are setting things like bold, font name, and so on.  You use that
information to figure out what the format identifies in the namespace
of the TARGET DTD.  Sorry, but because of the way this works, and
because the SaveAs RTF feature produces such badly formatted data for
reuse, you get to work awhile at this because you are actually tagging
someone else's style and using their choices to infer names in the
target.  The bad news: this is an inconsistent source of sources.  The
good news: for practical purposes, people are reasonably consistent
about how they do this.
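
One way to get the list of names to look up is to count the control
words in the file.  This is only a sketch: it assumes control words
are a backslash followed by lowercase letters and an optional number,
which covers the common ones, and it will miscount escaped backslashes
followed by letters.  The function name is mine.

```python
import re
from collections import Counter

# A backslash, lowercase letters, then an optional numeric parameter.
CONTROL_WORD = re.compile(r"\\([a-z]+)(-?\d+)?")


def control_word_counts(rtf_text):
    """Count how often each RTF control word name appears."""
    return Counter(m.group(1) for m in CONTROL_WORD.finditer(rtf_text))


if __name__ == "__main__":
    sample = r"{\rtf1\b Bold\b0 \i Italic\i0\par}"
    for name, count in control_word_counts(sample).most_common():
        print(name, count)
```

The most frequent names are the ones worth looking up in the spec
first.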

So.. slug work for the conversionHead.

Or find some shareware or freeware that preformats RTF for conversion. 
My guess is, XSLT can be used to build this now. :-)

The easy manual way is to use the SaveAsHTML and map the HTML to the 
target DTD directly.  Why?  Depending on the consistency and 
application of the style, the productions are regular enough 
in the HTML source  to capture most of the important information 
for a downtranslation with restore.  By restore, the application 
of the target system restores the lost information.  Up translation 
is usually a little lossy, but not excessively so in this case.  
The truth is, if the source is doc, most of the important information 
is in the headers.  Get the text nodes out of the formatting 
goop, and you will have most of what you need.  What you get 
will look a lot like... SaveAsHTML.  Bare, but a simple enough 
subset of HTML that mapping back up is easy because the productions 
are regular.
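
Getting the text nodes out of the goop can be sketched with Python's
standard html.parser.  The class and function names are mine, and the
list of empty (void) tags is a partial guess at what a Word export
will contain.  Each text node is reported with the path of tags that
enclose it.

```python
from html.parser import HTMLParser

# Tags with no closing tag; a partial list, extend as needed.
VOID_TAGS = {"br", "hr", "img", "meta", "link", "input"}


class TextNodes(HTMLParser):
    """Collect (path of enclosing tags, text) for each text node."""

    def __init__(self):
        super().__init__()
        self.stack = []
        self.nodes = []

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_TAGS:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        # Pop back to the matching open tag; ignore stray end tags.
        if tag in self.stack:
            while self.stack.pop() != tag:
                pass

    def handle_data(self, data):
        text = " ".join(data.split())  # collapse runs of whitespace
        if text:
            self.nodes.append(("/".join(self.stack), text))


def text_nodes(html):
    parser = TextNodes()
    parser.feed(html)
    parser.close()  # flush any text still buffered
    return parser.nodes


if __name__ == "__main__":
    page = "<html><body><h1>Title</h1><p><b>Smith</b>, J.<br>1999</p></body></html>"
    for path, text in text_nodes(page):
        print(path, "|", text)
```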

The analysis of the RTF won't yield enough information to make that 
mapping a lot more useful if page fidelity is not an issue.  If page 
fidelity is still an issue, you have to analyze the RTF to get a 
closeEnoughForLegalWord fidelity.  Otherwise, just map the easily 
recognized structures (what SaveAsHTML actually produces) and 
clean up a bit afterwards.  Map the HTML structures/names, which you 
know, to the matching structures/names (if any) of the target DTD. 
If they don't match lexically (p Is p), you work out the match 
semantically (P isA p).  For a biblio DTD, that should be really 
straightforward.  You may have to track down some authors to get 
the information Word puts in those nice sets of global doc 
attributes that no one uses.
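
The lexical-then-semantic match can be sketched like this.  The target
names and the semantic table are invented for illustration; a real
bibliographic DTD will have its own names, and the output here is a
flat list of elements, not a valid instance of anything.

```python
import xml.etree.ElementTree as ET

# Hypothetical element names in the target DTD.
TARGET_NAMES = {"biblio", "entry", "title", "p"}

# Hypothetical semantic matches: HTML name isA target name.
SEMANTIC = {"h1": "title", "li": "entry"}


def target_name(html_tag):
    """Return the target name for an HTML tag, or None if no match."""
    tag = html_tag.lower()
    if tag in TARGET_NAMES:
        return tag  # lexical match: p Is p
    return SEMANTIC.get(tag)  # semantic match: P isA p, or None


def build(pairs, root_name="biblio"):
    """Build target XML from (html_tag, text) pairs, skipping unmatched tags."""
    root = ET.Element(root_name)
    for tag, text in pairs:
        name = target_name(tag)
        if name is not None:
            ET.SubElement(root, name).text = text
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(build([("H1", "A Title"), ("P", "Some text"), ("font", "x")]))
```

Anything that comes back as None is what you clean up by hand
afterwards.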

I hope this helps.  If not, ask more questions.  This list still 
answers questions best it can.

peace,

len


xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev at ic.ac.uk
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ and on CD-ROM/ISBN 981-02-3594-1
To unsubscribe, mailto:majordomo at ic.ac.uk the following message;
unsubscribe xml-dev
To subscribe to the digests, mailto:majordomo at ic.ac.uk the following message;
subscribe xml-dev-digest
List coordinator, Henry Rzepa (mailto:rzepa at ic.ac.uk)




