XML and Internationalization...

Deke Smith deke at tallent.com
Mon Nov 9 18:45:43 GMT 1998


David Megginson, david at megginson.com said on 11/9/98 8:54 AM:

>One important point to note is that by itself, the 'xml:lang'
>attribute simply indicates the language of the content and attribute
>values -- it does not suggest that sibling elements with different
>xml:lang values either are or are not equivalents in other languages.
>For example, I could have
>
><itinerary>
>  <city xml:lang="fr">Montr&#233;al</city>
>  <city xml:lang="en">London</city>
>  <city xml:lang="it">Roma</city>
></itinerary>
>


Tony Graham, tgraham at mulberrytech.com said on 11/9/98 10:09 AM:

>While you may find concepts from the TMX work that are useful to you,
>TMX stands for Translation Memory eXchange, and is concerned with
>importing and exporting portions of translation memory -- phrases that
>have been translated once and saved so they don't need to be
>translated again -- between translation tools.  TMX has structures for
>parallel portions of text in multiple languages, but there is no
>concept that these chunks of text can, should, or will string together
>to make a coherent "document", in anybody's sense of the word.  The
>only markup in a TMX document, which is in XML, is concerned with
>delimiting and identifying the parallel chunks of text for the
>purposes of the translation tool: other markup from the source
>document may be saved in the TMX document (with significant XML
>characters escaped with entities) but only as a translation aid for
>those tools that can use it.

I have written phrase "substitution" scripts in Frontier and XML, and I ran into the same problem. I wanted to "translate" phrases or words for use in multilingual Web sites. The scripts translate only in the roughest sense: "Hello World!" == "¡Hola Mundo!" == "Bonjour Monde!".

I created my own translation DTD (I don't know of any simple existing ones), and I think it shows, as David pointed out, that XML provides only a framework: the processing program has to supply additional structure that the DTD cannot express.

Under my dirty little DTD (built out of necessity), the "Hello World!" example would be:

<PHRASE ID="Hello World!" xml:lang="en">
     <TRANSLATION xml:lang="fr">
          Bonjour Monde!
     </TRANSLATION>
     <TRANSLATION xml:lang="es">
          ¡Hola Mundo!
     </TRANSLATION>
     <TRANSLATION xml:lang="de">
          Hallo Welt!
     </TRANSLATION>
</PHRASE>

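This is a private DTD, so in my little world I know that the ID attribute of the PHRASE element holds the source phrase and that the text of each TRANSLATION element is its equivalent in the language named by xml:lang. It would be asking too much of XML to enforce this relationship; only the processing program can.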
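
A minimal sketch, in Python rather than Frontier, of how a processing program might apply that convention. The PHRASES wrapper element and the names load_phrases and translate are assumptions made for illustration; they are not part of the DTD described here.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes xml:lang under the reserved XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Hypothetical input: PHRASES is an assumed wrapper element.
SAMPLE = """
<PHRASES>
  <PHRASE ID="Hello World!" xml:lang="en">
    <TRANSLATION xml:lang="es">
      ¡Hola Mundo!
    </TRANSLATION>
    <TRANSLATION xml:lang="de">
      Hallo Welt!
    </TRANSLATION>
  </PHRASE>
</PHRASES>
"""

def load_phrases(xml_text):
    """Build {phrase ID: {language code: text}} from PHRASE elements."""
    table = {}
    for phrase in ET.fromstring(xml_text).iter("PHRASE"):
        # Convention from the post: the ID is itself the source-language text.
        by_lang = {phrase.get(XML_LANG): phrase.get("ID")}
        for tr in phrase.findall("TRANSLATION"):
            # Strip the indentation whitespace around the translated text.
            by_lang[tr.get(XML_LANG)] = (tr.text or "").strip()
        table[phrase.get("ID")] = by_lang
    return table

def translate(table, phrase_id, lang):
    """Return the translation, or the phrase itself if none is recorded."""
    return table.get(phrase_id, {}).get(lang, phrase_id)

phrases = load_phrases(SAMPLE)
print(translate(phrases, "Hello World!", "es"))
```

The fallback to the original phrase is a design choice for this sketch: a page with a missing translation still renders, just in the source language.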

TMX does provide this sort of function and structure, doesn't it? 

Here is how I would express the previous example in TMX:

<?xml version="1.0"?>
<!DOCTYPE tmx SYSTEM "http://www.lisa.org/tmx/tmx11.dtd">
<tmx version="1.1">
	<header
		creationtool="UserLand Frontier"
		creationtoolversion="5.1.4"
		datatype="PlainText"
		segtype="phrase"
		adminlang="en-us"
		srclang="EN"
		o-tmf="Frontier"
		o-encoding="MACINTOSH">
	</header>
	<body>
		<tu>
			<tuv lang="EN" creationid="BUZU">
				<seg>Hello World!</seg>
			</tuv>
			<tuv lang="FR" creationid="BUZU">
				<seg>Bonjour Monde!</seg>
			</tuv>
			<tuv lang="ES" creationid="BUZU">
				<seg>¡Hola Mundo!</seg>
			</tuv>
			<tuv lang="DE" creationid="BUZU">
				<seg>Hallo Welt!</seg>
			</tuv>
		</tu>
	</body>
</tmx>
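
A rough sketch of how a script might read such a file into a phrase table, again in Python. It assumes the TMX 1.1 style shown here, with a plain lang attribute on each tuv; the function name read_tmx is hypothetical, and the XML declaration and DOCTYPE are left out so the sample parses from a string without fetching the DTD.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the TMX example, without the declaration and DOCTYPE.
TMX_SAMPLE = """
<tmx version="1.1">
  <header creationtool="UserLand Frontier" creationtoolversion="5.1.4"
          datatype="PlainText" segtype="phrase" adminlang="en-us"
          srclang="EN" o-tmf="Frontier" o-encoding="MACINTOSH"/>
  <body>
    <tu>
      <tuv lang="EN"><seg>Hello World!</seg></tuv>
      <tuv lang="ES"><seg>¡Hola Mundo!</seg></tuv>
      <tuv lang="DE"><seg>Hallo Welt!</seg></tuv>
    </tu>
  </body>
</tmx>
"""

def read_tmx(tmx_text):
    """Build {source-language text: {language code: text}} from tu elements."""
    root = ET.fromstring(tmx_text)
    srclang = root.find("header").get("srclang").upper()
    table = {}
    for tu in root.iter("tu"):
        variants = {}
        for tuv in tu.findall("tuv"):
            seg = tuv.find("seg")
            if seg is not None:
                # itertext() flattens any inline markup inside seg to text.
                variants[tuv.get("lang").upper()] = "".join(seg.itertext())
        source = variants.get(srclang)
        if source is not None:
            # Key each unit by its source-language segment.
            table[source] = variants
    return table

memory = read_tmx(TMX_SAMPLE)
print(memory["Hello World!"]["DE"])
```

Keying the table by the source-language segment mirrors the ID convention of the private DTD, which is what makes the two formats interchangeable for this "dictionary" use.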

Here are my questions:

As I understand it, TMX is a format for translation "dictionaries", that is, lists of equivalent words, phrases, sentences, or paragraphs in different languages. TMX also allows formatting within phrases, such as boldface and italic, to be preserved.

I always judge tools by what *I* need from them, and that is what I need from TMX. Is it meant to do more than what I have asked of it? Is this "dictionary" concept something TMX is *meant* for?

I am under the impression that TMX can also have embedded "macros" within phrases. By "macro", I mean processing commands that may be understood only by a specific scripting language. Am I right?

Deke

-----------------------------------------------------------------
Deke Smith
Tallent Communications Group, Brentwood TN
deke at tallent.com, 615-661-9878
-----------------------------------------------------------------
" The best way to predict the future is to invent it. " 
       - Alan Kay 



xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev at ic.ac.uk
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/
To (un)subscribe, mailto:majordomo at ic.ac.uk the following message;
(un)subscribe xml-dev
To subscribe to the digests, mailto:majordomo at ic.ac.uk the following message;
subscribe xml-dev-digest
List coordinator, Henry Rzepa (mailto:rzepa at ic.ac.uk)