[Kalzium] Kalziums datastructure and Blue Obelisk repository

Geoffrey Hutchison grh25 at cornell.edu
Tue Oct 4 22:01:35 CEST 2005


> However, if we want to make this a standard, we might want to take
> the time to do it right, now that we have the experience. There is a
> very simple principle when designing XML schemas:

So far, the Blue Obelisk repository has tried to stick to CML (and
STMML) elements. This is not required, but it seems like a better
idea to stick to an existing schema, with our own namespace and
dictionary, than to create a whole new XML schema.

(And if we encounter items which do not fit well in CML or STMML,  
perhaps we can push for them to be added to the standard. :-)

> One thing that we should take special care of is the issue of
> internationalization and localization.
> ...
> I don't know if there is an established way of doing this for XML
> files, but I think we should be careful. One way to do it would be
> to split the XML file in two parts: the generic one and the i18n:ed
> one. That is probably not so easy to get right.

After doing some searching, I found that there is an xml:lang
attribute (e.g. xml:lang="en" or xml:lang="ja") for specifying the
language of a particular piece of text.
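To illustrate how a program reading the data might use xml:lang, here
is a minimal Python sketch using the standard library's
xml.etree.ElementTree. The <atom>/<name> markup is hypothetical (not
the actual CML or Blue Obelisk format), and falling back to English
when a translation is missing is an assumed policy.

```python
import xml.etree.ElementTree as ET

# xml:lang lives in the reserved XML namespace; ElementTree exposes it
# under this fully qualified attribute name.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Hypothetical fragment: element names are illustrative, not real CML.
SAMPLE = """\
<atom id="H">
  <name xml:lang="en">Hydrogen</name>
  <name xml:lang="fr">Hydrogène</name>
  <name xml:lang="ja">水素</name>
</atom>
"""

def localized_name(atom, lang, fallback="en"):
    """Return the <name> text tagged with `lang`, else the `fallback` one."""
    names = {n.get(XML_LANG): n.text for n in atom.findall("name")}
    return names.get(lang, names.get(fallback))

atom = ET.fromstring(SAMPLE)
print(localized_name(atom, "fr"))  # Hydrogène
print(localized_name(atom, "de"))  # Hydrogen (no German entry, falls back)
```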

http://www.opentag.com/xfaq_lang.htm#lang_xmllang

But this same FAQ also mentions:
> Instead, use one document per language, at least for the material  
> you send to the localizer. If needed, after translation, you can  
> group all entries in a single file, but treat that step as a  
> "compilation-like" step to be done after localization.

Since the current repository XML is done in a "compilation-like step"  
via Perl scripts, this should be pretty easy.

Right now, the Blue Obelisk elements.xml is generated by a Perl  
script from the composite text files. I think this might be the  
easiest way to continue -- we can easily change the syntax by editing  
the scripts, and AFAIK it can be difficult to merge the XML files  
automatically.

So perhaps we can start to split out the Kalzium properties into
separate files, and put any text that needs translation in a file
named with its language code? For example element-name-en.txt,
element-name-fr.txt...
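The "compilation-like" merge of per-language files could look roughly
like the sketch below. The existing repository scripts are Perl; this
is only a Python illustration, and the tab-separated "symbol<TAB>name"
file format, the compile_names helper, and the <list>/<atom>/<name>
output markup are all hypothetical.

```python
import re
import tempfile
import xml.etree.ElementTree as ET
from pathlib import Path

XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"
# Naming scheme from the proposal: element-name-<lang>.txt
NAME_FILE = re.compile(r"element-name-([A-Za-z_-]+)\.txt$")

def read_names(path):
    """Parse an assumed 'symbol<TAB>name' format, one element per line."""
    names = {}
    for line in path.read_text(encoding="utf-8").splitlines():
        # Skip blank lines and (assumed) '#' comment lines.
        if line.strip() and not line.startswith("#"):
            symbol, name = line.split("\t", 1)
            names[symbol.strip()] = name.strip()
    return names

def compile_names(directory):
    """Merge every element-name-<lang>.txt file into one XML tree."""
    root = ET.Element("list")
    atoms = {}
    for path in sorted(Path(directory).iterdir()):
        match = NAME_FILE.match(path.name)
        if not match:
            continue
        lang = match.group(1)
        for symbol, name in read_names(path).items():
            if symbol not in atoms:
                atoms[symbol] = ET.SubElement(root, "atom", id=symbol)
            entry = ET.SubElement(atoms[symbol], "name")
            entry.set(XML_LANG, lang)  # serialized as xml:lang="..."
            entry.text = name
    return root

# Usage: build two small language files in a scratch directory and merge.
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "element-name-en.txt").write_text(
        "H\tHydrogen\nHe\tHelium\n", encoding="utf-8")
    Path(tmp, "element-name-fr.txt").write_text(
        "H\tHydrogène\nHe\tHélium\n", encoding="utf-8")
    root = compile_names(tmp)
    print(ET.tostring(root, encoding="unicode"))
```

Keeping translators' input as one plain file per language, and only
producing the combined XML at build time, follows the FAQ's advice and
avoids having to merge XML files by hand.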

If this sounds good, I can help with splitting out the property  
files, although my schedule is pretty busy for the next week or so.

Cheers,
-Geoff

