[RkWard-devel] RKWard and other encodings

Sun Feb 11 15:46:05 UTC 2007

Hi Ilias,

On Saturday 10 February 2007 10:42, I. Soumpasis wrote:
> This past week I ran into a problem with R and the Greek language. In
> short, although a user can do everything in Greek on a terminal running R
> (in UTF-8), he cannot plot or print plots (problem with Greek letters).
> See a picture here
> http://users.forthnet.gr/the/isoumpasis/data/Screenshot.png

There is some info on fonts and encodings in ?x11. I don't know whether this
will help.

> I never ran into a problem like this before, because I am aware of the
> problems with the Greek language and I keep my data in English. However, I
> found a solution, which I am writing up for the Greek e-magazine of Greek
> Linux users. Here ( http://users.forthnet.gr/the/isoumpasis/data/example.png) is
> a screenshot after the solution. (We are also discussing writing an article
> on RKWard).
>
> Well, let's come to the main point. For the above to work, the user must run
> R in ISO-8859-7. That's the reason I had to play with rk.temp.convert to make
> it convert to this encoding, as I said in a previous mail in another thread.

Currently, in R SVN, Prof. Ripley seems to be doing some work on encodings. 
Among other things, apparently character vectors will gain information on at 
least whether they are in a latin1 or in a UTF-8 encoding.

I don't know whether that will be very helpful, or which issues it will fix
(admittedly, I don't understand too much about encodings and the associated
problems). However, there does seem to be some activity in that area.
Perhaps you could try the R SVN tree to see whether anything has been
addressed, and what.

I think it will certainly be worthwhile to provide access to
rk.temp.convert-like functionality outside of the SPSS import plugin (we will
need a whole class of plugins that do various conversions on data, one of
those being character encoding conversion). I don't know whether it would be
reasonable (or even possible) to try to make users use a specific encoding in
RKWard, though.
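
To make the idea of a general conversion helper a bit more concrete, here is
a rough sketch using R's iconv(). The name convert_encoding is made up for
this mail, and this is not how rk.temp.convert is implemented. It assumes the
platform's iconv knows the encoding names used.

```r
# Hypothetical helper (not part of RKWard): convert every character column,
# and the levels of every factor, in a data frame from one encoding to another.
convert_encoding <- function(data, from, to = "UTF-8") {
  for (name in names(data)) {
    column <- data[[name]]
    if (is.character(column)) {
      data[[name]] <- iconv(column, from = from, to = to)
    } else if (is.factor(column)) {
      levels(data[[name]]) <- iconv(levels(column), from = from, to = to)
    }
  }
  data
}

# Example data: Greek alpha and beta stored as ISO-8859-7 bytes (0xE1, 0xE2).
greek <- data.frame(
  label = c(rawToChar(as.raw(0xE1)), rawToChar(as.raw(0xE2))),
  value = 1:2,
  stringsAsFactors = FALSE
)
converted <- convert_encoding(greek, from = "ISO-8859-7")
```

Numeric columns are left untouched. iconv() returns NA for strings it cannot
convert, so a real plugin would probably want to report those to the user.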

> Moreover, and I find this critical enough, databases like MySQL use various
> encodings, among which ISO. I do not know if there is a mechanism to
> convert the encoding when connecting to a database and pulling data, but I
> believe there is not. I believe that in the future RKWard, if it is to get
> into business, should develop this functionality, and I mean connect to a
> database and pull data. But different encodings would create problems.

I don't know what you use for database access. I'm vaguely aware that several
solutions for database connections in R exist, but I've never tried any of
them so far. However, it should be fairly trivial to add encoding conversion
to a database connection. It would basically be a matter of adding
information about the source encoding (i.e. the encoding in the database) to
the connection, and then calling iconv() whenever reading or writing data.
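
Roughly, and only as a sketch: the names wrap_connection, decode_from_db and
encode_for_db below are hypothetical, and no real database package is
involved. The only real call is iconv(). I assume the session works in UTF-8
and the database stores ISO-8859-7.

```r
# Hypothetical wrapper: attach the database's encoding to a connection object.
# 'con' stands for whatever the database package returns; it is not used here.
wrap_connection <- function(con, db_encoding, session_encoding = "UTF-8") {
  list(con = con, db_encoding = db_encoding, session_encoding = session_encoding)
}

# Call on every character vector read from the database.
decode_from_db <- function(wrapped, x) {
  iconv(x, from = wrapped$db_encoding, to = wrapped$session_encoding)
}

# Call on every character vector before writing it to the database.
encode_for_db <- function(wrapped, x) {
  iconv(x, from = wrapped$session_encoding, to = wrapped$db_encoding)
}

# Round trip with a Greek alpha; no real database is contacted.
wrapped <- wrap_connection(con = NULL, db_encoding = "ISO-8859-7")
stored <- encode_for_db(wrapped, "\u03b1")
restored <- decode_from_db(wrapped, stored)
```

Keeping the encoding next to the connection means the user has to state it
only once, when connecting, rather than at every read or write.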

> For the above reasons I believe that RKWard should support more encodings.
> Is it feasible? And is it possible?

Depends on what you mean by "should support more encodings". Sure,
rk.temp.convert could easily be extended to list more known encodings (but
it's not practical to list all known encodings, so I think we should limit
it to the most common ones, and users with a more exotic encoding will need
to enter theirs manually). What is not feasible is to cover up for all
encoding deficiencies in R. I can't imagine right now what we could do
beyond giving the user an easy way to convert data between different
encodings. Do you have something more specific in mind?

Regards
Thomas