[RkWard-devel] RKWard and other encodings
Thomas Friedrichsmeier
thomas.friedrichsmeier at ruhr-uni-bochum.de
Sun Feb 11 15:46:05 UTC 2007
Hi Ilias,
On Saturday 10 February 2007 10:42, I. Soumpasis wrote:
> Last week I ran into a problem with R and the Greek language. The short
> story is that although a user can do everything in Greek on a terminal
> running R (in UTF-8), he cannot plot or print plots (a problem with Greek
> letters). See a picture here:
> http://users.forthnet.gr/the/isoumpasis/data/Screenshot.png
There is some information on fonts and encodings in ?x11. I don't know
whether it will help.
> I never ran into a problem like this before because I am aware of the
> problems with the Greek language and I keep my data in English. However, I
> found a solution, which I am writing up for the Greek e-magazine of Greek
> Linux users. Here ( http://users.forthnet.gr/the/isoumpasis/data/example.png)
> is a screenshot after the solution. (We are also discussing writing an
> article on RKWard.)
>
> Well, let's come to the main point. For the above to work, the user must
> run R in ISO-8859-7. That's the reason I had to play with rk.temp.convert
> to make it convert to this encoding, as I said in a previous mail on
> another thread.
Currently, in R SVN, Prof. Ripley seems to be doing some work on encodings.
Among other things, character vectors will apparently gain information on at
least whether they are in a latin1 or a UTF-8 encoding.
I don't know whether that will be very helpful, or which issues it will fix
(admittedly, I don't understand much about encodings and the associated
problems). However, there does seem to be some activity in that area.
Perhaps you could try the R SVN tree to see what, if anything, has been
addressed.
I think it will certainly be worthwhile to provide access to
rk.temp.convert-like functionality outside of the SPSS import plugin (we will
need a whole class of plugins that do various conversions on data, one of
those being character encoding conversion). I don't know whether it would be
reasonable (or even possible) to try to make users use a specific encoding in
RKWard, though.
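
To illustrate the kind of conversion I mean, here is a minimal sketch. The
function name convert_df_encoding is made up for this mail and is not part of
RKWard, and it is not how rk.temp.convert works. It only assumes that the
iconv() on your platform knows the encodings involved:

```r
# Hypothetical helper (not part of RKWard): convert every character column,
# and the levels of every factor column, of a data frame with iconv().
convert_df_encoding <- function(df, from, to) {
  for (name in names(df)) {
    column <- df[[name]]
    if (is.character(column)) {
      df[[name]] <- iconv(column, from = from, to = to)
    } else if (is.factor(column)) {
      levels(df[[name]]) <- iconv(levels(column), from = from, to = to)
    }
  }
  df
}

# Example data: the Greek letters alpha, beta, gamma as raw ISO-8859-7 bytes.
greek <- rawToChar(as.raw(c(0xe1, 0xe2, 0xe3)))
d <- data.frame(label = greek, value = 1, stringsAsFactors = FALSE)

# Numeric columns are left alone; only the text is re-encoded.
d_utf8 <- convert_df_encoding(d, from = "ISO-8859-7", to = "UTF-8")
```

A plugin would mostly be a dialog around a call like the last line, with
"from" and "to" chosen by the user.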
> Moreover, and I find this critical enough, databases like MySQL use various
> encodings, among them ISO ones. I do not know whether there is a mechanism
> to convert the encoding when connecting to a database and pulling data, but
> I believe there is not. I believe that in the future RKWard, if it is to
> get into business, should develop this functionality, I mean connecting to
> a database and pulling data. But different encodings are likely to create
> problems.
I don't know what you use for database access. I'm vaguely aware that several
solutions for database connections in R exist, but I've never tried any of
them so far. However, it should be fairly trivial to add encoding conversion
to a database connection. It would basically be a matter of adding
information about the source encoding (i.e. the encoding in the database) to
the connection, and then calling iconv() whenever reading or
writing data.
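
Roughly, and only as a sketch: the names encoding_aware_connection, read_fn
and write_fn below are invented for illustration and do not correspond to any
real database package. The "database" here is just a variable holding
ISO-8859-7 bytes, so that the example can be run as is:

```r
# Hypothetical wrapper: 'read_fn' and 'write_fn' stand in for whatever
# functions a real database package would provide (an assumption).
encoding_aware_connection <- function(read_fn, write_fn, db_encoding,
                                      local_encoding = "UTF-8") {
  list(
    # Convert from the database encoding after reading.
    read = function(...) {
      iconv(read_fn(...), from = db_encoding, to = local_encoding)
    },
    # Convert to the database encoding before writing.
    write = function(x, ...) {
      write_fn(iconv(x, from = local_encoding, to = db_encoding), ...)
    }
  )
}

# Stand-in "database": one string stored as ISO-8859-7 bytes
# (the Greek letters alpha, beta, gamma).
store <- rawToChar(as.raw(c(0xe1, 0xe2, 0xe3)))
con <- encoding_aware_connection(
  read_fn = function() store,
  write_fn = function(x) store <<- x,
  db_encoding = "ISO-8859-7"
)

from_db <- con$read()  # converted to UTF-8 on the way in
con$write(from_db)     # converted back to ISO-8859-7 on the way out
```

The point is only that the source encoding is stored once, with the
connection, so that callers never have to call iconv() themselves.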
> For the above reasons I believe that RKWard should support more encodings.
> Is it feasible? And is it possible?
That depends on what you mean by "should support more encodings". Sure,
rk.temp.convert could easily be extended to list more known encodings (but
listing all known encodings is not practical, so I think
we should limit it to the most common ones, and users with a more exotic
encoding will need to enter theirs manually). What is not feasible is to
cover up for all encoding deficiencies in R. Right now I can't imagine what
we could do beyond giving the user an easy way to convert data
between different encodings. Do you have something more specific in mind?
Regards
Thomas