[rkward-users] rkward console encoding problem on Windows

Donatas Glodenis dg at lapas.info
Tue Nov 11 09:11:29 UTC 2014


Hi Thomas, thanks for all your work analyzing this problem, though it seems
at least part of it was in vain. I just figured out that the problem affects
only the summary command (describe, table, levels all print fine). Below is
also quite a lot of work I did before I figured that out; it may be in vain
too, but maybe you will find some of it interesting or useful.

I did not manage to figure out why the summary command prints it wrong in
RKWard. As I said before, it prints correctly in the plain rgui.exe
console.

Sincerely

Donatas

2014-11-10 18:39 GMT+02:00 Thomas Friedrichsmeier <
thomas.friedrichsmeier at ruhr-uni-bochum.de>:

>
>
> I now tried in Windows 7, setting Lithuanian localization. I get the same
> code pages from Sys.getlocale(). Interestingly, when I try to enter any
> special chars, R just strips them off:
>
> > "Stačiatikių"
> [1] "Staciatikiu"
>
> Same when entering in the data editor. They appear correctly on typing, but
> in R, only a stripped version is stored. Obviously that's not so great, but
> it's not the symptoms you're seeing, either. And, importantly, a plain R
> console does not even allow me to enter the special chars. They get stripped
> to the nearest ascii character while I'm typing.
>

In my case I can enter the special characters fine both in the RKWard console
and in the R console (rgui.exe). I am also on Windows 7, btw. I can also
create a data frame, enter special characters, and, surprise!, they get
printed correctly in the RKWard console!


> So follow-up question: How did you get those data into R in the first
> place?
>

Imported those from an SPSS file. The file apparently had the encoding
Windows-1257 (or maybe ISO-8859-13, which for most purposes is the same). I
could import it in Windows without specifying the import encoding, and it
displayed Lithuanian characters fine. But when I tried to do that in Linux
(Kubuntu 14.04, RKward 6.2) I had to specify the encoding ISO-8859-13,
otherwise the special characters were left out.

Here is the import code:

local({
## Prepare
require (foreign)
## Compute
data <- read.spss (
    "C:/Users/D.Glodenis/Programos/RKWard/workspaces/tm2007+nrtic2014/2014/Religija 2014 03.sav",
    to.data.frame=TRUE, max.value.labels=1000000)

# set variable labels for use in RKWard
labels <- attr (data, "variable.labels");
if (!is.null (labels)) {
    for (i in 1:length (labels)) {
        col <- make.names (names (labels[i]))
        if (!is.null (col)) {
            rk.set.label (data[[col]], labels[i])
        }
    }
}

.GlobalEnv$DATA14 <- data        # assign to globalenv()
rk.edit (.GlobalEnv$DATA14)
## Print result
rk.header("Import SPSS data", parameters=list("File",
    "C:/Users/D.Glodenis/Programos/RKWard/workspaces/tm2007+nrtic2014/2014/Religija 2014 03.sav",
    "Import as", "DATA14"))
})

I changed the import encoding to "ISO8859-13", and it changed nothing.
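
For reference, read.spss() in the foreign package has a "reencode" argument
that can name the encoding to assume for the file. A minimal sketch, assuming
the file really is ISO-8859-13; import_sav and its enc parameter are
hypothetical names used only for illustration, and this is not necessarily the
code the RKWard import dialog generates:

```r
# Hypothetical helper: import an SPSS file while stating its encoding.
# 'reencode' is an argument of foreign::read.spss(); the default value
# used here is an assumption about this particular file.
import_sav <- function(path, enc = "ISO-8859-13") {
  foreign::read.spss(path, to.data.frame = TRUE, reencode = enc)
}

# Usage (substitute the real path to the .sav file):
# DATA14 <- import_sav("path/to/file.sav")
```

If the strings are already in the session's native encoding, as seems to be
the case on Windows here, re-encoding on import may well make no visible
difference.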


> What does
>   Encoding (DATA14$R02)
> (or)
>   Encoding (levels (DATA$R02))
> print?
>

It is messy! Another thing I noticed is that the problem only appears with
one command, summary():

> summary(DATA14$R02)
            ˙žKatalikų˙ž          ˙žStaĨiatikių˙ž             ˙žSentikių˙ž
                 752                   24                    7
...
This is garbage as before. But look at this!

> levels(DATA14$R02)
 [1] "Katalikų"             "Stačiatikių"          "Sentikių"
.....

Now, Encodings:

> Encoding (levels (DATA14$R02))
 [1] "unknown" "unknown" "unknown" "unknown" "unknown" "unknown" "unknown"
 [8] "unknown" "unknown" "unknown" "unknown" "unknown"
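
An "unknown" mark only says that no encoding has been declared, so R treats
the bytes as being in the native encoding. To see what is actually stored, the
raw bytes can be inspected. A small sketch; lev is a hypothetical stand-in for
one element of levels(DATA14$R02), built from raw values on the assumption
that the data are single-byte Windows-1257/ISO-8859-13:

```r
# Hypothetical stand-in for one factor level: "Katalikų" as single-byte
# data, where the final letter is the byte 0xF8 (an assumption about the file).
lev <- rawToChar(as.raw(c(0x4b, 0x61, 0x74, 0x61, 0x6c, 0x69, 0x6b, 0xf8)))

print(Encoding(lev))        # the declared mark, not the actual byte layout
bytes <- charToRaw(lev)     # the bytes R really holds
print(bytes)

# A lone byte >= 0x80 cannot be valid UTF-8, so this reports FALSE for
# single-byte data such as the string built above.
is_utf8 <- validUTF8(lev)
print(is_utf8)
```

Running charToRaw() on the real levels would show whether they hold one byte
per accented letter (an 8-bit code page) or two (UTF-8).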

I ran

> Encoding (levels (DATA14$R02))<-"ISO8859-13"
## encoding not changed, still "unknown", same garbage in summary() output
and:

> Encoding (levels (DATA14$R02))<-"UTF-8"

> Encoding (levels(DATA14$R02))
 [1] "UTF-8"   "UTF-8"   "UTF-8"   "UTF-8"   "UTF-8"   "UTF-8"   "UTF-8"
 [8] "unknown" "UTF-8"   "UTF-8"   "UTF-8"   "UTF-8"

> levels(DATA14$R02)
 [1] "˙žKatalik\xf8˙ž"                "˙žSta\xe8iatiki\xf8˙ž"
 [3] "˙žSentiki\xf8˙ž"                "˙žEvangelik\xf8 liuteron\xf8˙ž"
 [5] "˙žEvangelik\xf8 reformat\xf8˙ž" "˙žJud\xebj\xf8˙ž"
 [7] "˙žMusulmon\xf8˙ž"               "Kita "
 [9] "˙žNeatsak\xeb˙ž"                "˙žJehovos liudytoj\xf8˙ž"
[11] "˙žBaptist\xf8˙ž"                "˙žPagoni\xf8˙ž"

> summary(DATA14$R02)
               ˙žKatalik\xf8˙ž          ˙žSta\xe8iatiki\xf8˙ž
                       752                         24
               ˙žSentiki\xf8˙ž ˙žEvangelik\xf8 liuteron\xf8˙ž

This is similar, but not identical, to the garbage before.

Changing encoding back (that is, using just any random string) turns the
data back into the previous form.

> Encoding (levels(DATA14$R02))<-"sdšsdš"


I did another experiment: created an empty data frame "test", changed the
data type of the first column, var, to string, and proceeded like this:

> test$var<-c("ąžerty","zūcvęfčm","wįpųlkjėų")
> Encoding(test$var)
[1] "unknown" "unknown" "unknown"
> Encoding(test$var)<- "ISO8859-13"
## I took the string ISO8859-13 from the code of spss import dialog
> Encoding(test$var)
[1] "unknown" "unknown" "unknown"
## strange, the iso encoding setting does not work; I also tried
## "ISO885913" and "ISO-8859-13", "CP1257", "WINDOWS-1257" - no luck
> test$var
[1] "ąžerty"    "zūcvęfčm"  "wįpųlkjėų"
> Encoding(test$var)<- "UTF-8"
> test$var
[1] "˙ž\xe0\xfeerty˙ž"          "˙žz\xfbcv\xe6f\xe8m˙ž"
"˙žw\xe1p\xf8lkj\xeb\xf8˙ž"
> Encoding(test$var)
[1] "UTF-8" "UTF-8" "UTF-8"
# changing to UTF8 does work! also "latin1" works.
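
The results above match how the replacement form of Encoding() is documented:
it only sets a mark and never converts bytes, and any value other than
"latin1", "UTF-8" or "bytes" is treated as "unknown". Actual conversion is
done by iconv(). A minimal sketch, assuming the local iconv implementation
knows the name "ISO-8859-13"; x is built from raw bytes so the example does
not depend on the session locale:

```r
# "Stačiatikių" as ISO-8859-13 bytes (č = 0xE8, ų = 0xF8), matching the
# \xe8 and \xf8 escapes seen in the output above.
x <- rawToChar(as.raw(c(0x53, 0x74, 0x61, 0xe8, 0x69, 0x61,
                        0x74, 0x69, 0x6b, 0x69, 0xf8)))

# Encoding<- only sets a mark; unsupported names fall back to "unknown".
Encoding(x) <- "ISO8859-13"
mark_after <- Encoding(x)   # still "unknown"

# iconv() converts the bytes themselves and marks the result as UTF-8.
y <- iconv(x, from = "ISO-8859-13", to = "UTF-8")
print(Encoding(y))
print(charToRaw(y))
```

Marking single-byte data as "UTF-8" without converting it is what produces
the \xe8-style escapes: the bytes are not valid UTF-8, so R prints them
escaped.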


> Another interesting bit is that I have absolutely no problem entering and
> showing German or French special chars in RKWard (but the R console does
> not allow me to enter them, either).
>


> What does
>> options("encoding")
> print?

> options("encoding")
$encoding
[1] "native.enc"
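
For comparing the RKWard session with plain rgui.exe, the locale-related
settings can be collected in one go. A small sketch using only base R;
enc_report is a hypothetical name chosen for illustration:

```r
# Gather the settings that decide how R interprets unmarked strings.
enc_report <- list(
  ctype    = Sys.getlocale("LC_CTYPE"),  # character-type locale
  l10n     = l10n_info(),                # MBCS / UTF-8 / Latin-1 flags
  file_enc = getOption("encoding")       # default encoding for connections
)
str(enc_report)
```

Running the same lines in both consoles and comparing the output should show
whether the two sessions differ in locale at all.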