[rkward-users] rkward console encoding problem on Windows
Donatas Glodenis
dg at lapas.info
Tue Nov 11 09:11:29 UTC 2014
Hi Thomas, thanks for all your work in trying to analyze this problem. It
seems it was at least partly in vain, though: I just figured out that the
problem affects only the summary() command (describe, table, levels all print
fine). Below is the rather long write-up I did before figuring that out. It
may also be in vain, but maybe you will find some of it
interesting/useful.
I did not manage to figure out why summary() prints it wrong in
RKWard. As I said before, it prints correctly in the plain rgui.exe
console.
Sincerely
Donatas
2014-11-10 18:39 GMT+02:00 Thomas Friedrichsmeier <
thomas.friedrichsmeier at ruhr-uni-bochum.de>:
>
>
> I now tried in Windows 7, setting Lithuanian localization. I get the same
> code pages from Sys.getlocale(). Interestingly, when I try to enter any
> special chars, R just strips them off:
>
> > "Stačiatikių"
> [1] "Staciatikiu"
>
> Same when entering in the data editor. They appear correctly on typing, but
> in R, only a stripped version is stored. Obviously that's not so great, but
> it's not the symptoms you're seeing, either. And, importantly, a plain R
> console does not even allow me to enter the special chars. They get stripped
> to the nearest ascii character while I'm typing.
>
In my case I can enter the special characters fine in both the RKWard console
and the R console (rgui.exe). I am also on Windows 7, btw. I can also create a
data frame, enter special characters, and, surprise, they get printed
correctly in the RKWard console!
> So follow-up question: How did you get those data into R in the first
> place?
>
Imported those from an SPSS file. The file was apparently encoded in
Windows-1257 (or maybe ISO-8859-13, which for most purposes is the same). I
could import it in Windows without specifying the import encoding, and it
displayed Lithuanian characters fine. But when I tried to do that in Linux
(Kubuntu 14.04, RKWard 6.2) I had to specify the encoding ISO-8859-13,
otherwise the special characters were left out.
Here is the import code:
local({
  ## Prepare
  require (foreign)
  ## Compute
  data <- read.spss ("C:/Users/D.Glodenis/Programos/RKWard/workspaces/tm2007+nrtic2014/2014/Religija 2014 03.sav", to.data.frame=TRUE, max.value.labels=1000000)
  # set variable labels for use in RKWard
  labels <- attr (data, "variable.labels");
  if (!is.null (labels)) {
    for (i in 1:length (labels)) {
      col <- make.names (names (labels[i]))
      if (!is.null (col)) {
        rk.set.label (data[[col]], labels[i])
      }
    }
  }
  .GlobalEnv$DATA14 <- data # assign to globalenv()
  rk.edit (.GlobalEnv$DATA14)
  ## Print result
  rk.header("Import SPSS data", parameters=list("File",
    "C:/Users/D.Glodenis/Programos/RKWard/workspaces/tm2007+nrtic2014/2014/Religija 2014 03.sav",
    "Import as", "DATA14"))
})
I changed the import encoding to "ISO8859-13", and it changed nothing.
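For reference, a minimal sketch of forcing the file encoding at import time. It assumes that read.spss() in the installed foreign package has a reencode argument that accepts an encoding name; import_spss and its default "CP1257" are names made up for this illustration, and the path in the usage line is a placeholder:

```r
# Hypothetical helper (not part of RKWard or foreign): import an SPSS file
# while telling read.spss() which encoding the file's strings are in.
import_spss <- function(path, file_encoding = "CP1257") {
  foreign::read.spss(path,
                     to.data.frame = TRUE,
                     max.value.labels = 1000000,
                     reencode = file_encoding)  # assumed encoding of the .sav file
}

# Usage (placeholder path, not run here):
# DATA14 <- import_spss("path/to/file.sav", "CP1257")
```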
> What does
> Encoding (DATA14$R02)
> (or)
> Encoding (levels (DATA$R02))
> print?
>
It is messy! Another thing I noticed is that the problem only appears with
one command, summary():
> summary(DATA14$R02)
˙žKatalikų˙ž ˙žStaĨiatikių˙ž
˙žSentikių˙ž
752 24 7
...
This is garbage as before. But look at this!
> levels(DATA14$R02)
[1] "Katalikų" "Stačiatikių" "Sentikių"
.....
Now, Encodings:
> Encoding (levels (DATA14$R02))
[1] "unknown" "unknown" "unknown" "unknown" "unknown" "unknown" "unknown"
[8] "unknown" "unknown" "unknown" "unknown" "unknown"
I ran
> Encoding (levels (DATA14$R02))<-"ISO8859-13"
## encoding not changed, still "unknown", same garbage in summary() output
and:
> Encoding (levels (DATA14$R02))<-"UTF-8"
> Encoding (levels(DATA14$R02))
[1] "UTF-8" "UTF-8" "UTF-8" "UTF-8" "UTF-8" "UTF-8" "UTF-8"
[8] "unknown" "UTF-8" "UTF-8" "UTF-8" "UTF-8"
> levels(DATA14$R02)
[1] "˙žKatalik\xf8˙ž" "˙žSta\xe8iatiki\xf8˙ž"
[3] "˙žSentiki\xf8˙ž" "˙žEvangelik\xf8 liuteron\xf8˙ž"
[5] "˙žEvangelik\xf8 reformat\xf8˙ž" "˙žJud\xebj\xf8˙ž"
[7] "˙žMusulmon\xf8˙ž" "Kita "
[9] "˙žNeatsak\xeb˙ž" "˙žJehovos liudytoj\xf8˙ž"
[11] "˙žBaptist\xf8˙ž" "˙žPagoni\xf8˙ž"
> summary(DATA14$R02)
˙žKatalik\xf8˙ž ˙žSta\xe8iatiki\xf8˙ž
752 24
˙žSentiki\xf8˙ž ˙žEvangelik\xf8 liuteron\xf8˙ž
- this is similar to, but not identical with, the garbage before
Setting the encoding to just any random string turns the data back into the
previous form:
> Encoding (levels(DATA14$R02))<-"sdšsdš"
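A possible reading of the above, going by ?Encoding: the only marks Encoding<- understands are "latin1", "UTF-8", "bytes" and "unknown", and any other name is treated as "unknown". It only sets a label and never converts bytes; conversion is what iconv() is for. A minimal sketch, assuming the levels really hold Windows-1257 bytes (the \xe8 and \xf8 escapes above suggest so) and that the local iconv knows the name "CP1257":

```r
# Bytes of "Stačiatikių" as they would be stored in Windows-1257 (assumption).
x <- "Sta\xe8iatiki\xf8"

# An unsupported encoding name is silently treated as "unknown".
Encoding(x) <- "ISO8859-13"
Encoding(x)   # "unknown"

# iconv() actually converts the bytes and marks the result as UTF-8.
y <- iconv(x, from = "CP1257", to = "UTF-8")
Encoding(y)   # "UTF-8"

# Applied to the factor from this thread, the conversion would go through
# levels() (not run here, DATA14 exists only in the original session):
# levels(DATA14$R02) <- iconv(levels(DATA14$R02), from = "CP1257", to = "UTF-8")
```

Whether this also cures the summary() output in RKWard is untested; the sketch only shows the difference between re-labelling and converting.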
I did another experiment: I created an empty data frame "test", changed the
data type of the first column, var, to string, and proceeded like this:
> test$var<-c("ąžerty","zūcvęfčm","wįpųlkjėų")
> Encoding(test$var)
[1] "unknown" "unknown" "unknown"
> Encoding(test$var)<- "ISO8859-13"
## I took the string ISO8859-13 from the code of spss import dialog
> Encoding(test$var)
[1] "unknown" "unknown" "unknown"
## strange, the iso encoding setting does not work; I also tried
## "ISO885913" and "ISO-8859-13", "CP1257", "WINDOWS-1257" - no luck
> test$var
[1] "ąžerty" "zūcvęfčm" "wįpųlkjėų"
> Encoding(test$var)<- "UTF-8"
> test$var
[1] "˙ž\xe0\xfeerty˙ž" "˙žz\xfbcv\xe6f\xe8m˙ž"
"˙žw\xe1p\xf8lkj\xeb\xf8˙ž"
> Encoding(test$var)
[1] "UTF-8" "UTF-8" "UTF-8"
# changing to UTF8 does work! also "latin1" works.
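One way to read the \x escapes above: marking a string as "UTF-8" changes only the label, so bytes that are really single-byte Windows-1257 get printed as escapes because they are not valid UTF-8. A minimal sketch for inspecting what is actually stored, assuming the strings hold Windows-1257 bytes; validUTF8() is only available in newer R versions, so it may be missing in the installation used here:

```r
# "ąžerty" as Windows-1257 bytes (assumption, taken from the escapes shown above).
s <- "\xe0\xfeerty"

charToRaw(s)   # raw bytes: e0 fe 65 72 74 79

# Not valid UTF-8, which is why a "UTF-8" mark leads to \x escapes on printing.
validUTF8(s)   # FALSE

# Converting (instead of re-marking) yields a proper UTF-8 string.
u <- iconv(s, from = "CP1257", to = "UTF-8")
validUTF8(u)   # TRUE
charToRaw(u)   # c4 85 c5 be 65 72 74 79
```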
> Another interesting bit is that I have absolutely no problem entering and
> showing German or French special chars in RKWard (but the R console does
> not allow me to enter them, either).
>
> What does
>> options("encoding")
> print?
> options("encoding")
$encoding
[1] "native.enc"
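For completeness, the locale and encoding settings asked about in this thread can be collected in one go. A minimal sketch using only base R; the list name enc_info is made up for this illustration:

```r
# Collect the settings that decide how R treats "unknown"-encoded strings.
enc_info <- list(
  ctype    = Sys.getlocale("LC_CTYPE"),  # character-set part of the locale
  l10n     = l10n_info(),                # flags such as UTF-8 / Latin-1, plus the code page on Windows
  encoding = getOption("encoding")       # default encoding for file connections
)
str(enc_info)
```

Comparing this output between RKWard and rgui.exe on the same machine might show whether the two sessions really run under the same settings.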