probable grave bug: SPSS data with umlauts in factor levels can lead to data loss
meik michalke
meik.michalke at uni-duesseldorf.de
Tue Jun 28 20:13:32 UTC 2016
On Monday, 27 June 2016, 10:29:27, Thomas Friedrichsmeier wrote:
> how I hate those pesky encoding bugs...
me even more so...
> Questions to narrow down this one (or help me reproduce it):
funny thing, it's not so easy to reproduce today. i imported the same SPSS
data on my own machine and didn't have problems (with locales set to
de_DE.UTF-8). i then imported it on the other machine, where the problem had
originally occurred all evening, and it still didn't reproduce.
only when i imported the data in RKWard started with LC_ALL=C did the imported
file behave strangely: "männlich" wasn't even shown correctly (only
"mnnlich"), and the import also introduced <NA>s in the actual data while the
data editor showed "mnnlich".
> - Is this _only_ for SPSS imported data, or same problem when entering
> umlauts manually?
LC_ALL=C ; rkward:
manually added factor levels with umlauts are shown correctly in the data
editor, but the umlauts are removed when the data.frame is printed in the
console. they are not replaced with <NA>s, though.
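for illustration only: the two symptoms seen so far (umlauts silently dropped, as in "mnnlich", versus values turning into <NA>) are both things a conversion to a charset without umlauts can produce. the sketch below uses plain base R iconv() to show this. it is an assumption that anything similar happens inside RKWard or the SPSS import; the names x, stripped and lost are made up for the example.

```r
# "\u00e4" is the escape for a-umlaut, so this snippet itself stays pure ASCII
x <- "m\u00e4nnlich"

# with an empty substitute, non-convertible bytes are dropped silently
stripped <- iconv(x, from = "UTF-8", to = "ASCII", sub = "")

# without a substitute, a string that cannot be converted becomes NA
lost <- iconv(x, from = "UTF-8", to = "ASCII")

print(stripped)     # "mnnlich"
print(is.na(lost))  # TRUE
```

so "umlaut gone" and "<NA>" may well be two faces of the same failed conversion, differing only in how the non-convertible bytes are handled.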
> - What if in the imported data you edit "männlich" to "Männlich" (or
> anything else still containing an umlaut) does that work ok?
it didn't that evening, it does today.
> - What does str(my.data) print after the import? After editing?
> - What does Encoding(levels(my.data$something)) print after the import?
> After editing?
i must find out how to reliably reproduce the problem first.
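a sketch of the checks asked for above, to run once the problem shows up again. my.data and its column sex are stand-ins built by hand here, not the real SPSS import, so this only shows what the checks look like on a healthy factor. validUTF8() and the locale query are additions of mine (validUTF8() needs R >= 3.3.0).

```r
# stand-in for the imported data; the real object would come from the SPSS import
my.data <- data.frame(
  sex = factor(c("m\u00e4nnlich", "weiblich", "m\u00e4nnlich"))
)

str(my.data)                           # structure, including the factor levels
print(Encoding(levels(my.data$sex)))   # declared encoding of each level
print(validUTF8(levels(my.data$sex)))  # are the level strings valid UTF-8?
print(sum(is.na(my.data$sex)))         # number of NAs in the actual data
print(Sys.getlocale("LC_CTYPE"))       # locale the session is running in
```

running the same lines right after the import and again after editing a level in the data editor should show at which step the levels or the data change.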
in the original data set, there were also many values with undefined levels. no
idea if that triggers something.
i still have a copy of the faulty dataset, though.
best regards :: m.eik
--
dipl. psych. meik michalke
institut für experimentelle psychologie
abt. für diagnostik und differentielle psychologie
heinrich-heine-universität d-40204 düsseldorf