probable grave bug: SPSS data with umlauts in factor levels can lead to data loss

meik michalke meik.michalke at uni-duesseldorf.de
Tue Jun 28 20:13:32 UTC 2016


On Monday, 27 June 2016, 10:29:27, Thomas Friedrichsmeier wrote:
> how I hate those pesky encoding bugs...

and me even more so...

> Questions to narrow down this one (or help me reproduce it):

funny thing, it's not so easy to replicate today. i imported the same SPSS 
data on my own machine and didn't have problems (with locales set to 
de_DE.UTF-8). i then imported it on the other machine, where the problem had 
originally occurred all evening, and it still didn't replicate.

only when i started RKWard with LC_ALL=C and then imported the data did the 
imported file behave strangely. "männlich" wasn't even shown correctly (only 
"mnnlich"), and the import also introduced <NA>s in the actual data while the 
data editor showed "mnnlich".
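
for illustration only, here is a minimal R sketch of how a string with an 
umlaut can end up as either "mnnlich" or NA when it is forced into plain 
ASCII. this is an assumption about the kind of conversion involved, not a 
confirmed diagnosis of what RKWard or the SPSS import does. the variable 
names are made up, and the NA result can differ between platforms.

```r
# Which character set does this session assume?
# A session started with LC_ALL=C reports the "C" locale here.
ctype <- Sys.getlocale("LC_CTYPE")
info  <- l10n_info()   # list telling whether the session is UTF-8 capable

# Hypothetical factor level, written with a Unicode escape so that this
# snippet itself stays pure ASCII.
x <- "m\u00e4nnlich"
enc <- Encoding(x)     # declared encoding of the string

# Converting to ASCII with sub = "" silently drops the bytes that cannot
# be represented, which removes the umlaut.
stripped <- iconv(x, from = "UTF-8", to = "ASCII", sub = "")

# Without a substitute, an unconvertible string typically becomes NA
# (platform dependent, so treat this as an observation, not a guarantee).
maybe_na <- iconv(x, from = "UTF-8", to = "ASCII")

print(ctype)
print(enc)
print(stripped)
print(maybe_na)
```

if something like this happens at two different places (once with a 
substitute, once without), it would produce both symptoms at the same time, 
but that is speculation until the problem can be reproduced.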

> - Is this _only_ for SPSS imported data, or same problem when entering
>   umlauts manually?

LC_ALL=C ; rkward:
using manually added factor levels with umlauts shows correctly in the data 
editor, but removes the umlauts when the data.frame is printed in the console. 
it doesn't replace them with <NA>s, though.

> - What if in the imported data you edit "männlich" to "Männlich" (or
>   anything else still containing an umlaut) does that work ok?

it didn't that evening, it does today.

> - What does str(my.data) print after the import? After editing?
> - What does Encoding(levels(my.data$something)) print after the import?
>   After editing?

i must find out how to reliably reproduce the problem first.
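
once it reproduces, the checks asked for above could be collected roughly 
like this. the data frame below is a hypothetical stand-in for the imported 
SPSS data (my.data and the column name sex are illustrative only), so this is 
a sketch of the procedure, not output from the faulty dataset.

```r
# Hypothetical stand-in for the imported data; the real object would come
# from the SPSS import, not from this constructor.
my.data <- data.frame(
  sex = factor(c("m\u00e4nnlich", "weiblich", NA, "m\u00e4nnlich"))
)

# Structure of the object, including the factor levels
str(my.data)

# Declared encoding of each level ("UTF-8", "latin1" or "unknown")
level_enc <- Encoding(levels(my.data$sex))
print(level_enc)

# Count of values per level, with NAs listed separately
counts <- table(my.data$sex, useNA = "ifany")
print(counts)

# Raw bytes of each level show what is really stored, independent of how
# the console or the data editor renders it.
level_bytes <- lapply(levels(my.data$sex), charToRaw)
print(level_bytes)

n_missing <- sum(is.na(my.data$sex))
```

running the same lines once right after the import and once after editing a 
level would show whether the levels, their encoding marks or the NA count 
change in between.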

in the original data set, there were also many values with undefined levels. 
no idea if that triggers something.

i still have a copy of the faulty dataset, though.


best regards :: m.eik

-- 
  dipl. psych. meik michalke
  institut f"ur experimentelle psychologie
  abt. f"ur diagnostik und differentielle psychologie
  heinrich-heine-universit"at d-40204 d"usseldorf