probable grave bug: SPSS data with umlauts in factor levels can lead to data loss

Thomas Friedrichsmeier thomas.friedrichsmeier at ruhr-uni-bochum.de
Mon Jun 27 08:29:27 UTC 2016


Hi,

how I hate those pesky encoding bugs... Sorry, I'm still not back up to
speed. Glad you seem to have solved the Mac issue.

Questions to narrow down this one (or help me reproduce it):
- Under what locale does this happen (Sys.getlocale())?
- Is this _only_ for SPSS imported data, or same problem when entering
  umlauts manually?
- If, in the imported data, you edit "männlich" to "Männlich" (or
  anything else still containing an umlaut), does that work OK?
- What does str(my.data) print after the import? After editing?
- What does Encoding(levels(my.data$something)) print after the import?
  After editing?

Regards
Thomas

On Sun, 26 Jun 2016 23:40:29 +0200
meik michalke <meik.michalke at uni-duesseldorf.de> wrote:
> i've run into a weird problem that turned out to be a really
> dangerous bug: we've imported an SPSS .sav file that had some
> variables with predefined factor levels, which were transformed into
> R factors. entering data worked without problems, and no warnings
> were raised.
> 
> we saved the data to a .RData file (no warnings or errors), but when
> we re-imported it later on, all cells with a factor level that had
> an umlaut in its name (e.g., "männlich") were just <NA>, both in the
> editor window and in the R console. that is, the data was
> completely gone and had to be re-entered. this was reproducible.
> 
> i renamed a level from "männlich" to "maennlich", and only then was
> the data saved correctly.
> 
> 
> best regards :: m.eik
> 



More information about the rkward-devel mailing list