probable grave bug: SPSS data with umlauts in factor levels can lead to data loss

Thomas Friedrichsmeier thomas.friedrichsmeier at ruhr-uni-bochum.de
Mon Jul 4 10:35:00 UTC 2016


Hi,

On Tue, 28 Jun 2016 22:13:32 +0200
meik michalke <meik.michalke at uni-duesseldorf.de> wrote:
> On Monday, 27 June 2016, 10:29:27, Thomas Friedrichsmeier wrote:
> > how I hate those pesky encoding bugs...

OK, so I _believe_ the essence of what was happening is this: RKWard was
trying to edit data using commands like
  var[1] <- "männlich"
When that command was converted to R's encoding (in which "ä" is not
representable in this case), the "ä" was dropped, and the command became
  var[1] <- "mnnlich"
"mnnlich" is not a valid factor level, so the assignment resulted in NAs.
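
To illustrate the failure mode outside of RKWard, here is a minimal
Python sketch. It is not RKWard's actual code: `to_locale_lossy` is a
made-up name, and ASCII is only assumed as a stand-in for a locale that
cannot represent "ä".

```python
# Hypothetical stand-in for the lossy conversion: characters that the
# target encoding cannot represent are silently dropped.
def to_locale_lossy(text: str, encoding: str = "ascii") -> str:
    return text.encode(encoding, errors="ignore").decode(encoding)

# The factor's valid levels, as they exist in the data.
levels = {"männlich", "weiblich"}

value = "männlich"
converted = to_locale_lossy(value)  # what would actually reach R

print(converted)            # mnnlich
print(converted in levels)  # False: not a valid level, so R stores NA
```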

That should hopefully be fixed in master now, simply by using a
conversion that preserves non-representable characters as their Unicode
numbers. I hope this does not introduce any side effects in other
obscure locale setups, though. Also, the corresponding documentation
seems to be a bit thin...
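
The general idea of such a conversion can be sketched as follows, again
in Python and only as an illustration. The name `escape_unrepresentable`
is made up, ASCII is an assumed target encoding, and the R-style
`\uXXXX` escape format is my assumption about what "unicode numbers"
looks like in practice; the real conversion in RKWard may differ in
detail.

```python
def escape_unrepresentable(text: str, encoding: str = "ascii") -> str:
    """Keep representable characters; write the rest as \\uXXXX escapes."""
    out = []
    for ch in text:
        try:
            ch.encode(encoding)  # representable: keep the character as-is
            out.append(ch)
        except UnicodeEncodeError:
            cp = ord(ch)
            # Four hex digits for the BMP, eight for anything beyond it.
            out.append(f"\\u{cp:04x}" if cp <= 0xFFFF else f"\\U{cp:08x}")
    return "".join(out)

print(escape_unrepresentable('var[1] <- "männlich"'))
# var[1] <- "m\u00e4nnlich"
```

Such escapes are only meaningful inside string literals, so a sketch
like this assumes the non-representable characters occur only there.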

My testing so far shows the expected results, but please test git
master, especially if you are running an "exotic" locale.

Regards
Thomas