[rkward-devel] factor <-> character (was: three enhancements)
meik michalke
meik.michalke at uni-duesseldorf.de
Tue Feb 8 18:23:46 UTC 2011
hi,
on Tuesday 08 February 2011 (17:18), Thomas Friedrichsmeier wrote:
> One corner-case is changing data from factor to character and back.
> Currently levels are preserved, and I think that really is useful. So
> labels would still be kept around, but probably labelled view (see below)
> would only be available for factors.
the crucial point would be where these unused labels are kept in the
meantime. does RKWard hold them in a separate environment? as long as they're
not part of the data.frame itself, that should be safe.
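to make the question concrete, here is a minimal sketch of one way labels could be parked outside the data.frame while a column is character. label.store, remember.labels() and recall.labels() are hypothetical names for illustration only, not RKWard's actual mechanism. keying the store by column name alone is a simplification; a real store would also have to know which data.frame the column belongs to.

```r
# Hypothetical sketch (not RKWard's implementation): keep factor labels in a
# separate environment, so the data.frame itself holds nothing R would not.
label.store <- new.env()

# remember the complete level set (including unused levels) of one column
remember.labels <- function(df, col) {
  assign(col, levels(df[[col]]), envir = label.store)
  invisible(NULL)
}

# return the stored levels, or NULL if nothing was kept for this column
recall.labels <- function(col) {
  if (exists(col, envir = label.store, inherits = FALSE)) {
    get(col, envir = label.store)
  } else {
    NULL
  }
}

some.data <- data.frame(a = factor(c(1, 2, 4, 5), levels = 1:5,
                                   labels = c("a", "b", "c", "d", "e")))
remember.labels(some.data, "a")
some.data$a <- as.character(some.data$a)  # the column is now plain character
recall.labels("a")                        # "a" "b" "c" "d" "e", still known
```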
but i wouldn't force users to re-use those labels if they switch back to
factor. otherwise R and RKWard would give different results for cases like
this one, where a label "c" for value "3" was defined but actually unused:
> some.data <- data.frame(a=factor(c(1,2,4,5), levels=c(1:5),
+   labels=c("a","b","c","d","e")))
> some.data$a <- as.character(some.data$a)
> some.data$a <- as.factor(some.data$a)
R would give
> unclass(some.data$a)
[1] 1 2 3 4
attr(,"levels")
[1] "a" "b" "d" "e"
whereas, using the recycled label set, RKWard has to decide between either
> unclass(some.data$a)
[1] 1 2 3 4
attr(,"levels")
[1] "a" "b" "c" "d"
(i.e., keep R's integer codes but attach the stored labels in order, which
silently turns "d" into "c" and "e" into "d"), or
> unclass(some.data$a)
[1] 1 2 4 5
attr(,"levels")
[1] "a" "b" "c" "d" "e"
(i.e., assign the integer codes according to the stored labels, thereby
reconstructing the original factor). both outcomes differ from what R gives
and might lead to hard-to-trace errors, such as scripts that run correctly in
RKWard but not in R.
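the three outcomes above can be reproduced in plain R. in this sketch, old.levels stands in for the label set RKWard would have kept (an assumption about what gets stored); r.way is what R itself produces, opt1 is the first alternative and opt2 the second.

```r
# Reproduce the possible outcomes of the factor -> character -> factor round trip.
old <- factor(c(1, 2, 4, 5), levels = 1:5, labels = c("a", "b", "c", "d", "e"))
old.levels <- levels(old)   # the label set a (hypothetical) store would keep
chr <- as.character(old)    # "a" "b" "d" "e"

# what plain R does: only the labels in use survive, codes become 1..4
r.way <- as.factor(chr)

# alternative 1: R's codes, but the stored labels attached in order,
# so the values now read "a" "b" "c" "d"
opt1 <- r.way
levels(opt1) <- old.levels[seq_len(nlevels(r.way))]

# alternative 2: map the values back onto the stored label set,
# which rebuilds the original factor (codes 1, 2, 4, 5)
opt2 <- factor(chr, levels = old.levels)

str(list(r.way = r.way, opt1 = opt1, opt2 = opt2))
```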
the ability to revert to the original factor is of course a useful feature,
too. i think RKWard should behave like R by default, that is, as if it had
forgotten the previous labels, but still offer an option to re-use those
labels if you really want them, stressing that this might lead to different
results than expected. perhaps RKWard could even calculate and show the
differences somehow (like "which(data.new.labels != data.old.labels,
arr.ind=TRUE)" or something)...
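a sketch of the "show the differences" idea. label.diff() is a hypothetical helper, not part of RKWard or base R. it converts every column before comparing, because != on two factors with different level sets stops with an error in R, and it can compare either the displayed labels or the integer codes, since the two alternatives above differ from R in different ways. it assumes the compared columns are factors.

```r
# Hypothetical helper: list the cells where two versions of a data.frame
# disagree, either in their displayed labels or in their integer codes.
label.diff <- function(r.data, reused.data, what = c("labels", "codes")) {
  what <- match.arg(what)
  # factors with different level sets cannot be compared with != directly,
  # so turn each column into plain character (labels) or integer (codes)
  conv <- if (what == "labels") as.character else as.integer
  r.m  <- as.matrix(as.data.frame(lapply(r.data, conv)))
  re.m <- as.matrix(as.data.frame(lapply(reused.data, conv)))
  which(r.m != re.m, arr.ind = TRUE)  # row/col index of every differing cell
}

levs  <- c("a", "b", "c", "d", "e")
r.way <- data.frame(a = factor(c("a", "b", "d", "e")))               # R's default
opt1  <- data.frame(a = factor(c("a", "b", "c", "d")))               # alternative 1
opt2  <- data.frame(a = factor(c("a", "b", "d", "e"), levels = levs))  # alternative 2

label.diff(r.way, opt1)                  # rows 3 and 4: labels changed
label.diff(r.way, opt2)                  # no rows: labels agree
label.diff(r.way, opt2, what = "codes")  # rows 3 and 4: codes 3, 4 vs 4, 5
```

comparing labels only catches the first alternative; the reconstructed factor looks the same but carries different codes, which is why the codes comparison is offered as well.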
> I wonder how many bugs can _possibly_ be left in the data editor, now?
how many lines of code does it have? ;-)
best regards :: m.eik
--
dipl. psych. meik michalke
institute of experimental psychology
dept. of diagnostics and differential psychology
heinrich-heine-universität 40225 düsseldorf