[rkward-devel] factor <-> character (was: three enhancements)

meik michalke meik.michalke at uni-duesseldorf.de
Tue Feb 8 18:23:46 UTC 2011


hi,

on Tuesday 08 February 2011 (17:18), Thomas Friedrichsmeier wrote:
> One corner-case is changing data from factor to character and back.
> Currently levels are preserved, and I think that really is useful. So
> labels would still be kept around, but probably labelled view (see below)
> would only be available for factors.

the crucial point is where these unused labels are kept in the meantime. does 
RKWard hold them in a separate environment? as long as they're not part of the 
data.frame itself, that should be safe.
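just to illustrate what "not part of the data.frame" could mean, here is a minimal sketch, assuming labels were stashed in a separate environment keyed by column name. `label.store` and `stash.labels()` are hypothetical names, not RKWard's actual mechanism:

```r
# hypothetical sketch: keep factor labels outside the data.frame while the
# column is stored as character (not how RKWard actually does it)
label.store <- new.env()

# remember the levels of a factor column before it gets converted
stash.labels <- function(df, col) {
  assign(col, levels(df[[col]]), envir = label.store)
  invisible(NULL)
}

some.data <- data.frame(a = factor(c(1, 2, 4, 5), levels = 1:5,
                                   labels = c("a", "b", "c", "d", "e")))
stash.labels(some.data, "a")
some.data$a <- as.character(some.data$a)

# the column itself is a plain character vector without extra attributes,
# so plain R code never sees the unused label "c"
attributes(some.data$a)
get("a", envir = label.store)
```

keying by column name alone is of course too simple for a real implementation (renamed or copied columns would lose their labels).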

but i wouldn't force users to re-use those labels if they switch back to 
factor. otherwise R and RKWard would give different results for cases like 
this one, where a label "c" for value "3" was defined but actually unused:

> some.data <- data.frame(a=factor(c(1,2,4,5), levels=c(1:5),
+   labels=c("a","b","c","d","e")))
> some.data$a <- as.character(some.data$a)
> some.data$a <- as.factor(some.data$a)

R would give

> unclass(some.data$a)
[1] 1 2 3 4
attr(,"levels")
[1] "a" "b" "d" "e"

whereas, using the recycled label set, RKWard has to decide between either

> unclass(some.data$a)
[1] 1 2 3 4
attr(,"levels")
[1] "a" "b" "c" "d"

(i.e., make integers like R but label them as stored), or

> unclass(some.data$a)
[1] 1 2 4 5
attr(,"levels")
[1] "a" "b" "c" "d" "e"

(i.e., make integers according to the labels as stored, thereby reconstructing 
the original factor). both of these outcomes differ from what R does and might 
lead to hard-to-trace errors, like scripts that run correctly in RKWard but not 
in plain R.
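to make the three outcomes concrete, a minimal sketch in plain R, assuming the old labels were kept in a vector called `stored.labels` (a hypothetical name; `alt1` and `alt2` only mimic the two alternatives described above, they are not RKWard code):

```r
stored.labels <- c("a", "b", "c", "d", "e")
original <- factor(c(1, 2, 4, 5), levels = 1:5, labels = stored.labels)
x <- as.character(original)

# what plain R does: as.factor() only knows the values actually present
r.default <- as.factor(x)

# alternative 1 (hypothetical): keep R's integer codes, but relabel them with
# the first stored labels; the value "d" silently turns into "c"
alt1 <- r.default
levels(alt1) <- stored.labels[seq_along(levels(alt1))]

# alternative 2 (hypothetical): derive the codes from the stored label set,
# which reconstructs the original factor including the unused level "c"
alt2 <- factor(x, levels = stored.labels)

unclass(r.default)
unclass(alt1)
unclass(alt2)
```

only `r.default` matches what a user gets when running the same script outside RKWard.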

the ability to revert to the original factor is of course a useful feature, 
too. i think RKWard should behave like R by default, that is, like it forgot 
about the previous labels, but somehow offer the option to re-use those labels 
if you really want them, stressing that this might lead to different results 
than expected. perhaps RKWard could even calculate and show the differences 
somehow (e.g. with something like 
"which(data.new.labels != data.old.labels, arr.ind=TRUE)")...
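one caveat with comparing the two versions directly: R refuses to compare factors whose level sets differ ("level sets of factors are different"). a minimal sketch that compares the labels as characters instead, assuming the old and new versions are both available as data.frames (`old.df` and `new.df` are hypothetical names; `new.df` is relabelled like alternative 1 above):

```r
old.df <- data.frame(a = factor(c(1, 2, 4, 5), levels = 1:5,
                                labels = c("a", "b", "c", "d", "e")))

# converted to character and back, then relabelled with the stored labels
new.df <- old.df
new.df$a <- as.factor(as.character(new.df$a))
levels(new.df$a) <- c("a", "b", "c", "d")

# as.matrix() turns the factor columns into a character matrix, so the
# comparison works on the visible labels rather than the integer codes
diffs <- which(as.matrix(new.df) != as.matrix(old.df), arr.ind = TRUE)
diffs
```

the resulting row/column indices (rows 3 and 4 of column 1 here) are exactly the cells where re-using the old labels would change what the user sees.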

> I wonder how many bugs can _possibly_ be left in the data editor, now?

how many lines of code does it have? ;-)


best regards :: m.eik

-- 
dipl. psych. meik michalke
institut für experimentelle psychologie
abt. für diagnostik und differentielle psychologie
heinrich-heine-universität 40225 düsseldorf