[rkward-devel] factor <-> character (was: three enhancemens)

Thomas Friedrichsmeier thomas.friedrichsmeier at ruhr-uni-bochum.de
Wed Feb 9 11:15:54 UTC 2011


Hi,

On Tuesday 08 February 2011, meik michalke wrote:
> the crucial point would be where these unused labels are being kept in the
> meantime. does RKWard hold them in a separate environment? as long as
> they're not part of the data.frame itself that should be safe.

They are stored as the "levels" attribute of the corresponding column. This
happens to match the internal representation of factors in R.
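
To illustrate what "internal representation" means here, a minimal sketch in plain R (this is not RKWard code, and the variable name x is only for illustration): a factor is an integer vector carrying a "levels" attribute, so unused labels simply stay in that attribute.

```r
# A factor is stored as integer codes plus a "levels" attribute (and class "factor").
x <- factor (c ("a", "b", "d", "e"), levels=c ("a", "b", "c", "d", "e"))
typeof (x)           # "integer"
attr (x, "levels")   # all five labels, including the unused "c"
as.integer (x)       # 1 2 4 5: the codes index into the levels
```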
 
> but i wouldn't force users to re-use those labels if they switch back to
> factor. otherwise R and RKWard would give different results for cases like
> this one, where a label "c" for value "3" was defined but actually unused:
> > some.data <- data.frame(a=factor(c(1,2,4,5), levels=c(1:5),

Well, keep in mind that this is all just about the behavior of the data 
editor. The data editor will re-use the "old" levels, when you change the type 
from factor to something else, and then back to factor, _in the data editor_.

> some.data <- data.frame (a=factor (c ("a", "b", "d", "e"), levels=c ("a", "b", "c", "d", "e")))
> rk.edit (some.data)
[1] a b d e
Levels: a b c d e

- In the data editor:
	- Change type to "String"
	- Change type back to "Factor"

> some.data$a
[1] a b d e
attr(,".rk.invalid.fields")
list()
Levels: a b c d e
> unclass (some.data$a)
[1] 1 2 4 5
attr(,"levels")
[1] "a" "b" "c" "d" "e"
attr(,".rk.invalid.fields")
list()

(Ok, the "invalid fields" could certainly be pruned, too, but the point is that 
the levels and numeric values are as before.)

Conversions done in R code are not affected:

> as.character (some.data$a)
[1] "a" "b" "d" "e"
> as.factor (as.character (some.data$a))
[1] a b d e
Levels: a b d e
> unclass (as.factor (as.character (some.data$a)))
[1] 1 2 3 4
attr(,"levels")
[1] "a" "b" "d" "e"
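
If you do want plain R code to keep the unused level through such a round trip, one option is to pass the old levels to factor() explicitly. This is only a sketch of that idea, not something RKWard does for you; old.levels, as.char and roundtrip are names made up for this example:

```r
some.data <- data.frame (a=factor (c ("a", "b", "d", "e"), levels=c ("a", "b", "c", "d", "e")))
old.levels <- levels (some.data$a)       # remember the labels, including the unused "c"
as.char <- as.character (some.data$a)    # the conversion itself drops the levels
roundtrip <- factor (as.char, levels=old.levels)
levels (roundtrip)                       # "a" "b" "c" "d" "e"
as.integer (roundtrip)                   # 1 2 4 5, the same codes as the original
```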

> (i.e., make integers according to the labels as stored, thereby reconstruct
> the original factor), both of which is a different outcome compared to R
> and might lead to hardly traceable errors, like scripts that run correctly
> in RKWard but not R.

Well, there might be potential for confusion in this, but this should _not_ 
affect R scripts in any way.

> > I wonder how many bugs can _possibly_ be left in the data editor, now?
> 
> how many lines of code does it have? ;-)

Too many... In fact, I just fixed another set of bugs, which could even lead to
crashes with a sequence like this:

> some.data <- data.frame (a=c(1:4))
> rk.edit (some.data)
> some.data <- data.frame (a=c(1:10))

But you may even want to give that a try before installing the next daily 
build: It will give you a chance to see the new "crash recovery" dialog.

Regards
Thomas