[rkward-devel] Introduction, proposing an HC plugin and maybe a bug report

Thomas Friedrichsmeier thomas.friedrichsmeier at ruhr-uni-bochum.de
Thu Nov 25 11:05:43 UTC 2010


Hi,

welcome to the list, and thanks for your offer to help. Let's see if we can get 
you going:

On Thursday 25 November 2010, Jose Maria Polo wrote:
> The first one that I would like to propose is hierarchical clustering (HC).
> 
> A simple HC, basically a front end for hclust (transposing the data (d)
> and generating a distance matrix (dist) will be necessary first).
> 
>  > t.r = t(data)
>  > dist.tr = dist(t.r)
>  > hc.tr = hclust(dist.tr, method = "average")
>  > plot(hc.tr, hang = -1)
> 
> Another approach could be to call pvclust, which calculates something
> similar. However, it also gives p-values.
> 
>  > result <- pvclust(data, method.dist="cor", method.hclust="average", nboot=1000)
> 
>  > plot(result)
> 
> What do you think?

Frankly, I have no idea which of these approaches is better. Unless pvclust 
(which package is that?) offers significant advantages compared to the approach 
using hclust, I'd lean towards using hclust. Since that is in the official 
"stats" package, this will allow using the plugin without installing an 
add-on package. Also, it's a very safe bet that the functions in the "stats" 
package will continue to be supported and stable in the long term.
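For illustration, here is a minimal sketch of how a plugin's .js file might assemble the hclust commands from the proposal above. The function name makeHclustCode and its two parameters are hypothetical, not part of RKWard; in a real plugin the returned string would be passed to echo().

```javascript
// Hypothetical helper (not an RKWard API): builds the R commands for the
// hclust-based approach. `dataName` is the R object holding the data;
// `method` is passed on to hclust() (e.g. "average", "complete").
function makeHclustCode(dataName, method) {
  const lines = [
    // dist() measures distances between rows, so transposing the data
    // makes the variables (columns) the objects being clustered
    "t.r <- t(" + dataName + ")",
    // Euclidean distance matrix (the default method of dist())
    "dist.tr <- dist(t.r)",
    'hc.tr <- hclust(dist.tr, method = "' + method + '")',
    // hang = -1 makes the labels hang down from 0, lining up all leaves
    "plot(hc.tr, hang = -1)"
  ];
  return lines.join("\n") + "\n";
}

// Usage: print the generated R code
console.log(makeHclustCode("data", "average"));
```

Keeping the clustering method as a parameter would let the plugin's dialog offer a choice of linkage methods without changing the rest of the generated code.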

> Anyway, I am not writing just to make a suggestion. I would like to
> help as much as I can with this, even though I have never programmed anything.

Great!

> I already started reading the information about how to generate a plugin.
> As suggested on your plugin web page, I guess that the best way will be
> to start by modifying a plugin that has a similar role.
> Ideally, it should be a plugin that allows comparing as many variables as
> we want (I was thinking of using the boxplot plugin, but I realized that
> the boxplot plugin uses many variables, but all separately).
> Any ideas or suggestions?

Well, the boxplot plugin may not be the easiest one to start with, as it's a 
comparatively complex one. Perhaps barplot or dotchart are easier to handle 
for the start.

In order to support multiple variables, you probably want to generate R code 
like this, as a very first step:
    data <- data.frame (var_a, var_b, var_c, ...)
This is to collect all selected variables into a single data.frame.
Achieving this is not all that hard. First, make sure the <varslot> in your 
.xml file has the attribute multi="true", to allow selection of several 
variables. Then, in the .js file, use:
    var xvarsstring = getValue ("x").split ("\n").join (", ");
    echo ('data <- data.frame (' + xvarsstring + ')\n');
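To try the snippet above outside RKWard, here is a minimal self-contained sketch. The getValue() and echo() stubs, the fake varslot value, and the calculate() wrapper are assumptions standing in for what RKWard provides to plugin scripts; only the split/join line is taken from the snippet itself.

```javascript
// Stand-ins (assumptions) for the functions RKWard provides to plugin
// scripts, so the sketch can run on its own.
const fakeValues = { x: "var_a\nvar_b\nvar_c" };
let output = "";
function getValue(id) { return fakeValues[id]; }
function echo(text) { output += text; }

// A multi="true" varslot yields one object name per line; joining the
// lines with ", " turns them into an argument list for data.frame().
function calculate() {
  const xvarsstring = getValue("x").split("\n").join(", ");
  echo("data <- data.frame (" + xvarsstring + ")\n");
}

calculate();
console.log(output);
```

With the three stub variables, the generated R line is `data <- data.frame (var_a, var_b, var_c)`, matching the first step described above.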

> 2) I am not sure if I should report this here, but I think that I found
> a bug: when I try to use basic statistics, the "Submit" button never
> becomes enabled.

Thanks. This has been reported before, but I could never reproduce it. Now I 
have found out that it happens when you have an object called "my.data" in the 
workspace. It is fixed in our SVN repository.

Regards
Thomas