[rkward-devel] a "misfeature"
Thomas Friedrichsmeier
thomas.friedrichsmeier at ruhr-uni-bochum.de
Thu Apr 5 15:17:52 UTC 2007
Hi again,
On Thursday 05 April 2007 00:35, Prasenjit Kapat wrote:
> A friend of mine (Deepayan: Lattice author) comments the following:
>
> [Quote]
> Basically, as far as I can tell, whenever a new package is loaded
> .rk.get.structure is run on all objects in the package (or at least in
> the namespace). This means that all these objects are evaluated,
> including all lazy-loaded symbols, which defeats the whole point of
> lazy loading. This is not much of an issue for small packages, but try
>
> source("http://www.bioconductor.org/biocLite.R")
> biocLite("GO")
> library(GO)
>
> [/Quote]
After a bit more investigation, the matter turns out to be yet more complex:
1) It is possible to determine whether a symbol is really a promise, at least
from C.
1b) Unfortunately, however, for example in the base package, almost
*everything* is a promise. That is, not just large datasets, but also the
majority of functions.
1c) I don't think there is any way, currently, to tell apart promises for
functions and promises for data. Or of course - as would be optimal - to tell
apart promises for "small" objects from promises for "large" ones. Once we
try to get *any* information about the object, the promise is evaluated, i.e.
the object is loaded. So we're back to square one on this front.
2) In the example of the GO package, the problem is multiplied by the fact
that there are literally hundreds of thousands of (small) objects. As far as
I can see, loading all the data - while somewhat crazy - is not the main
slowdown. Lazy loading is pretty fast, and mainly uses memory, not CPU
cycles. Rather the problem is evaluating .rk.get.structure() on each single
one of those.
2b) .rk.get.structure() could probably be sped up considerably by implementing
it in C, instead of R. Likely this could save considerable amounts of
(temporary) memory as well, but this claim is entirely untested.
2c) Whatever the optimization, as the end result, rkward will build an
internal representation of the "structure" of each of the objects (i.e. name,
type of data, child objects, etc.). This results in a small memory overhead
per object. However, in the case of thousands of small objects, the overhead
may be noticeable.
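To make point 1c concrete, here is a minimal sketch in plain R. It assumes that delayedAssign() is a fair stand-in for a lazy-loaded symbol, and toy_structure() is a made-up helper for illustration, not the real .rk.get.structure(). It shows that listing names leaves the promise alone, while asking for even the most basic information forces it:

```r
# Hypothetical stand-in for a structure query; NOT the real .rk.get.structure()
toy_structure <- function(name, env) {
  obj <- get(name, envir = env)   # this lookup forces a pending promise
  list(name = name, type = class(obj)[1], length = length(obj))
}

state <- new.env()                # records whether the "data" was loaded
state$loaded <- FALSE
pkg <- new.env()                  # plays the role of a package environment

# Mimic a lazy-loaded symbol: the value is only computed on first access
delayedAssign("big_data", {
  state$loaded <- TRUE
  seq_len(100000)
}, assign.env = pkg)

before <- state$loaded            # nothing has touched the value yet
names_seen <- ls(pkg)             # listing names does not evaluate the promise
still_lazy <- !state$loaded

info <- toy_structure("big_data", pkg)   # asking for type/length loads it
after <- state$loaded
```

With hundreds of thousands of symbols, each such call pays both the load and the R-level bookkeeping, which is the cost described in point 2.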
So what to do? Getting at least basic structure information about all objects
is needed for the object browser to be useful. Also, we use this info for
object name completion and function argument hinting (I see that package
rcompgen provides similar functionality, but looks up potential completions
dynamically. While in theory such an approach could be used in RKWard as
well, it would not be easy to fit it into the threaded approach we use (which
allows editing a script with object name completion while other calculations
are running simultaneously)). In the future it might additionally be used
to aid in syntax highlighting. So I think overall it's not something we can
just rip out.
Any way to alleviate the problem? First is to implement .rk.get.structure() in
C. I'll try to see what I can do here for 0.4.8. Second might be a
heuristic to determine when it's best not to attempt to fetch the structure.
Unfortunately, I have no good idea on this. In the case of the GO library,
simply excluding recursion into environments would make most of the problem
go away, but this probably does not generalize well. Third might be a
way to let the user control whether and for which libraries structure
information is fetched. But I guess this would be a power-user option that
is not easily discoverable.
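As a rough illustration of the second and third ideas, the sketch below uses invented names throughout (walk_structure, count_nodes, should_fetch, skip_packages); none of them exist in RKWard. It assumes a structure walker with a switch for recursing into environments, plus a simple per-package opt-out list:

```r
# Hypothetical walker; names and defaults are illustrative only.
# Note: it does not guard against environments that reference each other.
walk_structure <- function(obj, name, recurse_envs = FALSE) {
  node <- list(name = name, type = class(obj)[1], children = list())
  if (is.environment(obj)) {
    if (!recurse_envs) return(node)        # record the environment, skip contents
    node$children <- lapply(ls(obj), function(n) {
      walk_structure(get(n, envir = obj), n, recurse_envs)
    })
  } else if (is.list(obj)) {
    labels <- names(obj)
    if (is.null(labels)) labels <- as.character(seq_along(obj))
    node$children <- lapply(seq_along(obj), function(i) {
      walk_structure(obj[[i]], labels[i], recurse_envs)
    })
  }
  node
}

# Count how many nodes the internal representation would have to hold
count_nodes <- function(node) {
  1L + sum(vapply(node$children, count_nodes, integer(1)))
}

# Hypothetical user setting: packages for which no structure is fetched
skip_packages <- c("GO")
should_fetch <- function(pkg, skip = skip_packages) !(pkg %in% skip)

# A GO-like layout: one environment holding many small objects
store <- new.env()
for (i in 1:1000) assign(paste0("term", i), i, envir = store)
pkg_like <- list(lookup = store, version = "1.0")

shallow <- count_nodes(walk_structure(pkg_like, "pkg_like"))
deep <- count_nodes(walk_structure(pkg_like, "pkg_like", recurse_envs = TRUE))
```

In this toy layout the shallow walk records only the top-level entries, while the deep walk adds one node per object in the environment. Whether skipping environments is acceptable for other packages is exactly the open question.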
Well, any further insight is appreciated.
Regards
Thomas