[rkward-devel] a "misfeature"
Deepayan Sarkar
deepayan.sarkar at gmail.com
Thu Apr 5 20:03:26 UTC 2007
On 4/5/07, Thomas Friedrichsmeier
<thomas.friedrichsmeier at ruhr-uni-bochum.de> wrote:
> On Thursday 05 April 2007 20:41, you wrote:
> > No, GO has a few large objects.
>
> Well, yes, it only has 24 top level objects (which are environments), but
> inside those, there are hundreds of thousands of small objects.
>
> > And rk.get.structure is not likely to be the main problem. I get (in
> > plain R run from a shell):
> > > library(GO)
> > > length(ls("package:GO"))
> >
> > [1] 24
> >
> > > system.time(sapply(ls("package:GO"), exists))
> >
> > user system elapsed
> > 44.311 1.848 48.466
>
> At least on my system, doing this leads to heavy swapping (a memory problem),
> and most of the time is spent there. Of course memory *is* a problem in my
> current approach, but we could in fact find a solution that requires "only"
> loading the data, but not keeping it in memory (I looked some, and this seems
> doable at the C level).
>
> > > system.time(sapply(ls("package:GO"), exists))
> >
> > user system elapsed
> > 0.004 0.000 0.003
> >
> > The second time around it is much faster. I'm pretty sure descending
> > into environments inside rk.get.structure has negligible overhead
> > compared to the initial load times.
>
> No time to do serious timing right now, but I think it does contribute
> considerably. A simple example:
>
> library(GO)
> # let's use just one of the environments in GO, for now:
> system.time(sapply(ls(GOTERM), exists))
> # [1] 2.408 0.040 2.476 0.000 0.000
> system.time(sapply(ls(GOTERM), exists))
> # [1] 0.356 0.000 0.355 0.000 0.000
>
> # now, after the data is already loaded:
> library (rkward)
> system.time (.rk.get.structure (GOTERM, "GOTERM"))
> # [1] 23.861 0.136 24.720 0.000 0.000
> # this step does not get any better on repetition.
Ah, so lazy loading is bad enough and rk.get.structure is even worse :-).
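To separate the two costs without needing GO at all, here is a rough
sketch. make_env and describe_env are made-up names, the environment is a
synthetic stand-in for something like GOTERM, and describe_env is only a
naive per-object walk, not what .rk.get.structure actually does:

```r
# Hypothetical stand-in for one GO environment: many small objects.
make_env <- function(n) {
  e <- new.env(hash = TRUE)
  for (i in seq_len(n)) {
    assign(sprintf("GO:%07d", i), list(id = i, term = "x"), envir = e)
  }
  e
}

# Naive structure walk (an assumption, for illustration only):
# fetch every object and record its class and length.
describe_env <- function(env) {
  nms <- ls(env)
  data.frame(
    name   = nms,
    class  = vapply(nms, function(n) class(get(n, envir = env))[1],
                    character(1), USE.NAMES = FALSE),
    length = vapply(nms, function(n) length(get(n, envir = env)),
                    integer(1), USE.NAMES = FALSE),
    stringsAsFactors = FALSE
  )
}

e <- make_env(5000)

# Cost of merely touching the names, as in the timings quoted above.
t_exists <- system.time(sapply(ls(e), exists, envir = e))

# Cost of fetching and describing every object.
t_walk <- system.time(info <- describe_env(e))

print(rbind(exists = t_exists[1:3], walk = t_walk[1:3]))
```

The absolute numbers will vary by machine; the point is only that the two
steps can be timed independently once the data is already in memory.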
> Whichever way we go, I think this is something that really could and should
> be optimized, as well (it also becomes a problem for complex nested lists,
> such as produced by the XML package when parsing large XML files).
>
> > I think the solution is to build up a database beforehand. Objects in
> > .GlobalEnv are usually not a problem (they are already loaded) and the
> > current approach should be fine. Package namespaces are typically
> > sealed, and it should be hard for the user to modify things inside (if
> > they do, they deserve whatever they get). So, given a specific version
> > of a package, one should only need to compute the relevant information
> > once. I think this is a general enough problem that a common solution
> > that other front-ends (e.g. rcompgen) can use would be helpful. This
> > will need some consensus on what information such a database should
> > contain. Clearly, rk.get.structure (which I'm not familiar with) can
> > be the basis for a starting point.
> >
> > Ideally, this database should be computed by R CMD build (like the
> > INDEX file) and distributed as part of the package. This is not going
> > to happen anytime soon, but one good way to move forward in that
> > direction would be to write a separate package (not tied to rkward)
> > that would create such a file (I would recommend a plain text format
> > that read.table can read, rather than fancy XML-type things) given a
> > package. The codetools package may be helpful here (or not, I don't
> > really know).
> >
> > Once this is done, there has to be a decision on when to compute and
> > where to store that information. That's a topic for later.
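As a very rough sketch of what I mean by a plain text index: the column
set (name, args, class, type) and the helper name build_index are my own
assumptions, not an agreed format, and I use package:base only because it
is always attached.

```r
# Hypothetical helper: one row per object on a search-path entry.
build_index <- function(pos) {
  env <- as.environment(pos)
  nms <- ls(env)
  one <- function(n) {
    obj <- get(n, envir = env)
    # Formal argument names for closures; empty for primitives and data.
    a <- if (is.function(obj)) {
      paste(names(formals(obj)), collapse = ", ")
    } else {
      ""
    }
    c(name = n, args = a, class = class(obj)[1], type = typeof(obj))
  }
  as.data.frame(do.call(rbind, lapply(nms, one)),
                stringsAsFactors = FALSE)
}

idx <- build_index("package:base")

# Write a tab-separated file that read.table can read back.
f <- tempfile(fileext = ".txt")
write.table(idx, f, sep = "\t", quote = FALSE, row.names = FALSE)

back <- read.table(f, header = TRUE, sep = "\t", quote = "",
                   comment.char = "", colClasses = "character")

print(back[back$name %in% c("paste", "pi", "letters"), ])
```

A front-end could then look up argument lists from such a file without
touching the package's objects at all. Nested environments and lists
would need more columns (or a parent column), which is part of what
would have to be agreed on.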
>
> This is an interesting suggestion, indeed, and may well be the way to go.
> Well, I probably won't have the time to read and respond to E-Mail the next
> few days (and in fact, I'll be off in a few minutes), but maybe we can take
> up this discussion again, next week. I'd be glad to have your input on this.
Sure. It would be interesting to see how the Mac GUI behaves (since it
also shows the argument lists of functions). I'll check when I get a
chance.
-Deepayan