[rkward-devel] workspace browser does not list variables in amd64 arch

Thomas Friedrichsmeier thomas.friedrichsmeier at ruhr-uni-bochum.de
Mon Oct 1 19:59:33 UTC 2007


On Monday 01 October 2007, Prasenjit Kapat wrote:
> Bingo, you nailed it!

Phew! I'm glad this was it. This problem really looked hard, at first.

> It certainly fixes... Great. So, what was the problem? You seem to
> have changed the following..
> -       return ((bool) VECTOR_ELT (res, 0));
> +       return ((bool) LOGICAL (res)[0]);

Ok. First, about .rk.get.structure: this is the internal function 
that fetches basic information about the object hierarchy. This used to be 
implemented in plain R code. After 0.4.7 was released, I reimplemented this 
in C. This is considerably faster, slightly more correct (well, except for 
64bit systems, until recently ;-)), and also avoids keeping all 
the "lazy-loaded" objects in memory once they have been investigated. 
Fortunately, I left the old implementation around for now, so the assign() 
from the other mail simply reinstated the old implementation. So this narrows 
it down to the new C implementation.

Now what was wrong in the C code? Well, internally most R objects (SEXPs) are 
essentially arrays of data. The type of data contained in the array differs, 
however: it may be integers, floating point numbers, pointers to other data, 
etc. Here we're dealing with an SEXP that should be a "logical", which 
internally uses integers (0 or 1) for storage.

Now, to fetch the value of interest into C, we basically need to say: "Ok, show 
me the n-th element of the array" (in this case the first, i.e. index "0"). 
Here comes the interesting bit: in the case of integers/logicals, each element 
is 32 bits long, regardless of processor architecture. However, the mistake was 
that VECTOR_ELT really asks for the first *pointer*, not for the first 
*integer* in the array. On 32bit systems this happens to have the same 
effect: a pointer is 32 bits long there, so it fetches the first 32 bits, which 
is exactly the same as fetching the first integer. On 64bit systems, however, 
a pointer is 64 bits long, so this fetches the first 64 bits, while only the 
first half of these contains meaningful data!

Much of the time, the second 32 bits will have been set to "0" anyway, and 
therefore not change anything. Quite often, however, these bits contain 
garbage, and since the whole 64-bit value is then cast to bool, any non-zero 
bit there easily turns a "false" into a "true". And of course this is 
basically random.

Since the function in question was called in many places on many objects, the 
result would be highly random behavior.

If you're interested, the interesting bits from the valgrind output were 
these:

==27603== 
==27603== Thread 2:
==27603== Conditional jump or move depends on uninitialised value(s)
==27603==    at 0x5082FB: RKStructureGetter::getStructureWorker(SEXPREC*, QString const&, bool, RData*) (rkstructuregetter.cpp:253)
==27603==    by 0x50911C: RKStructureGetter::getStructureWrapper(RKStructureGetter::GetStructureWorkerArgs*) (rkstructuregetter.cpp:165)
==27603==    by 0x63DB32B: R_ToplevelExec (in /usr/lib/R/lib/libR.so)
==27603==    by 0x5091B8: RKStructureGetter::getStructureSafe(SEXPREC*, QString const&, bool, RData*) (rkstructuregetter.cpp:153)
==27603==    by 0x5093ED: RKStructureGetter::getStructure(SEXPREC*, SEXPREC*, SEXPREC*, SEXPREC*) (rkstructuregetter.cpp:134)
==27603==    by 0x5094B4: doGetStructure(SEXPREC*, SEXPREC*, SEXPREC*, SEXPREC*) (rembedinternal.cpp:669)
==27603==    by 0x640237E: (within /usr/lib/R/lib/libR.so)
==27603==    by 0x6427EAB: Rf_eval (in /usr/lib/R/lib/libR.so)
==27603==    by 0x6428FF1: (within /usr/lib/R/lib/libR.so)
==27603==    by 0x6427C7C: Rf_eval (in /usr/lib/R/lib/libR.so)
==27603==    by 0x642AF2E: Rf_applyClosure (in /usr/lib/R/lib/libR.so)
==27603==    by 0x6427B93: Rf_eval (in /usr/lib/R/lib/libR.so)

This shows that on line 253 of rbackend/rkstructuregetter.cpp, we decide 
something based on garbage data. Strictly speaking, the bug was in the function 
callSimpleBool(), but apparently GCC inlined it, so it does not show up in the 
backtrace. Valgrind is indeed a highly useful tool (it does produce a number 
of false alarms, especially inside the Qt library, for reasons I don't 
understand, but it's really great for tracking down this type of nasty bug).

Regards
Thomas