[rkward-devel] workspace browser does not list variables in amd64 arch
Thomas Friedrichsmeier
thomas.friedrichsmeier at ruhr-uni-bochum.de
Mon Oct 1 19:59:33 UTC 2007
On Monday 01 October 2007, Prasenjit Kapat wrote:
> Bingo, you nailed it!
Phew! I'm glad this was it. This problem really looked hard, at first.
> It certainly fixes... Great. So, what was the problem? You seem to
> have changed the following..
> - return ((bool) VECTOR_ELT (res, 0));
> + return ((bool) LOGICAL (res)[0]);
Ok. First the thing about .rk.get.structure: This is the internal function
that fetches basic information about the object-hierarchy. This used to be
implemented in plain R code. After 0.4.7 was released, I reimplemented this
in C. This is considerably faster, slightly more correct (well, except for
64bit systems, until recently ;-)), and also avoids keeping all
the "lazy-loaded" objects in memory once they have been investigated.
Fortunately, I left the old implementation around for now, so the assign()
from the other mail simply reinstated the old implementation. So this narrows
it down to the new C implementation.
Now what was wrong in the C code? Well, most R objects (SEXPs, internally) are
technically arrays of some kind of data. The type of data contained in the
array may differ, however: it may contain integers, floating point numbers,
pointers to other data, etc. Here we're dealing with an SEXP that should be
a "logical", which internally uses integers (0 or 1) for storage.
Now to fetch the value of interest into C, we basically need to say: "Ok, show
me the n-th element of the array" (in this case the first, i.e. index "0").
Now comes the interesting bit: In the case of integers/logicals, each element
is 32bit long, regardless of processor architecture. However, the mistake was
that VECTOR_ELT really asks for the first *pointer*, not for the first
*integer* in the array. On 32bit systems this technically has the same
effect: it fetches the first 32 bits, which is exactly the same as fetching
the first integer. On 64bit systems, however, a pointer is 64bit long, so
this fetches the first 64 bits, while only the first half of this contains
meaningful data!
Much of the time, the second 32 bits will have been set to "0" anyway, and
therefore not change anything. Quite often, however, these bits contain
garbage and can easily change a "false" to "true". And of course this
is basically random.
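To make the effect concrete, here is a minimal standalone sketch in plain C.
It does not use the R headers: FakeLogicalData, read_as_int() and
read_as_pointer_sized() are made-up names for illustration, and the struct
only imitates one 32-bit logical element followed by whatever happens to sit
next to it in memory. The second reader assumes a 64-bit pointer size, so it
is an analogy for what (bool) VECTOR_ELT (res, 0) did, not the actual R macro.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the data area of a logical vector of length 1:
   one 32-bit element, followed by memory that is NOT part of the vector. */
typedef struct {
    int32_t value;     /* the real logical element (0 or 1) */
    int32_t trailing;  /* neighbouring bytes, possibly garbage */
} FakeLogicalData;

/* Correct read: look at the first 32 bits only
   (the analogue of LOGICAL (res)[0]). */
static int read_as_int(const FakeLogicalData *d) {
    int32_t v;
    memcpy(&v, d, sizeof v);
    return v != 0;
}

/* Buggy read: look at a 64-bit, pointer-sized slot and convert it to a
   truth value (the analogue of the old code on a 64-bit system). */
static int read_as_pointer_sized(const FakeLogicalData *d) {
    uint64_t v;
    memcpy(&v, d, sizeof v);
    return v != 0;
}

/* Small self-check showing when the two reads agree and when they differ. */
static void check_reads(void) {
    FakeLogicalData clean = { 0, 0 };       /* trailing bits happen to be 0 */
    FakeLogicalData dirty = { 0, 0x5eed };  /* trailing bits hold garbage */

    assert(read_as_int(&clean) == 0);
    assert(read_as_pointer_sized(&clean) == 0);  /* bug stays hidden */

    assert(read_as_int(&dirty) == 0);
    assert(read_as_pointer_sized(&dirty) == 1);  /* "false" became "true" */
}
```

Calling check_reads() from a main() passes without complaint. With the clean
data both readers agree, which is why the bug could go unnoticed for a while.
With the dirty data only the 32-bit read still reports "false".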
Since the function in question was called in many places on many objects, the
result would be highly random behavior.
If you're interested, the interesting bits from the valgrind output were
these:
==27603==
==27603== Thread 2:
==27603== Conditional jump or move depends on uninitialised value(s)
==27603==    at 0x5082FB: RKStructureGetter::getStructureWorker(SEXPREC*, QString const&, bool, RData*) (rkstructuregetter.cpp:253)
==27603==    by 0x50911C: RKStructureGetter::getStructureWrapper(RKStructureGetter::GetStructureWorkerArgs*) (rkstructuregetter.cpp:165)
==27603==    by 0x63DB32B: R_ToplevelExec (in /usr/lib/R/lib/libR.so)
==27603==    by 0x5091B8: RKStructureGetter::getStructureSafe(SEXPREC*, QString const&, bool, RData*) (rkstructuregetter.cpp:153)
==27603==    by 0x5093ED: RKStructureGetter::getStructure(SEXPREC*, SEXPREC*, SEXPREC*, SEXPREC*) (rkstructuregetter.cpp:134)
==27603==    by 0x5094B4: doGetStructure(SEXPREC*, SEXPREC*, SEXPREC*, SEXPREC*) (rembedinternal.cpp:669)
==27603==    by 0x640237E: (within /usr/lib/R/lib/libR.so)
==27603==    by 0x6427EAB: Rf_eval (in /usr/lib/R/lib/libR.so)
==27603==    by 0x6428FF1: (within /usr/lib/R/lib/libR.so)
==27603==    by 0x6427C7C: Rf_eval (in /usr/lib/R/lib/libR.so)
==27603==    by 0x642AF2E: Rf_applyClosure (in /usr/lib/R/lib/libR.so)
==27603==    by 0x6427B93: Rf_eval (in /usr/lib/R/lib/libR.so)
This shows that on line 253 of rbackend/rkstructuregetter.cpp, we decide
something based on garbage data. Actually the bug was in the function
callSimpleBool(), but apparently GCC inlined it, so it does not show up
as a separate frame in the backtrace. Valgrind is indeed a highly useful
tool (it does produce a number of false alarms, esp. inside the Qt library,
for reasons I don't understand, but it's really great for tracking down
this type of nasty bug).
Regards
Thomas