[rkward-tracker] [ rkward-Bugs-2866476 ] FIXED: warning messages and missings when using mclapply()

SourceForge.net noreply at sourceforge.net
Mon Oct 24 08:14:12 UTC 2011


Bugs item #2866476, was opened at 2009-09-25 13:02
Message generated for change (Comment added) made by tfry
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=459007&aid=2866476&group_id=50231

Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: None
Group: None
>Status: Closed
Resolution: None
Priority: 5
Private: No
Submitted By: Nobody/Anonymous (nobody)
>Assigned to: Thomas Friedrichsmeier (tfry)
>Summary: FIXED: warning messages and missings when using mclapply()

Initial Comment:
There appears to be a memory leak in RKWard when using the function mclapply() from the multicore package. This eventually crashes RKWard and the session. The mclapply() function does not leak memory outside of RKWard. mclapply() is a parallel version of lapply() for running computations on multiple CPU cores.
Ubuntu 9.04 'Jaunty'
rkward version 0.5.0d
KDE version 4.2.2
R version 2.9.2
multicore 0.1-3


----------------------------------------------------------------------

>Comment By: Thomas Friedrichsmeier (tfry)
Date: 2011-10-24 10:14

Message:
Fixed in version 0.5.7.

----------------------------------------------------------------------

Comment By: Thomas Friedrichsmeier (tfry)
Date: 2011-10-04 14:00

Message:
For future reference, here's a console transcript with debug statements
turned on. So far, I have no idea what to make of it, though.

-------------------

> options (warn=2)
+ library (parallel)
+ for (i in 1:1000) {
+ x <- mclapply (1:100, rnorm)
+ y <- sapply (x, length)
+ stopifnot (all.equal (1:100, y))
+ }
parent registers new child 32076
child process 32076 started
parent registers new child 32077
select_children: maxfd=14, wlen=2, wcount=2, zombies=0, timeout=1:0
child process 32077 started
child 32076: send_master (20422 bytes)
  sr = 1
 - read select 1 children: 32076 
 read_child_ci(32076) - read length returned 4
 read_child_ci(32076) - read 20422 at 0 returned 20422
select_children: maxfd=14, wlen=2, wcount=2, zombies=0, timeout=1:0
child 32076: mcexit called
  sr = 1
 - read select 1 children: 32076 
 read_child_ci(32076) - read length returned 4
removing child 32076
select_children: maxfd=14, wlen=2, wcount=1, zombies=0, timeout=1:0
child process 32076 got SIGUSR1; child_exit_status=-1
child 32076: exiting
  sr = -1
child 32077: send_master (20822 bytes)
child 32077: mcexit called
child 32077 is waiting for permission to exit
Error in selectChildren(ac, 1) : 
  (converted from warning) error 'Interrupted system call' in select
Calls: mclapply -> selectChildren
Error in selectChildren(ac, 1) : 
  (converted from warning) error 'Interrupted system call' in select
Calls: mclapply -> selectChildren
select_children: maxfd=14, wlen=1, wcount=1, zombies=0, timeout=0:0
  sr = 1
 - read select 1 children: 32077 
 read_child_ci(32077) - read length returned 4
 read_child_ci(32077) - read 20822 at 0 returned 20822
select_children: maxfd=14, wlen=1, wcount=1, zombies=0, timeout=0:500000
  sr = 1
 - read select 1 children: 32077 
 read_child_ci(32077) - read length returned 4
removing child 32077
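
The "Interrupted system call" errors in the transcript above are select() failing with EINTR after a signal handler runs in the selecting thread (the transcript suggests multicore's SIGUSR1); options(warn=2) then promotes the resulting warning to an error. As a sketch (not RKWard's or multicore's actual code), the usual remedy is to retry the call on EINTR:

```c
#include <errno.h>
#include <sys/select.h>

/* Retry select() when it is interrupted by a signal (EINTR).
 * A plain select() returns -1 with errno == EINTR whenever a signal
 * handler runs in this thread -- which is what produces the
 * "Interrupted system call" warnings seen above. */
int select_retry(int nfds, fd_set *readfds, fd_set *writefds,
                 fd_set *exceptfds, struct timeval *timeout)
{
    int ret;
    do {
        ret = select(nfds, readfds, writefds, exceptfds, timeout);
    } while (ret == -1 && errno == EINTR);
    return ret;
}
```

Note that on Linux the remaining timeout is updated in place, so the retry does not restart the full interval; portable code would recompute the timeout itself.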

----------------------------------------------------------------------

Comment By: Thomas Friedrichsmeier (tfry)
Date: 2010-11-19 20:20

Message:
In the current development version, frontend and backend run in separate
processes. This improves the situation a bit, but does not entirely fix it.
There are no more crashes / hangs, but some warnings about an 'Interrupted
system call' in select start to appear.

Occasionally, child processes appear to fail (run the following a number
of times):

library (multicore)
for (iteration in 1:100) {
	print (iteration)
	x <- mclapply (1:100, rnorm, mc.cores=2)
	y <- sapply (x, length)
	stopifnot (y == 1:100)
}

So far I have not been able to figure out just which signal interrupts the
select (probably in the transmitter thread), or how to keep it from doing so.


----------------------------------------------------------------------

Comment By: Thomas Friedrichsmeier (tfry)
Date: 2009-09-28 13:10

Message:
Ok, so we see slightly different symptoms. This may be due to hardware
differences, or to changes in RKWard since 0.5.0d.

I am fairly certain that the cause really is the forking, though. More
precisely, what appears to happen is that mclapply() forks from within the
thread that R is running in. In the case of RKWard, the R thread assumes it
can talk to the main thread of RKWard; in the forked child, that simply
can't work.

Unfortunately, this means that this is not easy to fix. For a proper fix
we'd need code to detect the special situation of having been forked off,
and more code to deal with that properly. I'd say that is probably more
trouble than it's worth for this special case.
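
One conceivable way to detect "having been forked off" is a child-side pthread_atfork() handler, sketched below; the names are purely illustrative, not RKWard code. A forked copy could then consult the flag before trying to talk to frontend threads that no longer exist in its process:

```c
#include <pthread.h>

/* Illustrative fork-detection sketch, not RKWard's implementation. */
static volatile int rk_in_forked_child = 0;

/* Runs in the child immediately after fork(); only the forking thread
 * survives in the child, so this flag lets later code avoid waiting on
 * threads that were not copied. */
static void rk_child_handler(void)
{
    rk_in_forked_child = 1;
}

/* Register the child-side handler once at startup. Returns 0 on success. */
int rk_install_fork_detection(void)
{
    return pthread_atfork(NULL, NULL, rk_child_handler);
}
```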

This does not mean the bug is never going to be fixed. In the somewhat
longer term I plan to separate the "main" and "R" threads into two
different processes, and the R process in that setup would be much more
similar to a vanilla R instance in several ways. This will hopefully fix a
number of other bugs, and this one should then be much easier to resolve as
well. But again, this will take some time. Until then, you can set
options("mc.cores"=1) to at least be able to develop the code in RKWard. To
actually run it on multiple cores, use a plain R session for the time
being.

----------------------------------------------------------------------

Comment By: Nobody/Anonymous (nobody)
Date: 2009-09-27 04:44

Message:
Hi, thanks for the feedback. The only evidence I have for a memory leak
was through my system monitor (conky). I have an Intel dual-core machine.
I ran your script both in RKWard and in R in a terminal window. There was
no problem running the script in a terminal window. Curiously, when I ran
your script in RKWard, gcinfo() did not report growing memory, but RKWard
crashed after about 30-odd iterations. However, monitoring conky at the
same time showed rkward.bin pegged at 100% on both CPUs and the RAM usage
climbing rapidly, reaching 100% before RKWard crashed.

I assume the problem is caused by forking rkward.bin. I really don't have
much more insight to offer, unfortunately.

----------------------------------------------------------------------

Comment By: Thomas Friedrichsmeier (tfry)
Date: 2009-09-25 16:45

Message:
Thanks for reporting. I hope I can still reach you, as it would be nice if
you could provide some more info / do some more testing.

I was not able to reproduce this exactly using the current SVN version
(but read on, below). Can you provide a reproducible example script to
trigger the crash?

You say this is a memory leak. Do you have any specific evidence for
that?

Read on below, only if you're curious.

-----

** What I can reproduce, and some thoughts on that **

The following script will hang after a few iterations in RKWard (not in
plain R; this is on a single-core machine). gcinfo() does not show growing
memory consumption; simply nothing happens after a while, with the main
rkward.bin process hogging most of the CPU.

library (multicore)

gcinfo(TRUE)
print (gc ())

for (iteration in 1:100) {
        print (iteration)
        mclapply (1:1000, rnorm, mc.cores=2)
}

print (gc ())
gcinfo (FALSE)


One symptom around this is that a number of not-quite-dead processes are
left behind. Those are mostly sleeping, however. The backtrace inside those
processes is:

#0  0xb8070430 in __kernel_vsyscall ()
#1  0xb5b58292 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib/i686/cmov/libpthread.so.0
#2  0xb5c588b4 in pthread_cond_timedwait () from /lib/i686/cmov/libc.so.6
#3  0xb69caf8e in thread_sleep (ti=0x9a1e7d0) at thread/qthread_unix.cpp:297
#4  0xb69cb0bb in QThread::msleep (msecs=<value optimized out>) at thread/qthread_unix.cpp:323
#5  0x0813abe9 in RThread::handleStandardCallback (this=0x99da3a0, args=0x9a1e8f4) at /home/thomas/develop/rkward4/rkward/rbackend/rthread.cpp:487
#6  0x08148b92 in RReadConsole (prompt=0xb71aef7a "Selection: ", buf=0xb720d200 "", buflen=4096, hist=0) at /home/thomas/develop/rkward4/rkward/rbackend/rembedinternal.cpp:202
#7  0xb7141f05 in R_ReadConsole () from /usr/lib/R/lib/libR.so
#8  0xb7071a1c in ?? () from /usr/lib/R/lib/libR.so
#9  <signal handler called>
#10 0xb6bf2046 in QDBusAdaptorConnector::relaySlot (this=0xa8ba170, argv=0xb3c5fefc) at qdbusabstractadaptor.cpp:268
#11 0xb6bf29d8 in QDBusAdaptorConnector::qt_metacall (this=0xa8ba170, _c=QMetaObject::InvokeMetaMethod, _id=0, _a=0xb3c5fefc) at qdbusabstractadaptor.cpp:364
#12 0xb6ad1b33 in QMetaObject::activate (sender=0xabb4580, from_signal_index=0, to_signal_index=1, argv=0xb3c5fefc) at kernel/qobject.cpp:3112
#13 0xb6ad1f60 in QMetaObject::activate (sender=0xabb4580, m=0x8182540, from_local_signal_index=0, to_local_signal_index=1, argv=0xb3c5fefc) at kernel/qobject.cpp:3206
#14 0xb6ad1feb in QObject::destroyed (this=0xabb4580, _t1=0xabb4580) at .moc/release-shared/moc_qobject.cpp:143
#15 0xb6ad2df9 in ~QObject (this=0xabb4580, __in_chrg=<value optimized out>) at kernel/qobject.cpp:757
#16 0xb6d9de3d in ~Scheduler (this=0xabb4580, __in_chrg=<value optimized out>) at ../../kio/kio/scheduler.cpp:259
#17 0xb6da1131 in ~SchedulerPrivate () at ../../kio/kio/scheduler.cpp:102
#18 destroy () at ../../kio/kio/scheduler.cpp:209
#19 0xb6cd68db in ~KCleanUpGlobalStatic (this=0xb6e9f1f4, __in_chrg=<value optimized out>) at ../../kdecore/kernel/kglobal.h:62
#20 0xb5b98589 in exit () from /lib/i686/cmov/libc.so.6
#21 0xb28d9e9e in mc_exit (sRes=0xa190538) at fork.c:492
#22 0xb701c689 in ?? () from /usr/lib/R/lib/libR.so
#23 0xb7045b6a in Rf_eval () from /usr/lib/R/lib/libR.so
[...]
#48 0xb7045895 in Rf_eval () from /usr/lib/R/lib/libR.so
#49 0xb7048628 in ?? () from /usr/lib/R/lib/libR.so
#50 0xb7045895 in Rf_eval () from /usr/lib/R/lib/libR.so
#51 0xb7072b33 in R_ReplDLLdo1 () from /usr/lib/R/lib/libR.so
#52 0x08142aed in runUserCommandInternal () at /home/thomas/develop/rkward4/rkward/rbackend/rembedinternal.cpp:950
#53 0xb6ff2f19 in R_ToplevelExec () from /usr/lib/R/lib/libR.so
#54 0x0814406a in REmbedInternal::runCommandInternal (this=0x99da3a8, command_qstring=..., Rf_error=0xb3c622d8, print_result=true) at /home/thomas/develop/rkward4/rkward/rbackend/rembedinternal.cpp:1007
#55 0x0813c616 in RThread::doCommand (this=0x99da3a0, command=0xaad53c8) at /home/thomas/develop/rkward4/rkward/rbackend/rthread.cpp:189
#56 0x0813d755 in RThread::run (this=0x99da3a0) at /home/thomas/develop/rkward4/rkward/rbackend/rthread.cpp:125
#57 0xb69cb582 in QThreadPrivate::start (arg=0x99da3a0) at thread/qthread_unix.cpp:188
#58 0xb5b544b5 in start_thread () from /lib/i686/cmov/libpthread.so.0
#59 0xb5c49a5e in clone () from /lib/i686/cmov/libc.so.6

So what we appear to have here is that the main thread has already exited
via mc_exit(), and the R thread is somehow left waiting (for a response
from the main thread; likely it got a SIGSEGV, though, and is asking for
info on that). I'm not sure whether these zombie processes are really the
problem, though, or just a symptom.

Are Qt / GUI programs expected to be fork()able at all?
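
Generally, no: POSIX only guarantees that a child forked from a multi-threaded process can safely call async-signal-safe functions until it exec()s, which fits the backtrace above, where the forked child runs Qt/KDE cleanup from inside exit() and hangs. A sketch of the one generally safe fork pattern in such a process (illustrative only; it would not help mclapply, which needs the child to keep running R):

```c
#include <unistd.h>
#include <sys/wait.h>  /* callers will typically waitpid() the child */

/* In a multi-threaded (e.g. Qt) process, the forked child inherits only
 * the calling thread; locks held by other threads stay locked forever.
 * POSIX therefore only sanctions async-signal-safe calls between fork()
 * and exec(). Replacing the child image immediately is the safe pattern. */
pid_t spawn_helper(const char *path, char *const argv[])
{
    pid_t pid = fork();
    if (pid == 0) {
        execv(path, argv);  /* only reached in the child */
        _exit(127);         /* exec failed; skip atexit()/Qt cleanup */
    }
    return pid;             /* parent: child pid, or -1 on fork failure */
}
```

Note the _exit() rather than exit(): the backtrace above shows exactly the failure mode it avoids, namely running KDE's global static destructors in a forked child.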

----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=459007&aid=2866476&group_id=50231



