Tile data management changed in Krita

Fazekas László mneko at freemail.hu
Tue Mar 22 07:41:34 UTC 2016


Hello Dmitry,

The part I have problem here is the deletion of the tile inside the 
deref() call. See the following case:

1. There is a tile with a single owner, the owner thread protected it 
with a ref() call.
The reference counter is 1.

2. A second thread gets a pointer to the tile to use it. Right after 
that, the task scheduler pauses this thread.

3. The first thread decides to release the tile. Now it calls the 
deref(), the reference counter drops to 0 and the thread deletes the tile.

4. The second thread starts running again and tries to ref() the already 
deleted tile.

I agree this is very unlikely but I think it is theoretically possible. 
If there are thousands of ref() and deref() calls per second it can 
happen.  The tile data object is on the heap so perhaps it is accessible 
after its destruction without a segfault. So even if this happens, 
perhaps it remains hidden. The null data pointer means the tile was 
swapped out so perhaps the program can't notice if it's using a deleted 
tile. I don't really know what happens on the heap if you delete an 
object twice.

My solution was the using of shared pointers. But perhaps you're right 
and it has too much overhead and this is not a real bug because other 
mechanisms in the program protects it to happen.

To be sure about it, I think the following solution may helps: let the 
initial value of the reference counter be 1 (I think it already becomes 
one right after its creation). Then if the ref() call encounters a zero 
reference counter it means the tile was deleted so it can drop an error 
message. It is not a real overhead since it only checks the reply value 
of the Qt ref() call. Then we can see if this bug happens or not.

Fazek



2016-03-22 07:54 keltezéssel, Dmitry Kazakov írta:
> Hi, Fazekas!
>
>
> I know it's dangerous. In my opinion Krita is still very unstable so I can
>> totally imagine this bug happens sometimes. There are just too many crashes.
>>
>
>
>> For the ref() and deref() calls, I still think it's not a safe way to
>> allocate and delete things. Because there is an unprotected time gap
>> between accessing and protecting the tile. You can build extra protection
>> around this, but multi threading is very tricky sometimes. I know these
>> calls must be highly regulated by the lockings, but I'm also sure the
>> deref() call is not protected - it cant be - by the lock I tried to avoid
>> in my second commit.
>>
> Well, this code is here for almost five years now and there was not a
> single backtrace pointing to this place. Could you please provide *at least
> theoretical* sequence of multithreaded calls that could cause a problem
> here? Most of the acquire() and ref() calls happen when at least one object
> (mostly (*this) of a KisTile) still owns at least one ref+userCount to the
> KisTileData. Or (*this) of KisMementoItem holds the ref counter. I know
> that this is more a C-way of coding, but take care, that code is called
> million times per second when working with 10k images. So calling ref(),
> deref() on every function call is not a go. Memory barrier is not a cheap
> operation, so we should avoid it if possible.
>
> I made some tests to see how the different threads are calling these
>> functions and it happens very often in an asynchronous way.
>>
> Yes, this part is called from multiple threads all the time.
>
>
>
>> Perhaps I can make a test to force this bug for you, but it's not easy,
>> since it would happen only if the threads are synchronized exactly right.
>> And my brain hurts if I'm trying to imagine what happening in the different
>> threads simultanously. So for now, I used the fact that a solution without
>> time gaps is surely better.
>>
> I cannot understand what "gaps" you mean. Please elaborate a bit, then we
> will be able to find a solution if needed. Doing a test might be a bit
> difficult, but if you decide to do it, you can use
> KisTiledDataManagerTest::stressTest() for a base.
>
>
>
>> I noticed the test directory, but I don't know how to include the
>> unittests for building so I left the test part untouched. But I'd like to
>> make the test codes too, even for my csv plugin so how can I build the
>> unittests actually?
>>
> You should pass -DBUILD_TESTING=ON to a cmake command.
>
>
>
>
> _______________________________________________
> Krita mailing list
> kimageshop at kde.org
> https://mail.kde.org/mailman/listinfo/kimageshop

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.kde.org/pipermail/kimageshop/attachments/20160322/a08f206b/attachment.html>


More information about the kimageshop mailing list