Tile data management changed in Krita

Fazekas László mneko at freemail.hu
Tue Mar 22 19:15:21 UTC 2016


I made a testbed to check these and really hardly trying to crash Krita :)
So far I have to say you're right, it not happens. At least I found some 
other bugs to solve.

Fazek

2016-03-22 09:31 keltezéssel, Dmitry Kazakov írta:
> On Tue, Mar 22, 2016 at 10:41 AM, Fazekas László <mneko at freemail.hu> wrote:
>
>> Hello Dmitry,
>>
>> The part I have problem here is the deletion of the tile inside the
>> deref() call. See the following case:
>>
>> 1. There is a tile with a single owner, the owner thread protected it with
>> a ref() call.
>> The reference counter is 1.
>>
>> 2. A second thread gets a pointer to the tile to use it. Right after that,
>> the task scheduler pauses this thread.
>>
> Here we need to consider what "single owned" and "other thread" are.
>
> Single owners that release the tile data can be:
>
> A1) KisTile::lockForWrite()
> A2) ~KisTile
> A3) ~KisMementoItem()
>
> "Other thread" can be:
>
> B1) KisTile::lockForWrite()
> B2) KisTile::lockForRead()
> B3) KisTile::init()
> B4) KisTileDataPooler
> B5) KisTileDataSwapper
> B6) KisMementoItem::tile()
>
> Cases B4 and B5 are not affected, because they hold a KisTileDataStore
> lock. No tile data can be deleted with that.
> Case B6 is also not relevant, because m_tileData is already owned by the
> memento item.
>
> Combination of A1 and (B1 or B2) is guarded by
> KisTile::safeReleaseOldTileData(). No tile data will be released before
> *all* the locks are unlocked.
>
> B3 is usually called either within a source tile lock held or for a default
> tile data, which is stored in KisTiledDataManager
>
> Combining A2 and A3 with anything from the second list --- I cannot imagine
> how it is possible. All tile read/write accesses should be done with a lock
> held, so the destructor of a tile can not be called while something else is
> still accessing its data.
>
>
>
>> I agree this is very unlikely but I think it is theoretically possible.
>>
> As far as I remember the higher level structures guard us from this scenario
>
>
>> If there are thousands of ref() and deref() calls per second it can
>> happen.  The tile data object is on the heap so perhaps it is accessible
>> after its destruction without a segfault. So even if this happens, perhaps
>> it remains hidden. The null data pointer means the tile was swapped out so
>> perhaps the program can't notice if it's using a deleted tile. I don't
>> really know what happens on the heap if you delete an object twice.
>>
> If you delete the object twice, there will be a crash with 99% probability.
> We use a boost::pool for allocating tile data objects. I guess it has the
> same guards from it.
>
>
>> To be sure about it, I think the following solution may helps: let the
>> initial value of the reference counter be 1 (I think it already becomes one
>> right after its creation). Then if the ref() call encounters a zero
>> reference counter it means the tile was deleted so it can drop an error
>> message. It is not a real overhead since it only checks the reply value of
>> the Qt ref() call. Then we can see if this bug happens or not.
>>
> You can check it by adding Q_ASSERT into  KisTileData::deref() like this:
>
> inline bool KisTileData::deref() {
>      bool _ref;
>
>      if (!(_ref = m_refCount.deref())) {
>          m_store->freeTileData(this);
>          return 0;
>      }
>
>      Q_ASSERT(_ref > 0);
>
>      return _ref;
> }
>
> But this code is for testing only, not to be pushed into master.
>
>
>
>> Fazek
>>
>>
>>
>> 2016-03-22 07:54 keltezéssel, Dmitry Kazakov írta:
>>
>> Hi, Fazekas!
>>
>>
>> I know it's dangerous. In my opinion Krita is still very unstable so I can
>>
>> totally imagine this bug happens sometimes. There are just too many crashes.
>>
>>
>> For the ref() and deref() calls, I still think it's not a safe way to
>> allocate and delete things. Because there is an unprotected time gap
>> between accessing and protecting the tile. You can build extra protection
>> around this, but multi threading is very tricky sometimes. I know these
>> calls must be highly regulated by the lockings, but I'm also sure the
>> deref() call is not protected - it cant be - by the lock I tried to avoid
>> in my second commit.
>>
>>
>> Well, this code is here for almost five years now and there was not a
>> single backtrace pointing to this place. Could you please provide *at least
>> theoretical* sequence of multithreaded calls that could cause a problem
>> here? Most of the acquire() and ref() calls happen when at least one object
>> (mostly (*this) of a KisTile) still owns at least one ref+userCount to the
>> KisTileData. Or (*this) of KisMementoItem holds the ref counter. I know
>> that this is more a C-way of coding, but take care, that code is called
>> million times per second when working with 10k images. So calling ref(),
>> deref() on every function call is not a go. Memory barrier is not a cheap
>> operation, so we should avoid it if possible.
>>
>> I made some tests to see how the different threads are calling these
>>
>> functions and it happens very often in an asynchronous way.
>>
>>
>> Yes, this part is called from multiple threads all the time.
>>
>>
>>
>>
>> Perhaps I can make a test to force this bug for you, but it's not easy,
>> since it would happen only if the threads are synchronized exactly right.
>> And my brain hurts if I'm trying to imagine what happening in the different
>> threads simultanously. So for now, I used the fact that a solution without
>> time gaps is surely better.
>>
>>
>> I cannot understand what "gaps" you mean. Please elaborate a bit, then we
>> will be able to find a solution if needed. Doing a test might be a bit
>> difficult, but if you decide to do it, you can use
>> KisTiledDataManagerTest::stressTest() for a base.
>>
>>
>>
>>
>> I noticed the test directory, but I don't know how to include the
>> unittests for building so I left the test part untouched. But I'd like to
>> make the test codes too, even for my csv plugin so how can I build the
>> unittests actually?
>>
>>
>> You should pass -DBUILD_TESTING=ON to a cmake command.
>>
>>
>>
>>
>>
>> _______________________________________________
>> Krita mailing listkimageshop at kde.orghttps://mail.kde.org/mailman/listinfo/kimageshop
>>
>>
>>
>> _______________________________________________
>> Krita mailing list
>> kimageshop at kde.org
>> https://mail.kde.org/mailman/listinfo/kimageshop
>>
>>
>
>
>
> _______________________________________________
> Krita mailing list
> kimageshop at kde.org
> https://mail.kde.org/mailman/listinfo/kimageshop

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.kde.org/pipermail/kimageshop/attachments/20160322/27db8643/attachment.html>


More information about the kimageshop mailing list