Tile data management changed in Krita

Dmitry Kazakov dimula73 at gmail.com
Tue Mar 22 08:31:43 UTC 2016


On Tue, Mar 22, 2016 at 10:41 AM, Fazekas László <mneko at freemail.hu> wrote:

> Hello Dmitry,
>
> The part I have problem here is the deletion of the tile inside the
> deref() call. See the following case:
>
> 1. There is a tile with a single owner, the owner thread protected it with
> a ref() call.
> The reference counter is 1.
>
> 2. A second thread gets a pointer to the tile to use it. Right after that,
> the task scheduler pauses this thread.
>

Here we need to consider what "single owned" and "other thread" are.

Single owners that release the tile data can be:

A1) KisTile::lockForWrite()
A2) ~KisTile
A3) ~KisMementoItem()

"Other thread" can be:

B1) KisTile::lockForWrite()
B2) KisTile::lockForRead()
B3) KisTile::init()
B4) KisTileDataPooler
B5) KisTileDataSwapper
B6) KisMementoItem::tile()

Cases B4 and B5 are not affected, because they hold a KisTileDataStore
lock. No tile data can be deleted with that.
Case B6 is also not relevant, because m_tileData is already owned by the
memento item.

Combination of A1 and (B1 or B2) is guarded by
KisTile::safeReleaseOldTileData(). No tile data will be released before
*all* the locks are unlocked.

B3 is usually called either within a source tile lock held or for a default
tile data, which is stored in KisTiledDataManager

Combining A2 and A3 with anything from the second list --- I cannot imagine
how it is possible. All tile read/write accesses should be done with a lock
held, so the destructor of a tile can not be called while something else is
still accessing its data.



> I agree this is very unlikely but I think it is theoretically possible.
>

As far as I remember the higher level structures guard us from this scenario


> If there are thousands of ref() and deref() calls per second it can
> happen.  The tile data object is on the heap so perhaps it is accessible
> after its destruction without a segfault. So even if this happens, perhaps
> it remains hidden. The null data pointer means the tile was swapped out so
> perhaps the program can't notice if it's using a deleted tile. I don't
> really know what happens on the heap if you delete an object twice.
>

If you delete the object twice, there will be a crash with 99% probability.
We use a boost::pool for allocating tile data objects. I guess it has the
same guards from it.


> To be sure about it, I think the following solution may helps: let the
> initial value of the reference counter be 1 (I think it already becomes one
> right after its creation). Then if the ref() call encounters a zero
> reference counter it means the tile was deleted so it can drop an error
> message. It is not a real overhead since it only checks the reply value of
> the Qt ref() call. Then we can see if this bug happens or not.
>

You can check it by adding Q_ASSERT into  KisTileData::deref() like this:

inline bool KisTileData::deref() {
    bool _ref;

    if (!(_ref = m_refCount.deref())) {
        m_store->freeTileData(this);
        return 0;
    }

    Q_ASSERT(_ref > 0);

    return _ref;
}

But this code is for testing only, not to be pushed into master.



>
> Fazek
>
>
>
> 2016-03-22 07:54 keltezéssel, Dmitry Kazakov írta:
>
> Hi, Fazekas!
>
>
> I know it's dangerous. In my opinion Krita is still very unstable so I can
>
> totally imagine this bug happens sometimes. There are just too many crashes.
>
>
> For the ref() and deref() calls, I still think it's not a safe way to
> allocate and delete things. Because there is an unprotected time gap
> between accessing and protecting the tile. You can build extra protection
> around this, but multi threading is very tricky sometimes. I know these
> calls must be highly regulated by the lockings, but I'm also sure the
> deref() call is not protected - it cant be - by the lock I tried to avoid
> in my second commit.
>
>
> Well, this code is here for almost five years now and there was not a
> single backtrace pointing to this place. Could you please provide *at least
> theoretical* sequence of multithreaded calls that could cause a problem
> here? Most of the acquire() and ref() calls happen when at least one object
> (mostly (*this) of a KisTile) still owns at least one ref+userCount to the
> KisTileData. Or (*this) of KisMementoItem holds the ref counter. I know
> that this is more a C-way of coding, but take care, that code is called
> million times per second when working with 10k images. So calling ref(),
> deref() on every function call is not a go. Memory barrier is not a cheap
> operation, so we should avoid it if possible.
>
> I made some tests to see how the different threads are calling these
>
> functions and it happens very often in an asynchronous way.
>
>
> Yes, this part is called from multiple threads all the time.
>
>
>
>
> Perhaps I can make a test to force this bug for you, but it's not easy,
> since it would happen only if the threads are synchronized exactly right.
> And my brain hurts if I'm trying to imagine what happening in the different
> threads simultanously. So for now, I used the fact that a solution without
> time gaps is surely better.
>
>
> I cannot understand what "gaps" you mean. Please elaborate a bit, then we
> will be able to find a solution if needed. Doing a test might be a bit
> difficult, but if you decide to do it, you can use
> KisTiledDataManagerTest::stressTest() for a base.
>
>
>
>
> I noticed the test directory, but I don't know how to include the
> unittests for building so I left the test part untouched. But I'd like to
> make the test codes too, even for my csv plugin so how can I build the
> unittests actually?
>
>
> You should pass -DBUILD_TESTING=ON to a cmake command.
>
>
>
>
>
> _______________________________________________
> Krita mailing listkimageshop at kde.orghttps://mail.kde.org/mailman/listinfo/kimageshop
>
>
>
> _______________________________________________
> Krita mailing list
> kimageshop at kde.org
> https://mail.kde.org/mailman/listinfo/kimageshop
>
>


-- 
Dmitry Kazakov
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.kde.org/pipermail/kimageshop/attachments/20160322/a97ecc58/attachment-0001.html>


More information about the kimageshop mailing list