[Akonadi] [Bug 451620] akonadictl; sqlite: Error moving orphan items to collection 242 : Expression tree is too large (maximum depth 1000)

Sun Mar 20 12:37:32 GMT 2022

https://bugs.kde.org/show_bug.cgi?id=451620

--- Comment #1 from Erik Quaeghebeur <bugs.kde.org at e3q.eu> ---
I have investigated a bit further. Using a GUI for sqlite (sqlitebrowser), I
familiarized myself a bit with the database and then looked into what was going
on here:

---
select distinct "collectionId" as "id" from "PimItemTable"
except
select distinct "id" from "CollectionTable"
order by "id";
---
This returned 52 collections that where referenced by items, but apparently do
not exist anymore.

---
select "id" as "itemId", "collectionId" from "PimItemTable"
where "collectionId" not in (select distinct "id" from "CollectionTable")
order by "collectionId";
---
This returned 110689 items, the same as reported by akonadictl, so indeed these
were the orphans mentioned.

---
delete from "PimItemTable"
where "collectionId" not in (select distinct "id" from "CollectionTable");
---
To fix the issue, I just removed them rather than creating a lost+found(?)
collection (with id 242?) and updating the collectionId of the items to 242.
After vacuuming (‘compress database’ under ‘Extra’ menu in sqlitebrowser), my
database shrunk to less than half the size it had before (600+ to ~250 MB).

So what seems to go wrong is that in the code *the list of orphans* resulting
from a first query is used to build a second query to change their
collectionId. Because this list is absurdly long, sqlite bails out. A possible
fix would be to first find out if there are orphan collections (my first query
listed) and if so, create as needed the lost+found collection and update the
corresponding items in the way done by my last query above.

Next, I dived into the code. I think the function in question can be found at
https://invent.kde.org/pim/akonadi/-/blob/master/src/server/storagejanitor.cpp#L277.
There, one can indeed see that a list of orphans is created on the C++ side (in
the variable imapIds) and subsequently used to create the query to clean up the
orphans. While the Qt functions for creating queries are mostly gibberish to
me, what I can deduce, is that this is done in a roundabout, inefficient way,
as compared to what my queries above do. Is there a reason for this?

Finally, a point that must be addressed is why such orphans can even exist, as
in the database schema for the creation of PimItemTable, I see:

…
collectionId BIGINT
…
CONSTRAINT PimItemTablecollectionId_Collectionid_fk FOREIGN KEY (collectionId)
REFERENCES CollectionTable(id) ON UPDATE CASCADE ON DELETE CASCADE DEFERRABLE
INITIALLY DEFERRED
…

which should in principle cascade the deletion of the collection to the
deletion of items within that collection. I do not know what could have gone
wrong, but it is worrying.

N.B.: The above column definition+constraint can more compactly be done as

collectionId BIGINT REFERENCES CollectionTable(id) ON UPDATE CASCADE ON DELETE
CASCADE DEFERRABLE INITIALLY DEFERRED

using the concept of column constraint:
https://www.sqlite.org/syntax/column-constraint.html.

-- 
You are receiving this mail because:
You are the assignee for the bug.