T7846: akonadi > MySQL ERD Review

Fri Feb 2 11:51:36 GMT 2018

dvratil added a comment.

  > [partTypeTable]
  > 
  > The Alternate Key is on (name, ns). Is it necessary to use both columns?

  Yes, type names have two parts: a //namespace// and a //name// and as such the combination of those two has to be unique, which is what the key enforces. //Namespace// determines the type of the part (generally it's either "PLD" for payloads or "ATR" for attributes) and //name// is the actual name of the type. The reason for the split is that sometimes you want to query for example all attributes belonging to an Item, in which case you can do a query like

    SELECT * FROM PartTable LEFT JOIN PartTypeTable ON PartTypeTable.id = PartTable.partTypeId WHERE PartTypeTable.ns = 'ATR' AND PartTable.pimItemId = ?;

  > Associative Tables do not have a Surrogate Key - slight anti-pattern

  What would be the benefit of having a Surrogate key for those relational tables? The reason for the composite PK is to enforce uniqueness of the value pairs in those tables, and from our point of view, the value pair is the unique identifier of the relation. There is no need to be able to refer to such relations by a single identifier.

  > What I recommend we (eventually) do is:
  > 
  > - Ensure we can drop the existing, composite Primary Key
  > - Add a surrogate id to each table and make it the Primary Key
  > - Create separate indexes on each Foreign Key

  How about we only create separate indexes on each Foreign Key to make work easier for the planner, but we keep the composite PK (or turn it into a regular UNIQUE INDEX)? The alternative is introducing the surrogate id as a PK and adding a UNIQUE index on the Foreign Key pairs to enforce uniqueness but that seems like a lot of pointless overhead as the surrogate id would never be used and we would be introducing another index on potentially large tables.

  > [CollectionTable]
  >  I would investigate further whether the index on (name, parentId) is necessary. It is possible it's necessary if we have an extremely high fan-out from the parent to the next level of children (e.g. 20, 50, 100 next-level children)?

  The index is UNIQUE, thus enforcing that we don't end up with multiple Collections with the same name and the same parent (same as you can't have multiple subfolders with the same name in one folder).

  > Generally indexes on low cardinality columns (e.g. gender) are not helpful. Unless of course one is fetching only by that column and perhaps a small percentage of the data qualifies by the value. With this in mind, I would investigate if we need indexes on enabled, syncPref, displayPref and indexPref. I suspect we should be able to drop them. I'm curious to know the SQL which uses these columns. I suspect other columns are used which would have existing supporting indexes.

  Usually, when client requests a Collection, those columns are used to filter the results by:

    SELECT * FROM CollectionTable .... WHERE (CollectionTable.enabled = 1 OR CollectionTable.referenced = 1) AND ....

  Now that I think about it, the variance in those columns is rather low (all the *Pref columns usually have value of 2) in which case I guess the index is indeed not very useful. For the majority of users, even the "enabled" column is true everywhere.

TASK DETAIL
  https://phabricator.kde.org/T7846

To: pablo, dvratil
Cc: dvratil, kde-pim, pablo
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.kde.org/pipermail/kde-pim/attachments/20180202/48e96da7/attachment.html>