[Kde-pim] Akonadi db schema change needed to store search folder queryStrings

Tue May 1 19:34:32 BST 2012

Hey,

I noticed that I still get truncated queries, even after the size of the 
VARBINARY has been increased to 1024. Here's one example query that failed for 
me:

"/usr/bin/nepomukservicestub(1217)" Soprano: "SQLExecDirect failed on query 
'sparql select distinct ?r ?reqProp1 
(bif:concat(bif:search_excerpt(bif:vector('Helvetia*'), 
?v7),bif:search_excerpt(bif:vector('Helvetia*'), 
?v8),bif:search_excerpt(bif:vector('Helvetia*'), 
?v10),bif:search_excerpt(bif:vector('Helvetia*'), ?v6))) as ?_n_f_t_m_ex_ 
where { { ?r <http://akonadi-project.org/ontologies/aneo#akonadiItemId> 
?reqProp1 . ?r 
<http://www.semanticdesktop.org/ontologies/2007/01/19/nie#isPartOf> 
<nepomuk:/res/dc80e69e-a505-465e-ad94-e387c0326c6d> . { ?r 
<http://www.semanticdesktop.org/ontologies/2007/03/22/nmo#bcc> ?v2 . ?v2 
<http://www.semanticdesktop.org/ontologies/2007/03/22/nmo#cc> ?v3 . ?v3 
<http://www.semanticdesktop.org/ontologies/2007/03/22/nmo#to> ?v4 . ?v4 
<http://www.semanticdesktop.org/ontologies/2007/03/22/nco#hasEmailAddress> ?v5 
. ?v5 <http://www.semanticdesktop.org/ontologies/2007/03/22/nco#emailAddress> 
?v6 . FILTER(bif:contains(?v6, "'Helvetia*'")) . } UNION { ?r 
<http://www.semanticdesktop.org/ontologies/2007/03/22/nmo#messageSubject> ?v7 
. FILTER(bif:contains(?v7, "'Helvetia*'' (iODBC Error: [OpenLink][Virtuoso 
iODBC Driver][Virtuoso Server]SQ074: Line 1: Unterminated SPARQL short double-
quoted string at '')"

As you can see, while the query could possibly be optimized, it's easy to get 
beyond 1024 bytes with a reasonable query. In any case just hoping that it 
won't happen isn't a solution, so I think we need to change the schema again.

Ideally I would suggest to use a separate table for queries, as there is only  
a very small subset of collections actually having a query associated (for me 
it's 1), basically solving the problem as we can use even a large varchar.

If that is not possible, or there is a good reason not to do so, we have the 
option of TEXT types, but I don't know if that works with other backends than 
mysql.

The last option would be to use a larger VARBINARY i.e. 10240.
While VARBINARY(10240) is cheap on disk (uses only the space that is actually 
used to store the string), it uses the maximum value in memory, but even with 
1000 collections it would amount to only approx. 1MB of memory. In this case 
we should also add a check in the akonadi server if the query reaches the 
maximum lenght to keep virtuoso from going crazy by truncated queries.

If you see nothing speaking against it I would look into adding a 
collectionquery table.

Cheers,
Christian

On Friday 24 February 2012 15.50:02 Milian Wolff wrote:
> On Friday 24 February 2012 14:18:35 Will Stephenson wrote:
> > On Friday 24 Feb 2012 13:28:06 you wrote:
> > > On Friday 24 February 2012 09:52:37 Will Stephenson wrote:
> > > > On Thursday 23 Feb 2012 22:26:49 Shaheed Haque wrote:
> > > > > Is the extra storage really a concern? I suppose the pimitemtable is
> > > > > probably going to have most entries by far. That seems to have a
> > > > > handful of columns which might be strings (and 9 in total). Even if
> > > > > I
> > > > > had a million records, we are talking a handful of MB.
> > > > > 
> > > > > That seems perfectly reasonable...Or have I misunderstood?
> > > > 
> > > > Multiplying the number of rows by the number of QString columns and
> > > > summing
> > > > it across all tables gives me (2 imap accounts, about 2.5GB mail in
> > > > total
> > > > on the servers) an additional 1455755 bytes for the storage. Then I
> > > > assume
> > > > indexed columns count double, which gives another 872906 * 2 bytes as
> > > > there
> > > > are 2 indices on parttable.name (which contains a lot of repeated
> > > > entries,
> > > > perhaps it would make sense to normalize this out into its own
> > > > table?).
> > > > 
> > > > That's 3201567 extra bytes in theory.  I'm not a database guy so don't
> > > > know
> > > > how that translates into real extra storage, memory footprint (indexes
> > > > stay
> > > > in memory, right?) and performance.  Can someone better informed
> > > > comment?
> > > 
> > > VARBINARY has variable length, hence it will only use as much as it
> > > actually needs. The number you pass it is the maximum size.
> > 
> > I'm on kde-pim@, no need to CC me.
> 
> bad habit (answer vs answer to mailing list shortcut ;-)) - sorry
> 
> > In my reading of the manual, VARBINARY has the same storage requirements
> > as
> > VARCHAR, which means, it requires length(data) bytes, plus one byte for
> > len
> > <=255, else plus 2 bytes.  It's in the manual.  I suppose it uses either a
> > byte or a short for the field length.
> 
> doh - I should have read your initial mails in more detail :) you are right
> of course. Anyhow, from a DB-pov it doesn't really make sense to put *all*
> QString into a VARCHAR imo. Rather, it depends on the actual size. A field
> that stores a query should imo be one of the TEXT types. For fields that
> store some often-used lookup value (remoteid, name, etc. come to mind), a
> VARCHAR would of course be better for performance reasons, yet one must
> ensure that the values stored are really never larger than what can be
> stored in the DB. Such silent problems like you describe here might happen
> in other places as well - yikes!
> 
> Oh and your 3201567 Bytes don't sound that much to me I have to admit...
> It's just 3MB - who cares?
> 
> PS: see also
> http://dev.mysql.com/doc/refman/5.0/en/storage-requirements.html
> http://forums.mysql.com/read.php?24,105964,105964#msg-105964 and
> http://stackoverflow.com/questions/2023481/mysql-large-varchar-vs-text
_______________________________________________
KDE PIM mailing list kde-pim at kde.org
https://mail.kde.org/mailman/listinfo/kde-pim
KDE PIM home page at http://pim.kde.org/