[Kde-pim] Review Request: Add indexing throttling and fixed endless indexing problems

Sebastian Trueg strueg at mandriva.com
Fri Mar 9 18:51:05 GMT 2012


On 03/09/2012 07:38 PM, Vishesh Handa wrote:
>
>
> On Fri, Mar 9, 2012 at 9:08 PM, Sebastian Trueg <strueg at mandriva.com> wrote:
>
>     On 03/09/2012 04:24 PM, Volker Krause wrote:
>     > On Thursday 08 March 2012 20:24:16 Sebastian Trueg wrote:
>     >> On 03/08/2012 05:33 PM, Volker Krause wrote:
>     >>> On Tuesday 06 March 2012 07:49:55 Sebastian Trueg wrote:
>     >>>> On 03/05/2012 04:49 PM, Volker Krause wrote:
>     >>>>> On Sunday 26 February 2012 12:33:49 Sebastian Trueg wrote:
>     >>>>>> Since I am unable to reproduce this slow query - it seems that I am
>     >>>>>> missing data which results in such a query - could you please:
>     >>>>>>
>     >>>>>> * apply the attached patch to kde-runtime/nepomuk/services/backupsync
>     >>>>>>   (-p3)
>     >>>>>> * shutdown akonadi and nepomuk, remove the nepomuk db (or move it to a
>     >>>>>>   backup location)
>     >>>>>> * start the storage service manually: "nepomukservicestub
>     >>>>>>   nepomukstorage 2> /tmp/mytmpfile"
>     >>>>>> * start akonadi, let it finish its indexing, even if Virtuoso goes wild
>     >>>>>> * finally get the hardest query via: "grep XXXXX /tmp/mytmpfile|sed
>     >>>>>>   "s/.*XXXXX //g"|sort -n|uniq|tail -20".
>     >>>>>>
>     >>>>>> Thanks for the help.
>     >>>>> 20420633 sparql select distinct ?r count(?p) as ?cnt where { ?r ?p ?o.
>     >>>>> filter( ?p in (<ht
>     >>>>> 11627450 sparql select distinct ?r count(?p) as ?cnt where { ?r ?p ?o.
>     >>>>> filter( ?p in (<ht
>     >>>>> 2 status()
>     >>>>> 4850172 sparql select distinct ?r count(?p) as ?cnt where { ?r ?p ?o.
>     >>>>> filter( ?p in (<ht
>     >>>>> 5972876 sparql select distinct ?r count(?p) as ?cnt where { ?r ?p ?o.
>     >>>>> filter( ?p in (<ht
>     >>>>>
>     >>>>> doesn't look like this is working :-(
>     >>>>>
>     >>>>> I ended up with 4 threads with infinite queries now, but no output, as
>     >>>>> they never finish (I see the output for the fast queries, so it's
>     >>>>> working in general).
>     >>>> I see. If the problem persists even with the patch I gave you yesterday
>     >>>> evening, could you please put the debug statement before the query
>     >>>> execution instead?
>     >>>> If my patch works, could you please collect the output so I have some
>     >>>> statistics?
>     >>> looks very good :)
>     >>>
>     >>> For the last two days I did not end up with a hanging Virtuoso anymore,
>     >>> and it looks like the Akonadi indexing actually managed to finish this
>     >>> time. So, definitely something we should commit :)
>     >> great. thanks for testing.
>     >>
>     >>> I'll investigate the resulting data more closely during the weekend, but
>     >>> I've already spotted some things that still need fixing (but that's
>     >>> largely unrelated to the query performance problem, it'll likely require
>     >>> another full re-indexing though, which should finally confirm that this
>     >>> is fixed). Besides the problems with nco:Contact merging already
>     >>> described by Will, I noticed that some of those nco:Contacts also have a
>     >>> large amount of hasEmailAddress properties, so we seem to only merge the
>     >>> contact objects, but not the objects they refer to. My attempts on
>     >>> indexing attachment content (still commented out) also didn't generate
>     >>> the desired results yet (actually, no results at all, I can't find the
>     >>> content objects anywhere, only the meta-information we add in the
>     >>> feeder).
>     >> Is there a branch which I can have a look at?
>     > It's in master, agents/nepomukfeeder/plugins/nepomukmailfeeder.cpp:162 in
>     > kdepim-runtime. The actual code of indexData() has been recovered from the
>     > pre-DMS feeder, I'd assume something is wrong there (it uses the
>     > nepomukindexer tool and feeds the attachment via stdin). When doing that
>     > manually I fail to find the result in Nepomuk as well, while it works when
>     > pointing it to a file. The attachment resource we create before indexing the
>     > content (the not commented out part before line 162) works fine.
>     >
>     > regards,
>     > Volker
>     I think that is because nepomukindexer simply cannot handle stdin
>     properly anymore. Vishesh?
>
>
> eh? Works for me.
>
> cat someFile | nepomukindexer --uri '/home/vishesh/someFile'
>
> In the case of PIM, the uri will have to be the actual uri in the form
> of nepomuk:/res/some-uuid.
>
There, that is probably the problem. Volker, do you specify the actual
Nepomuk URI or do you provide the akonadi URL?
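
To make the distinction concrete, here is a minimal sketch of a guard around
the stdin invocation Vishesh showed above. The function names are
hypothetical, and the Akonadi URL form mentioned in the comments
(akonadi:?item=<id>) is an assumption, not something confirmed in this
thread. The only nepomukindexer option used is the --uri one quoted above.

```shell
#!/bin/sh

# Succeeds only for Nepomuk resource URIs of the form nepomuk:/res/<uuid>.
# Anything else (for example an Akonadi URL, assumed here to look like
# akonadi:?item=42) is rejected.
is_nepomuk_uri() {
    case "$1" in
        nepomuk:/res/?*) return 0 ;;
        *) return 1 ;;
    esac
}

# Hypothetical wrapper: feed a file to nepomukindexer via stdin, but only
# when the given URI is a Nepomuk resource URI.
index_via_stdin() {
    file=$1
    uri=$2
    if ! is_nepomuk_uri "$uri"; then
        echo "not a Nepomuk resource URI: $uri" >&2
        return 1
    fi
    nepomukindexer --uri "$uri" < "$file"
}
```

Calling index_via_stdin with an attachment file and an Akonadi URL fails
early with a message instead of indexing into a resource nobody will find
later.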
_______________________________________________
KDE PIM mailing list kde-pim at kde.org
https://mail.kde.org/mailman/listinfo/kde-pim
KDE PIM home page at http://pim.kde.org/


