[Kde-pim] akonadi indexing agent: Where does it check on what folders to index?

Martin Steigerwald martin at lichtvoll.de
Thu Feb 11 15:27:19 GMT 2016


Am Donnerstag, 11. Februar 2016, 15:09:36 CET schrieb Daniel Vrátil:
> On Thursday, February 11, 2016 2:37:31 PM CET Martin Steigerwald wrote:
> > Am Donnerstag, 11. Februar 2016, 13:08:19 CET schrieb Daniel Vrátil:
> > > On Thursday, February 11, 2016 12:22:29 PM CET Martin Steigerwald wrote:
> > > > (asking new devel related questions here instead of continuing on
> > > > kdepim-
> > > > users)
> > > > 
> > > > Hello!
> > > 
> > > Hi!
> > > 
> > > > I am trying to get a clue on where it selects which folders to index?
> > > > For
> > > > my huge maildir it can help to tell it not to index the large LKML
> > > > folders and some other large folders, and there is setting in KMail
> > > > folder properties for that in the maintenance tab. "Enable full text
> > > > indexing" which I disabled for the LKML folders.
> > > > 
> > > > I am not sure whether Akonadi indexing agent actually uses that setting
> > > > tough.
> > > 
> > > Indeed the agent completely ignores the settings :(
> > > 
> > > > I think it schedules indexing of folders in
> > > > 
> > > >  47 Scheduler::Scheduler(Index &index, const QSharedPointer<JobFactory>
> > > > 
> > > > &jobFactory, QObject *parent)
> > > > […]
> > > > 
> > > >  64     //Schedule collections we know have missing items from last time
> > > >  65     m_dirtyCollections = group.readEntry("dirtyCollections",
> > > > 
> > > > QList<Akonadi::Collection::Id>()).toSet();
> > > > 
> > > >  66     qCDebug(AKONADI_INDEXER_AGENT_LOG) << "Dirty collections " <<
> > > > 
> > > > m_dirtyCollections;
> > > > 
> > > >  67     Q_FOREACH (Akonadi::Collection::Id col, m_dirtyCollections) {
> > > >  68         scheduleCollection(Akonadi::Collection(col), true);
> > > >  69     }
> > > >  70
> > > >  71     //Trigger a full sync initially
> > > >  72     if (!group.readEntry("initialIndexingDone", false)) {
> > > >  73         qCDebug(AKONADI_INDEXER_AGENT_LOG) << "initial indexing";
> > > >  74         QMetaObject::invokeMethod(this, "scheduleCompleteSync",
> > > > 
> > > > Qt::QueuedConnection);
> > > > 
> > > >  75     }
> > > >  76     group.writeEntry("initialIndexingDone", true);
> > > >  77     group.sync();
> > > >  78 }
> > > > 
> > > > of "scheduler.cpp", yet I don´t see any check for whether fulltext
> > > > indexing
> > > > is enabled for the folder or not. Also I didn´t see a check in
> > > > scheduleCollection().
> > > > 
> > > > Do I miss something obvious?
> > > 
> > > You are right, but there are more places where the check is missing
> > > 
> > > > If it is missing, maybe a junior job for me can be to add it.
> > > 
> > > That would be awesome!
> > > 
> > > All you need to do is to check whether a collection has
> > > Akonadi::IndexPolicyAttribute set and what's the value of
> > > indexingEnabled().
> > I do so in two places, I think the first is not really necessary, but it
> > might safe some unnecessary code execution – also its partly guess work.
> > 
> > > One part is adding the check to Scheduler::scheduleCollection, as you
> > > already found. The other part is little more tricky, because we also want
> > > to ignore changes happening to and within those Collections, like renaming
> > > the Collection or new Item being added or Item being changed (marked as
> > > read, etc. etc). Right now we just get the changed or new Item/Collection,
> > > and index it straight away.
> > 
> > For now I just changed addItem in scheduler.cpp as that is what agent.cpp
> > methods call, but for removing and moving I didn´t anything. Maybe it would
> > be even good to allow renaming or moving to work in case the collection has
> > been indexed before the user said no.
> > 
> > Well at least for initial sync it appears to work. I will see after initial
> > sync is complete, so far I didn´t see any collection I disabled full text
> > indexing for in AkonadiConsole index agent status display.
> > 
> > Next I will look into creating a review request with Phabricator. I do have
> > an identity account with commit rights, but I didn´t use Phabricator so far.
> > I hope its easy to make a review request with it.
> 
> If you prefer ReviewBoard, just use ReviewBoard.

I think I want to learn the tools the kdepim people agreed upon, so I will
look into Phabricator. Hope its documented somewhere. Hmmm, seems some
documentation is at

https://techbase.kde.org/Development/Phabricator


> 
> > I also wonder about dirty collection, somewhere in scheduler.cpp it writes
> > them out to a config file, I bet it shouldn´t add collections in there which
> > are not enabled for fulltext indexing.
> 
> Right, but then again we filter those out in scheduleCollection(), so no 
> problem there.
> 
> > 
> > I know its not a review request, but this is what I changed. And so far this
> > first hack appears to do what I want :)
> > 
> > diff --git a/agent/scheduler.cpp b/agent/scheduler.cpp
> > index c912468..42828de 100644
> > --- a/agent/scheduler.cpp
> > +++ b/agent/scheduler.cpp
> > @@ -26,6 +26,7 @@
> >  #include <AkonadiCore/CollectionFetchScope>
> >  #include <AkonadiAgentBase/AgentBase>
> >  #include <AkonadiCore/ServerManager>
> > +#include <AkonadiCore/indexpolicyattribute.h>
> >  #include <KLocalizedString>
> >  #include <KConfigGroup>
> >  #include <QTimer>
> > @@ -105,18 +106,30 @@ void Scheduler::collectDirtyCollections()
> > 
> >  void Scheduler::scheduleCollection(const Akonadi::Collection &col, bool
> > fullSync) {
> > -    if (!m_collectionQueue.contains(col.id())) {
> > -        m_collectionQueue.enqueue(col.id());
> > -    }
> > -    if (fullSync) {
> > -        m_dirtyCollections.insert(col.id());
> > +    //Only index the collection if user did not disable full text indexing
> > in folder properties dialog in KMail +    Akonadi::IndexPolicyAttribute
> > *attr = col.attribute<Akonadi::IndexPolicyAttribute>(); +    if (!attr ||
> > attr->indexingEnabled()) {
> > +        if (!m_collectionQueue.contains(col.id())) {
> > +            m_collectionQueue.enqueue(col.id());
> > +        }
> > +        if (fullSync) {
> > +            m_dirtyCollections.insert(col.id());
> > +        }
> >      }
> > +
> >      processNext();
> >  }
> 
> This looks OK.

It is not complete however, as I now found it indexing mesa-dev-ml and I
told it not to.

> >  void Scheduler::addItem(const Akonadi::Item &item)
> >  {
> >      Q_ASSERT(item.parentCollection().isValid());
> > +
> > +    //Only index the item if user did not disable full text indexing for
> > its collection in folder properties dialog in KMail 
> > + Akonadi::IndexPolicyAttribute *attr =
> > item.parentCollection().attribute<Akonadi::IndexPolicyAttribute>();
> 
> Unfortunately this won't work just like that, the information available about 
> parent Collection is limited and it does not include attributes. It *might* 
> work, if the Collection in some of the lower caches has the information, but 
> you cannot rely on it.
> 
> Since we cannot afford fetching full parent Collection for each Item I guess 
> we will need some local cache of Collections (IDs) with disabled indexing and 
> just quickly check if Item's belongs to that collection or not.
> 
> The cache can easilly be kept up-to-date from the 
> collection{Added,Changed,Removed} handlers, just need to make sure it's 
> populated on start.

Hmmmm, sounds a bit more complicated then.

> 
> > +    if (!attr || !attr->indexingEnabled()) {
> 
> Should be 
> 
> 	if (attr && !attr->indexingEnabled())

I now have:

diff --git a/agent/scheduler.cpp b/agent/scheduler.cpp
index c912468..cf11f04 100644
--- a/agent/scheduler.cpp
+++ b/agent/scheduler.cpp
@@ -26,6 +26,7 @@
 #include <AkonadiCore/CollectionFetchScope>
 #include <AkonadiAgentBase/AgentBase>
 #include <AkonadiCore/ServerManager>
+#include <AkonadiCore/indexpolicyattribute.h>
 #include <KLocalizedString>
 #include <KConfigGroup>
 #include <QTimer>
@@ -105,12 +106,17 @@ void Scheduler::collectDirtyCollections()
 
 void Scheduler::scheduleCollection(const Akonadi::Collection &col, bool fullSync)
 {
-    if (!m_collectionQueue.contains(col.id())) {
-        m_collectionQueue.enqueue(col.id());
-    }
-    if (fullSync) {
-        m_dirtyCollections.insert(col.id());
+    //Only index the collection if user did not disable full text indexing in folder properties dialog in KMail
+    Akonadi::IndexPolicyAttribute *attr = col.attribute<Akonadi::IndexPolicyAttribute>();
+    if (!attr || attr->indexingEnabled()) {
+        if (!m_collectionQueue.contains(col.id())) {
+            m_collectionQueue.enqueue(col.id());
+        }
+        if (fullSync) {
+            m_dirtyCollections.insert(col.id());
+        }
     }
+
     processNext();
 }
 
@@ -218,6 +224,13 @@ void Scheduler::processNext()
     }
 
     const Akonadi::Collection col(m_collectionQueue.takeFirst());
+
+    //Only index the collection if user did not disable full text indexing in folder properties dialog in KMail
+    Akonadi::IndexPolicyAttribute *attr = Akonadi::Collection(col).attribute<Akonadi::IndexPolicyAttribute>();
+    if (attr && !attr->indexingEnabled()) {
+        return;
+    }
+
     qCDebug(AKONADI_INDEXER_AGENT_LOG) << "Processing collection: " << col.id();
     QQueue<Akonadi::Item::Id> &itemQueue = m_queues[col.id()];
     const bool fullSync = m_dirtyCollections.contains(col.id());


Yet it again still indexes collections it should not.

Seems I need to look at it more closely.

Ideally the decision whether a collection or an item is indexes or removed
from the index would be made at *one* place in the code. And all other places
call into that.

Also when disabling full text indexing for a collection should it remove
it from the index or say "well we indexed it already, so why not keep it?".
I vote for removing it as that is what I would expect.

Thanks,
-- 
Martin
_______________________________________________
KDE PIM mailing list kde-pim at kde.org
https://mail.kde.org/mailman/listinfo/kde-pim
KDE PIM home page at http://pim.kde.org/


More information about the kde-pim mailing list