[Kde-pim] akonadi indexing agent: Where does it check on what folders to index?

Daniel Vrátil dvratil at kde.org
Thu Feb 11 14:09:36 GMT 2016


On Thursday, February 11, 2016 2:37:31 PM CET Martin Steigerwald wrote:
> Am Donnerstag, 11. Februar 2016, 13:08:19 CET schrieb Daniel Vrátil:
> > On Thursday, February 11, 2016 12:22:29 PM CET Martin Steigerwald wrote:
> > > (asking new devel related questions here instead of continuing on
> > > kdepim-
> > > users)
> > > 
> > > Hello!
> > 
> > Hi!
> > 
> > > I am trying to get a clue on where it selects which folders to index?
> > > For
> > > my huge maildir it can help to tell it not to index the large LKML
> > > folders and some other large folders, and there is setting in KMail
> > > folder properties for that in the maintenance tab. "Enable full text
> > > indexing" which I disabled for the LKML folders.
> > > 
> > > I am not sure whether Akonadi indexing agent actually uses that setting
> > > tough.
> > 
> > Indeed the agent completely ignores the settings :(
> > 
> > > I think it schedules indexing of folders in
> > > 
> > >  47 Scheduler::Scheduler(Index &index, const QSharedPointer<JobFactory>
> > > 
> > > &jobFactory, QObject *parent)
> > > […]
> > > 
> > >  64     //Schedule collections we know have missing items from last time
> > >  65     m_dirtyCollections = group.readEntry("dirtyCollections",
> > > 
> > > QList<Akonadi::Collection::Id>()).toSet();
> > > 
> > >  66     qCDebug(AKONADI_INDEXER_AGENT_LOG) << "Dirty collections " <<
> > > 
> > > m_dirtyCollections;
> > > 
> > >  67     Q_FOREACH (Akonadi::Collection::Id col, m_dirtyCollections) {
> > >  68         scheduleCollection(Akonadi::Collection(col), true);
> > >  69     }
> > >  70
> > >  71     //Trigger a full sync initially
> > >  72     if (!group.readEntry("initialIndexingDone", false)) {
> > >  73         qCDebug(AKONADI_INDEXER_AGENT_LOG) << "initial indexing";
> > >  74         QMetaObject::invokeMethod(this, "scheduleCompleteSync",
> > > 
> > > Qt::QueuedConnection);
> > > 
> > >  75     }
> > >  76     group.writeEntry("initialIndexingDone", true);
> > >  77     group.sync();
> > >  78 }
> > > 
> > > of "scheduler.cpp", yet I don´t see any check for whether fulltext
> > > indexing
> > > is enabled for the folder or not. Also I didn´t see a check in
> > > scheduleCollection().
> > > 
> > > Do I miss something obvious?
> > 
> > You are right, but there are more places where the check is missing
> > 
> > > If it is missing, maybe a junior job for me can be to add it.
> > 
> > That would be awesome!
> > 
> > All you need to do is to check whether a collection has
> > Akonadi::IndexPolicyAttribute set and what's the value of
> > indexingEnabled().
> I do so in two places, I think the first is not really necessary, but it
> might safe some unnecessary code execution – also its partly guess work.
> 
> > One part is adding the check to Scheduler::scheduleCollection, as you
> > already found. The other part is little more tricky, because we also want
> > to ignore changes happening to and within those Collections, like renaming
> > the Collection or new Item being added or Item being changed (marked as
> > read, etc. etc). Right now we just get the changed or new Item/Collection,
> > and index it straight away.
> 
> For now I just changed addItem in scheduler.cpp as that is what agent.cpp
> methods call, but for removing and moving I didn´t anything. Maybe it would
> be even good to allow renaming or moving to work in case the collection has
> been indexed before the user said no.
> 
> Well at least for initial sync it appears to work. I will see after initial
> sync is complete, so far I didn´t see any collection I disabled full text
> indexing for in AkonadiConsole index agent status display.
> 
> Next I will look into creating a review request with Phabricator. I do have
> an identity account with commit rights, but I didn´t use Phabricator so far.
> I hope its easy to make a review request with it.

If you prefer ReviewBoard, just use ReviewBoard.

> I also wonder about dirty collection, somewhere in scheduler.cpp it writes
> them out to a config file, I bet it shouldn´t add collections in there which
> are not enabled for fulltext indexing.

Right, but then again we filter those out in scheduleCollection(), so no 
problem there.

> 
> I know its not a review request, but this is what I changed. And so far this
> first hack appears to do what I want :)
> 
> diff --git a/agent/scheduler.cpp b/agent/scheduler.cpp
> index c912468..42828de 100644
> --- a/agent/scheduler.cpp
> +++ b/agent/scheduler.cpp
> @@ -26,6 +26,7 @@
>  #include <AkonadiCore/CollectionFetchScope>
>  #include <AkonadiAgentBase/AgentBase>
>  #include <AkonadiCore/ServerManager>
> +#include <AkonadiCore/indexpolicyattribute.h>
>  #include <KLocalizedString>
>  #include <KConfigGroup>
>  #include <QTimer>
> @@ -105,18 +106,30 @@ void Scheduler::collectDirtyCollections()
> 
>  void Scheduler::scheduleCollection(const Akonadi::Collection &col, bool
> fullSync) {
> -    if (!m_collectionQueue.contains(col.id())) {
> -        m_collectionQueue.enqueue(col.id());
> -    }
> -    if (fullSync) {
> -        m_dirtyCollections.insert(col.id());
> +    //Only index the collection if user did not disable full text indexing
> in folder properties dialog in KMail +    Akonadi::IndexPolicyAttribute
> *attr = col.attribute<Akonadi::IndexPolicyAttribute>(); +    if (!attr ||
> attr->indexingEnabled()) {
> +        if (!m_collectionQueue.contains(col.id())) {
> +            m_collectionQueue.enqueue(col.id());
> +        }
> +        if (fullSync) {
> +            m_dirtyCollections.insert(col.id());
> +        }
>      }
> +
>      processNext();
>  }

This looks OK.


>  void Scheduler::addItem(const Akonadi::Item &item)
>  {
>      Q_ASSERT(item.parentCollection().isValid());
> +
> +    //Only index the item if user did not disable full text indexing for
> its collection in folder properties dialog in KMail 
> + Akonadi::IndexPolicyAttribute *attr =
> item.parentCollection().attribute<Akonadi::IndexPolicyAttribute>();

Unfortunately this won't work just like that, the information available about 
parent Collection is limited and it does not include attributes. It *might* 
work, if the Collection in some of the lower caches has the information, but 
you cannot rely on it.

Since we cannot afford fetching full parent Collection for each Item I guess 
we will need some local cache of Collections (IDs) with disabled indexing and 
just quickly check if Item's belongs to that collection or not.

The cache can easilly be kept up-to-date from the 
collection{Added,Changed,Removed} handlers, just need to make sure it's 
populated on start.

> +    if (!attr || !attr->indexingEnabled()) {

Should be 

	if (attr && !attr->indexingEnabled())


> +        return;
> +    }
> +

Dan

> Thanks,


-- 
Daniel Vrátil
www.dvratil.cz | dvratil at kde.org
IRC: dvratil on Freenode (#kde, #kontact, #akonadi, #fedora-kde)

GPG Key: 0x4D69557AECB13683
Fingerprint: 0ABD FA55 A4E6 BEA9 9A83 EA97 4D69 557A ECB1 3683
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 819 bytes
Desc: This is a digitally signed message part.
URL: <http://mail.kde.org/pipermail/kde-pim/attachments/20160211/5b03fca5/attachment.sig>
-------------- next part --------------
_______________________________________________
KDE PIM mailing list kde-pim at kde.org
https://mail.kde.org/mailman/listinfo/kde-pim
KDE PIM home page at http://pim.kde.org/


More information about the kde-pim mailing list