[Kde-pim] akonadi indexing agent: Where does it check on what folders to index?
Martin Steigerwald
martin at lichtvoll.de
Thu Feb 11 15:27:19 GMT 2016
Am Donnerstag, 11. Februar 2016, 15:09:36 CET schrieb Daniel Vrátil:
> On Thursday, February 11, 2016 2:37:31 PM CET Martin Steigerwald wrote:
> > Am Donnerstag, 11. Februar 2016, 13:08:19 CET schrieb Daniel Vrátil:
> > > On Thursday, February 11, 2016 12:22:29 PM CET Martin Steigerwald wrote:
> > > > (asking new devel related questions here instead of continuing on
> > > > kdepim-
> > > > users)
> > > >
> > > > Hello!
> > >
> > > Hi!
> > >
> > > > I am trying to get a clue on where it selects which folders to index?
> > > > For
> > > > my huge maildir it can help to tell it not to index the large LKML
> > > > folders and some other large folders, and there is setting in KMail
> > > > folder properties for that in the maintenance tab. "Enable full text
> > > > indexing" which I disabled for the LKML folders.
> > > >
> > > > I am not sure whether Akonadi indexing agent actually uses that setting
> > > > tough.
> > >
> > > Indeed the agent completely ignores the settings :(
> > >
> > > > I think it schedules indexing of folders in
> > > >
> > > > 47 Scheduler::Scheduler(Index &index, const QSharedPointer<JobFactory>
> > > >
> > > > &jobFactory, QObject *parent)
> > > > […]
> > > >
> > > > 64 //Schedule collections we know have missing items from last time
> > > > 65 m_dirtyCollections = group.readEntry("dirtyCollections",
> > > >
> > > > QList<Akonadi::Collection::Id>()).toSet();
> > > >
> > > > 66 qCDebug(AKONADI_INDEXER_AGENT_LOG) << "Dirty collections " <<
> > > >
> > > > m_dirtyCollections;
> > > >
> > > > 67 Q_FOREACH (Akonadi::Collection::Id col, m_dirtyCollections) {
> > > > 68 scheduleCollection(Akonadi::Collection(col), true);
> > > > 69 }
> > > > 70
> > > > 71 //Trigger a full sync initially
> > > > 72 if (!group.readEntry("initialIndexingDone", false)) {
> > > > 73 qCDebug(AKONADI_INDEXER_AGENT_LOG) << "initial indexing";
> > > > 74 QMetaObject::invokeMethod(this, "scheduleCompleteSync",
> > > >
> > > > Qt::QueuedConnection);
> > > >
> > > > 75 }
> > > > 76 group.writeEntry("initialIndexingDone", true);
> > > > 77 group.sync();
> > > > 78 }
> > > >
> > > > of "scheduler.cpp", yet I don´t see any check for whether fulltext
> > > > indexing
> > > > is enabled for the folder or not. Also I didn´t see a check in
> > > > scheduleCollection().
> > > >
> > > > Do I miss something obvious?
> > >
> > > You are right, but there are more places where the check is missing
> > >
> > > > If it is missing, maybe a junior job for me can be to add it.
> > >
> > > That would be awesome!
> > >
> > > All you need to do is to check whether a collection has
> > > Akonadi::IndexPolicyAttribute set and what's the value of
> > > indexingEnabled().
> > I do so in two places, I think the first is not really necessary, but it
> > might safe some unnecessary code execution – also its partly guess work.
> >
> > > One part is adding the check to Scheduler::scheduleCollection, as you
> > > already found. The other part is little more tricky, because we also want
> > > to ignore changes happening to and within those Collections, like renaming
> > > the Collection or new Item being added or Item being changed (marked as
> > > read, etc. etc). Right now we just get the changed or new Item/Collection,
> > > and index it straight away.
> >
> > For now I just changed addItem in scheduler.cpp as that is what agent.cpp
> > methods call, but for removing and moving I didn´t anything. Maybe it would
> > be even good to allow renaming or moving to work in case the collection has
> > been indexed before the user said no.
> >
> > Well at least for initial sync it appears to work. I will see after initial
> > sync is complete, so far I didn´t see any collection I disabled full text
> > indexing for in AkonadiConsole index agent status display.
> >
> > Next I will look into creating a review request with Phabricator. I do have
> > an identity account with commit rights, but I didn´t use Phabricator so far.
> > I hope its easy to make a review request with it.
>
> If you prefer ReviewBoard, just use ReviewBoard.
I think I want to learn the tools the kdepim people agreed upon, so I will
look into Phabricator. Hope its documented somewhere. Hmmm, seems some
documentation is at
https://techbase.kde.org/Development/Phabricator
>
> > I also wonder about dirty collection, somewhere in scheduler.cpp it writes
> > them out to a config file, I bet it shouldn´t add collections in there which
> > are not enabled for fulltext indexing.
>
> Right, but then again we filter those out in scheduleCollection(), so no
> problem there.
>
> >
> > I know its not a review request, but this is what I changed. And so far this
> > first hack appears to do what I want :)
> >
> > diff --git a/agent/scheduler.cpp b/agent/scheduler.cpp
> > index c912468..42828de 100644
> > --- a/agent/scheduler.cpp
> > +++ b/agent/scheduler.cpp
> > @@ -26,6 +26,7 @@
> > #include <AkonadiCore/CollectionFetchScope>
> > #include <AkonadiAgentBase/AgentBase>
> > #include <AkonadiCore/ServerManager>
> > +#include <AkonadiCore/indexpolicyattribute.h>
> > #include <KLocalizedString>
> > #include <KConfigGroup>
> > #include <QTimer>
> > @@ -105,18 +106,30 @@ void Scheduler::collectDirtyCollections()
> >
> > void Scheduler::scheduleCollection(const Akonadi::Collection &col, bool
> > fullSync) {
> > - if (!m_collectionQueue.contains(col.id())) {
> > - m_collectionQueue.enqueue(col.id());
> > - }
> > - if (fullSync) {
> > - m_dirtyCollections.insert(col.id());
> > + //Only index the collection if user did not disable full text indexing
> > in folder properties dialog in KMail + Akonadi::IndexPolicyAttribute
> > *attr = col.attribute<Akonadi::IndexPolicyAttribute>(); + if (!attr ||
> > attr->indexingEnabled()) {
> > + if (!m_collectionQueue.contains(col.id())) {
> > + m_collectionQueue.enqueue(col.id());
> > + }
> > + if (fullSync) {
> > + m_dirtyCollections.insert(col.id());
> > + }
> > }
> > +
> > processNext();
> > }
>
> This looks OK.
It is not complete however, as I now found it indexing mesa-dev-ml and I
told it not to.
> > void Scheduler::addItem(const Akonadi::Item &item)
> > {
> > Q_ASSERT(item.parentCollection().isValid());
> > +
> > + //Only index the item if user did not disable full text indexing for
> > its collection in folder properties dialog in KMail
> > + Akonadi::IndexPolicyAttribute *attr =
> > item.parentCollection().attribute<Akonadi::IndexPolicyAttribute>();
>
> Unfortunately this won't work just like that, the information available about
> parent Collection is limited and it does not include attributes. It *might*
> work, if the Collection in some of the lower caches has the information, but
> you cannot rely on it.
>
> Since we cannot afford fetching full parent Collection for each Item I guess
> we will need some local cache of Collections (IDs) with disabled indexing and
> just quickly check if Item's belongs to that collection or not.
>
> The cache can easilly be kept up-to-date from the
> collection{Added,Changed,Removed} handlers, just need to make sure it's
> populated on start.
Hmmmm, sounds a bit more complicated then.
>
> > + if (!attr || !attr->indexingEnabled()) {
>
> Should be
>
> if (attr && !attr->indexingEnabled())
I now have:
diff --git a/agent/scheduler.cpp b/agent/scheduler.cpp
index c912468..cf11f04 100644
--- a/agent/scheduler.cpp
+++ b/agent/scheduler.cpp
@@ -26,6 +26,7 @@
#include <AkonadiCore/CollectionFetchScope>
#include <AkonadiAgentBase/AgentBase>
#include <AkonadiCore/ServerManager>
+#include <AkonadiCore/indexpolicyattribute.h>
#include <KLocalizedString>
#include <KConfigGroup>
#include <QTimer>
@@ -105,12 +106,17 @@ void Scheduler::collectDirtyCollections()
void Scheduler::scheduleCollection(const Akonadi::Collection &col, bool fullSync)
{
- if (!m_collectionQueue.contains(col.id())) {
- m_collectionQueue.enqueue(col.id());
- }
- if (fullSync) {
- m_dirtyCollections.insert(col.id());
+ //Only index the collection if user did not disable full text indexing in folder properties dialog in KMail
+ Akonadi::IndexPolicyAttribute *attr = col.attribute<Akonadi::IndexPolicyAttribute>();
+ if (!attr || attr->indexingEnabled()) {
+ if (!m_collectionQueue.contains(col.id())) {
+ m_collectionQueue.enqueue(col.id());
+ }
+ if (fullSync) {
+ m_dirtyCollections.insert(col.id());
+ }
}
+
processNext();
}
@@ -218,6 +224,13 @@ void Scheduler::processNext()
}
const Akonadi::Collection col(m_collectionQueue.takeFirst());
+
+ //Only index the collection if user did not disable full text indexing in folder properties dialog in KMail
+ Akonadi::IndexPolicyAttribute *attr = Akonadi::Collection(col).attribute<Akonadi::IndexPolicyAttribute>();
+ if (attr && !attr->indexingEnabled()) {
+ return;
+ }
+
qCDebug(AKONADI_INDEXER_AGENT_LOG) << "Processing collection: " << col.id();
QQueue<Akonadi::Item::Id> &itemQueue = m_queues[col.id()];
const bool fullSync = m_dirtyCollections.contains(col.id());
Yet it again still indexes collections it should not.
Seems I need to look at it more closely.
Ideally the decision whether a collection or an item is indexes or removed
from the index would be made at *one* place in the code. And all other places
call into that.
Also when disabling full text indexing for a collection should it remove
it from the index or say "well we indexed it already, so why not keep it?".
I vote for removing it as that is what I would expect.
Thanks,
--
Martin
_______________________________________________
KDE PIM mailing list kde-pim at kde.org
https://mail.kde.org/mailman/listinfo/kde-pim
KDE PIM home page at http://pim.kde.org/
More information about the kde-pim
mailing list