[Kde-pim] Problem with bulk fetching of items with 4.8

Shaheed Haque srhaque at theiet.org
Sat Feb 4 22:00:47 GMT 2012


Hi Kevin,

See below...

2012/2/4 Kevin Krammer <kevin.krammer at gmx.at>:
> Hi Shaheed,
>
> I am not entirely sure what problem you are trying to solve.
>
> My interpretation is that Exchange does not allow you to query for items that
> have been created/modified/deleted since the last sync.

Correct.

> Since that would mean there is no such thing as the concept "remote revision",
> why would you want to store one?
>
> Without the option of just getting the updates, you will have two sets of items,
> one from Akonadi and one from Exchange.
>
> Any item in E but not in A is new.
> Any item in E and in A is potentially modified.
> Any item not in E but in A has been deleted.

Also correct.
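Those three rules are set differences over remote IDs. Here is a minimal Python sketch, assuming each item is identified by its remoteId string. The function and class names are hypothetical and are not Akonadi API.

```python
from dataclasses import dataclass
from typing import Iterable, Set


@dataclass(frozen=True)
class SyncPlan:
    created: Set[str]         # in Exchange (E) but not in Akonadi (A)
    maybe_modified: Set[str]  # in both; payloads still need comparing
    deleted: Set[str]         # in Akonadi (A) but no longer in Exchange (E)


def classify(exchange_ids: Iterable[str], akonadi_ids: Iterable[str]) -> SyncPlan:
    """Split remote IDs into the three cases using plain set operations."""
    e = set(exchange_ids)
    a = set(akonadi_ids)
    return SyncPlan(created=e - a, maybe_modified=e & a, deleted=a - e)


plan = classify(["r1", "r2", "r3"], ["r2", "r3", "r4"])
print(sorted(plan.created), sorted(plan.maybe_modified), sorted(plan.deleted))
# ['r1'] ['r2', 'r3'] ['r4']
```

Both ID sets must be complete before anything can be classified as deleted. That requirement is what becomes expensive at the scale described next.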

> But as I said, I think I don't understand the problem.

The piece you are missing is the amount of data, and the speed with
which it can be fetched. On my system, I can fetch about 500 items
every 50 seconds, and there are about 500k items to fetch, so a full
download takes ~50k seconds or about 14 hours. Both the number of
items, and the download time mean that I cannot realistically do the
usual thing of building two lists for E and A, and subtracting one
from the other.
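The estimate works out as follows, assuming every 500-item batch takes a flat 50 seconds. Real batch times will vary.

```python
import math


def full_download_seconds(total_items: int, batch_size: int, seconds_per_batch: int) -> int:
    """Time for a full fetch when every batch takes a fixed time."""
    batches = math.ceil(total_items / batch_size)  # a partial last batch still costs a full batch
    return batches * seconds_per_batch


secs = full_download_seconds(500_000, 500, 50)
print(secs, round(secs / 3600, 1))  # 50000 13.9
```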

Instead, I have this design in mind...

1. When I start fetching the collection, I will note the starting time
using a collection attribute to persist the information (in case of
needing to restart the machine).

2. I have an incremental fetch phase during which I fetch data in
batches (of 500 items). After each batch, I "bookmark" where I got to.
If I shut down my machine, I resume the fetch from the bookmark on
restart.

3. When I get to the end (I've never actually managed to get to that
point yet!), I hope to delete all the items with a creation date prior
to the recorded start time.

I hope that makes sense. Anyway, it is the query for this last part
that I am stuck on, unless there is some other, better idea!
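The three steps amount to a resumable mark-and-sweep. The Python sketch below is hypothetical throughout. Plain objects stand in for the collection attribute, the Exchange fetch and the Akonadi store, and none of the names are Akonadi API. One design point shows up in the sketch. The sweep compares against the time the sync last wrote or confirmed each item, not its creation time. An item first created in an earlier run and merely re-fetched in this one keeps its old creation date, so a pure creation-date cutoff would delete it. Whether Akonadi's own per-item timestamp gets refreshed in that way during a sync is an assumption to check; the sketch does not establish it.

```python
import itertools
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class SyncState:
    # Hypothetical stand-in for the persisted collection attribute.
    start_time: Optional[float] = None  # step 1: when this full sync began
    bookmark: int = 0                   # step 2: index of the next remote item


@dataclass
class LocalStore:
    # remoteId -> time this resource last wrote or confirmed the item
    stamps: Dict[str, float] = field(default_factory=dict)


def run_sync(remote_ids: List[str], store: LocalStore, state: SyncState,
             now: Callable[[], float], batch_size: int = 500,
             max_batches: Optional[int] = None) -> bool:
    """Resumable mark-and-sweep; returns True once the sweep has run."""
    if state.start_time is None:
        state.start_time = now()
    batches_done = 0
    while state.bookmark < len(remote_ids):
        if max_batches is not None and batches_done >= max_batches:
            return False  # simulated shutdown; resume from the bookmark later
        batch = remote_ids[state.bookmark:state.bookmark + batch_size]
        for rid in batch:
            store.stamps[rid] = now()  # create or refresh ("mark")
        state.bookmark += len(batch)
        batches_done += 1
    # Step 3: anything not marked since start_time has vanished remotely ("sweep").
    stale = [rid for rid, t in store.stamps.items() if t < state.start_time]
    for rid in stale:
        del store.stamps[rid]
    state.start_time, state.bookmark = None, 0  # ready for the next full sync
    return True


# "old" no longer exists remotely; "kept" predates this sync but is re-fetched.
clock = itertools.count(100).__next__
store = LocalStore({"old": 1.0, "kept": 2.0})
state = SyncState()
remote = ["kept", "new1", "new2"]
print(run_sync(remote, store, state, clock, batch_size=2, max_batches=1))  # False
print(run_sync(remote, store, state, clock, batch_size=2))                 # True
print(sorted(store.stamps))  # ['kept', 'new1', 'new2']
```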

Thanks, Shaheed

> Cheers,
> Kevin
>
> P.S.: Is this requirement for a full-list download a limitation in the context
> of contacts, or does Exchange also require email clients to full-list a folder
> every time?

Global Address List entries are different from pretty much anything
else; even the storage service is separate. Personal Contacts, email,
calendar items all share the mail message service, and have some
notion of a creation and/or modification date which I hope to be able
to use.

> On Saturday, 2012-02-04, Shaheed Haque wrote:
>> Hi Kevin,
>>
>> Yes, I've been looking into ItemSync, and as of earlier today, have
>> something based on it which seems to work as well as my pre-4.8 code.
>> In fact better, since resuming a download seems consistent now
>> (possibly as a result of the overall 4.8 improvements?). Anyway, I'm
>> now hopeful that if I wait the ~5 hours needed, I'll be able to fetch
>> all 466k contacts.
>>
>> So, that brings me to my final design problem...
>>
>> Exchange does not seem to have any way for me to track changes, so to
>> work out what obsolete records I need to delete, I was thinking of
>> using a timestamp (stored using a custom attribute on the Collection),
>> and then using a query based on the (creation) "datetime" column of
>> the pimitemtable to find the obsolete items. In SQL terms, I think I
>> need something like:
>>
>>     SELECT itemid FROM pimitemtable
>>     WHERE collectionid = <mycollection> AND datetime < <mycutoff>;
>>
>> I'd propose to use an ItemSearchJob to run the query; all I need is
>> to figure out how to write the SPARQL version of the above. (I looked
>> at the queries in the ContactSearchJob sources, but that didn't get me
>> too far :-)). So, any query experts out there?
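With `LT` replaced by SQL's `<` operator, the query expresses the intended cutoff. The Python sketch below checks this against an in-memory SQLite table that only mimics the three columns named in the query. It is not Akonadi's actual schema, and a resource would normally go through Akonadi jobs rather than query the database directly. Timestamps are stored as ISO-8601 strings, which compare correctly as text. This does not answer the SPARQL question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table mirroring only the columns used in the query above.
conn.execute(
    "CREATE TABLE pimitemtable (itemid INTEGER PRIMARY KEY, "
    "collectionid INTEGER, datetime TEXT)"
)
conn.executemany(
    "INSERT INTO pimitemtable VALUES (?, ?, ?)",
    [
        (1, 7, "2012-02-01T10:00:00"),  # before the cutoff: obsolete
        (2, 7, "2012-02-04T12:00:00"),  # touched by this sync: kept
        (3, 8, "2012-01-01T00:00:00"),  # other collection: ignored
    ],
)


def obsolete_item_ids(conn: sqlite3.Connection, collection_id: int, cutoff: str) -> list:
    # "<" is SQL's less-than operator; "LT" is not valid SQL.
    rows = conn.execute(
        "SELECT itemid FROM pimitemtable "
        "WHERE collectionid = ? AND datetime < ? ORDER BY itemid",
        (collection_id, cutoff),
    )
    return [row[0] for row in rows]


print(obsolete_item_ids(conn, 7, "2012-02-04T00:00:00"))  # [1]
```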
>>
>> Thanks, Shaheed
>>
>> 2012/2/4 Kevin Krammer <kevin.krammer at gmx.at>:
>> > Hi Shaheed,
>> >
>> > sorry for not replying earlier.
>> >
>> > On Wednesday, 2012-02-01, Shaheed Haque wrote:
>> >> With the hints provided, I now have something reasonably functionally
>> >> clean in SVN. Unfortunately, the performance is a bit poor in the
>> >> sense that I can fetch 300 items from Exchange in around 1000 ms, but
>> >> it then takes me ~60,000 ms to store in Akonadi. Now, given that I'm
>> >> writing one item at a time to Akonadi, the obvious thought is that it
>> >> would be better to run all the writes within the context of a
>> >> transaction, and use a single commit() at the end (as per the model I
>> >> started this design with). However, I cannot make that work...
>> >>
>> >> Basically, that approach seemed to work in the early stages when I
>> >> simply used ItemCreateJob on each item, but (somewhat to my surprise)
>> >> I found that creating the same object twice (i.e. with the same
>> >> remoteId) resulted in two objects being created. So I end up running
>> >> an ItemFetchJob first and, depending on its success or failure, an
>> >> ItemModifyJob or an ItemCreateJob.
>> >
>> > The maildir resource's retrieveitemsjob basically uses the same approach.
>> > However, I think it first fetches all current items and then compares
>> > against that job's result when processing the remote items.
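The prefetch-then-compare approach can be sketched as follows, assuming items are keyed by remoteId and that one bulk fetch of the collection fits in memory. Plain dicts stand in for the Akonadi jobs, and the function name is hypothetical. Compared with one ItemFetchJob per item, each lookup is a dictionary access, and keying on remoteId avoids creating a second copy of an item that already exists.

```python
from typing import Dict, Iterable, List, Tuple


def plan_writes(remote_items: Iterable[Tuple[str, str]],
                local_by_rid: Dict[str, str]) -> Tuple[List[str], List[str]]:
    """Decide create vs. modify from one prefetched map, not one fetch per item.

    remote_items: (remote_id, payload) pairs from the backend
    local_by_rid: remote_id -> local payload, built by a single bulk fetch
    """
    creates, modifies = [], []
    for rid, payload in remote_items:
        if rid not in local_by_rid:
            creates.append(rid)          # unknown remoteId: create once
        elif local_by_rid[rid] != payload:
            modifies.append(rid)         # known and changed: modify
    return creates, modifies


local = {"a": "v1", "b": "v1"}                   # result of the one bulk fetch
remote = [("a", "v1"), ("b", "v2"), ("c", "v1")]
print(plan_writes(remote, local))               # (['c'], ['b'])
```

Unchanged items produce no write at all, which also reduces the number of store operations.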
>> >
>> >> At this point, I'm a bit unclear what options I have. I *think* I'd
>> >> like to be able to do whatever it is that itemsRetrievedIncremental()
>> >> does...
>> >
>> > ResourceBase's implementation of the item retrieval methods uses a
>> > special job class called ItemSync.
>> > It is more or less a transaction of item create, modify and delete jobs
>> > based on comparison with the result of a prior item fetch.
>> > I think it is actually a public class.
>> >
>> > The maildir resource uses a different implementation because timestamp
>> > information on its backend (maildir files) allows it to process only a
>> > subset of backend items.
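The timestamp shortcut looks roughly like this, assuming the backend exposes a per-entry modification time. A dict stands in for file modification times, and the function name is hypothetical. This filter alone cannot detect deletions, since a removed entry has no timestamp left to compare.

```python
from typing import Dict, List


def changed_since(mtimes: Dict[str, float], last_sync: float) -> List[str]:
    """Return only backend entries modified after the previous sync."""
    return sorted(name for name, mtime in mtimes.items() if mtime > last_sync)


files = {"cur/1": 100.0, "cur/2": 250.0, "new/3": 300.0}
print(changed_since(files, 200.0))  # ['cur/2', 'new/3']
```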
>> >
>> > Cheers,
>> > Kevin
>> >
>> > --
>> > Kevin Krammer, KDE developer, xdg-utils developer
>> > KDE user support, developer mentoring
>> >
>> >
>> > _______________________________________________
>> > KDE PIM mailing list kde-pim at kde.org
>> > https://mail.kde.org/mailman/listinfo/kde-pim
>> > KDE PIM home page at http://pim.kde.org/
>>
>
>

