D11828: Simplify orPostingIterator and make it faster

Stefan BrĂ¼ns noreply at phabricator.kde.org
Mon Apr 9 17:06:06 UTC 2018


bruns added a comment.


  In D11828#242402 <https://phabricator.kde.org/D11828#242402>, @michaelh wrote:
  
  > I'm not very familiar with the concept of iterators (yet). To me it looks like `auto i = new OrPostingIterator(iters); i->docId();` will return 0 and `i->next()` returns a valid docId. After that `i->docId();` is also valid.
  >  Is this how iterators work? Naively I would expect `i->docId();` to be valid after construction or both `i->docId();` and `i->next()` to be invalid.
  >
  > I have to admit, it's very difficult to understand (and I don't, really) what baloo is doing here with all this iterating over vectors of inherited iterator classes, pointer-to-pointer stuff of which some might not even exist. My brain hurts, terribly! All I can presently do is ask some simple questions. Sorry, I'm of no help here. Please take my comments as a device for better understanding and I hope it's not to much of a bother.
  
  
  Each PostingIterator represents a "set" of document ids. PostingIterators are not related to STL or Qt container iterators (although the "subiterators" are implemented using QVectors<>).
  
  The `PI::next()` call returns the smallest element and removes it from the set, `PI::docId()` just returns the smallest element.
  
  The result set of the OrPostingIterator is the union of its "subsets" (represented by its "subiterators", i.e. the QVector<PostingIterator*> constructor argument). Or works a little bit like an insertion sort. To return the smallest element, it creates the union of the smallest element of all subsets, and yields the smallest element of the union. It then calls `PI::next()` on all subsets which returned the smallest element, removing it from all subsets and thus from the union.
  
  In C++, you can use iterators in two different ways, either PreIncrement or PostIncrement. `PostingIterator::next()` will behave like `*(++it)`, i.e. it is incremented first, then the current value is returned.
  
  I don't know the original justification for this design. One possible idea is to delay the actual database query until `next()` is called the first time. For e.g. a query "a AND b AND c", one can execute "a", if not empty do "b", intersect (i.e. calling the Iterators of Result("a") and Result("b") until both return the same value), and if the intersection is non-empty, do "c".
  
  Currently, it does not implement this optimization. D12025 <https://phabricator.kde.org/D12025> implements one case, i.e. stopping if a leaf query returns an empty set, and propagating it upwards.

REPOSITORY
  R293 Baloo

REVISION DETAIL
  https://phabricator.kde.org/D11828

To: bruns, #baloo, michaelh
Cc: #frameworks, ashaposhnikov, michaelh, astippich, spoorun, ngraham, bruns, alexeymin
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.kde.org/pipermail/kde-frameworks-devel/attachments/20180409/c55a54f6/attachment.html>


More information about the Kde-frameworks-devel mailing list