[KPhotoAlbum] Search performance (at least) linear in number of search terms?

Robert Krawitz rlk at alum.mit.edu
Sun Jul 16 16:55:22 BST 2017


On Sun, 16 Jul 2017 09:53:37 +0200, Tobias Leupold wrote:
> Hi :-)
>
> As far as I can grasp it, Set.h and Set.cpp is a remnant of KPA using older Qt versions which did not provide the needed functionality.
>
> In case of doubt, Johannes is the one to decide whether or not, but I think this can be removed in favor of using Qt directly. Which does not only – as you stated – speed up the code, but also simplify it.

So, here's the big problem: it's just plain doing waaaaay too much
work, by at least an order of magnitude.  Not to mention that the
number of calls to hasCategoryInfo() is linear in the number of
individual search terms (so each keyword being searched for is checked
twice), which since the intersection test is likely superlinear
results in the overall work being superquadratic.  *Then* this is all
done almost 10 times for each search.

I set a breakpoint on DB::ImageSearchInfo::match on the demo database,
and caught a stack trace for each time it was hit on one particular
image.  I found that that was being called 9x (and in some cases that
I haven't characterized 18x) for each image.  And what's more, _both_
flavors of hasCategoryInfo() (in ImageInfo.cpp) are being called if
there's _not_ a match on a keyword (if there is a match, the call to
the QSet<QString> flavor is short circuited).

If I do a search for "desktop | scenic" in keywords, I get this trace
using debugging information below:

match "bar55.jpg"
hCI1 "bar55.jpg" "Keywords" "desktop" ""
hCI2 "bar55.jpg" "Keywords" "" ""
hCI1 "bar55.jpg" "Keywords" "fun" ""
hCI2 "bar55.jpg" "Keywords" "" ""
match "snow.jpg"
hCI1 "snow.jpg" "Keywords" "desktop" "|desktop"

whereas if I just search for "desktop" I get this:

match "bar55.jpg"
hCI1 "bar55.jpg" "Keywords" "desktop" ""
hCI2 "bar55.jpg" "Keywords" "" ""
match "snow.jpg"
hCI1 "snow.jpg" "Keywords" "desktop" "|desktop"

and if I search for all 5 keywords, I get -- no surprise -- this:

match "bar55.jpg"
hCI1 "bar55.jpg" "Keywords" "desktop" ""
hCI2 "bar55.jpg" "Keywords" "" ""
hCI1 "bar55.jpg" "Keywords" "fun" ""
hCI2 "bar55.jpg" "Keywords" "" ""
hCI1 "bar55.jpg" "Keywords" "new wave" ""
hCI2 "bar55.jpg" "Keywords" "" ""
hCI1 "bar55.jpg" "Keywords" "scanned in" ""
hCI2 "bar55.jpg" "Keywords" "" ""
hCI1 "bar55.jpg" "Keywords" "scenic" ""
hCI2 "bar55.jpg" "Keywords" "" ""
match "snow.jpg"
hCI1 "snow.jpg" "Keywords" "desktop" "|desktop"

I don't have time right now to rearchitect this, and it needs
discussion.  For example, perhaps we could use a generation number
that's incremented each time the search conditions are changed, and
short circuit all this work otherwise.  Or find some other way to
ensure that it's only called once.

Oh, and finally, ImageSearchInfo::match() doesn't completely short
circuit match failures.  For examplee, it still iterates over the
categories in the options search.  Maybe the compiler is clever and
optimizes that out, but I wouldn't want to bet on that.

diff --git a/DB/ImageSearchInfo.cpp b/DB/ImageSearchInfo.cpp
index 8d11a5c6..974edbd2 100644
--- a/DB/ImageSearchInfo.cpp
+++ b/DB/ImageSearchInfo.cpp
@@ -86,8 +86,8 @@ bool ImageSearchInfo::match( ImageInfoPtr info ) const
     if ( !m_compiled )
         compile();
 
-    bool ok = true;
-    ok = m_exifSearchInfo.matches( info->fileName() );
+    if (! m_exifSearchInfo.matches( info->fileName() ))
+	return false;
 
     QDateTime actualStart = info->date().start();
     QDateTime actualEnd = info->date().end();
@@ -108,27 +108,28 @@ bool ImageSearchInfo::match( ImageInfoPtr info ) const
         bool b2 =( actualStart <= m_date.end() && m_date.end() <= actualEnd );
         bool b3 = ( m_date.start() <= actualStart && ( actualEnd <= m_date.end() || m_date.end().isNull() ) );
 
-        ok = ok && ( ( b1 || b2 || b3 ) );
+	if (! ( ( b1 || b2 || b3 ) ) ) return false;
     } else if ( !m_date.end().isNull() ) {
         bool b1 = ( actualStart <= m_date.end() && m_date.end() <= actualEnd );
         bool b2 = ( actualEnd <= m_date.end() );
-        ok = ok && ( ( b1 || b2 ) );
+	if (! ( ( b1 || b2 ) ) ) return false;
     }
 
     // -------------------------------------------------- Options
     // alreadyMatched map is used to make it possible to search for
     // Jesper & None
+    qDebug() << "match" << info->fileName().relative();
     QMap<QString, StringSet> alreadyMatched;
     for (CategoryMatcher* optionMatcher : m_categoryMatchers) {
-        ok = ok && optionMatcher->eval(info, alreadyMatched);
+	if (! optionMatcher->eval(info, alreadyMatched) ) return false;
     }
 
 
     // -------------------------------------------------- Label
-    ok = ok && ( m_label.isEmpty() || info->label().indexOf(m_label) != -1 );
+    if (! ( m_label.isEmpty() || info->label().indexOf(m_label) != -1 ) ) return false;
 
     // -------------------------------------------------- RAW
-    ok = ok && ( m_searchRAW == false || ImageManager::RAWImageDecoder::isRAW( info->fileName()) );
+    if (! ( m_searchRAW == false || ImageManager::RAWImageDecoder::isRAW( info->fileName()) ) ) return false;
 
     // -------------------------------------------------- Rating
 
@@ -137,61 +138,55 @@ bool ImageSearchInfo::match( ImageInfoPtr info ) const
     switch( m_ratingSearchMode ) {
         case 1:
         // Image rating at least selected
-        ok = ok && ( m_rating <= info->rating() );
+	if (! ( m_rating <= info->rating() ) ) return false;
         break;
         case 2:
         // Image rating less than selected
-        ok = ok && ( m_rating >= info->rating() );
+	if (! ( m_rating >= info->rating() ) ) return false;
         break;
         case 3:
         // Image rating not equal
-        ok = ok && ( m_rating != info->rating() );
+	if (! ( m_rating != info->rating() ) ) return false;
         break;
         default:
-            ok = ok && ((m_rating == -1 ) || ( m_rating == info->rating() ));
+	if (! ((m_rating == -1 ) || ( m_rating == info->rating() ))) return false;
         break;
     }
-    }
 
 
     // -------------------------------------------------- Resolution
-    if ( m_megapixel )
-        ok = ok && ( m_megapixel * 1000000 <= info->size().width() * info->size().height() );
+    if ( m_megapixel && (! ( m_megapixel * 1000000 <= info->size().width() * info->size().height() ) ) ) return false;
 
-    if ( m_max_megapixel && m_max_megapixel > m_megapixel )
-        ok = ok && ( m_max_megapixel * 1000000 > info->size().width() * info->size().height() );
+    if ( ( m_max_megapixel && m_max_megapixel > m_megapixel ) && (! ( m_max_megapixel * 1000000 > info->size().width() * info->size().height() ) ) ) return false;
 
     // -------------------------------------------------- Text
     QString txt = info->description();
     if ( !m_description.isEmpty() ) {
         QStringList list = m_description.split(QChar::fromLatin1(' '), QString::SkipEmptyParts);
         Q_FOREACH( const QString &word, list ) {
-            ok = ok && ( txt.indexOf( word, 0, Qt::CaseInsensitive ) != -1 );
+	    if (! ( txt.indexOf( word, 0, Qt::CaseInsensitive ) != -1 ) ) return false;
         }
     }
 
     // -------------------------------------------------- File name pattern
-    ok = ok && ( m_fnPattern.isEmpty() ||
-        m_fnPattern.indexIn( info->fileName().relative() ) != -1 );
+    if (! ( m_fnPattern.isEmpty() ||
+	    m_fnPattern.indexIn( info->fileName().relative() ) != -1 ) ) return false;
 
 
 #ifdef HAVE_KGEOMAP
     // Search for GPS Position
-    if (ok && m_usingRegionSelection) {
-        ok = ok && info->coordinates().hasCoordinates();
-        if (ok) {
-            float infoLat = info->coordinates().lat();
-            float infoLon = info->coordinates().lon();
-            ok = ok
-                 && m_regionSelectionMinLat <= infoLat
-                 && infoLat                 <= m_regionSelectionMaxLat
-                 && m_regionSelectionMinLon <= infoLon
-                 && infoLon                 <= m_regionSelectionMaxLon;
+    if (m_usingRegionSelection) {
+	if (! info->coordinates().hasCoordinates() ) return false;
+	float infoLat = info->coordinates().lat();
+	float infoLon = info->coordinates().lon();
+	if (! (m_regionSelectionMinLat <= infoLat
+	       && infoLat                 <= m_regionSelectionMaxLat
+	       && m_regionSelectionMinLon <= infoLon
+	       && infoLon                 <= m_regionSelectionMaxLon ) ) return false;
         }
     }
 #endif
-
-    return ok;
+    return true;
 }
 
 

> E-Mail vom 15.07.2017, 23:34 von Robert Krawitz:
>> Well, I did find one piece of really low hanging fruit: the overlap
>> function (which, mind you, is used in only one place, so Set.cpp and
>> Set.h might just as well be removed altogether) was computing the
>> intersection of the two sets of strings rather than just determining
>> whether there is an intersection.
>> 
>> For my test case, it cut the time from about 18 seconds to maybe 13.
>> While with only 222,000 images and searching for about 40 tags it
>> should be able to do a lot better indeed than that, this is at least a
>> start in the right direction.
>> 
>> If you want, I'll do a patch that removes Set.h and Set.cpp altogether.
>> 
>> diff --git a/DB/ImageInfo.cpp b/DB/ImageInfo.cpp
>> index d5b8cb25..c5897fe7 100644
>> --- a/DB/ImageInfo.cpp
>> +++ b/DB/ImageInfo.cpp
>> @@ -132,7 +132,7 @@ bool ImageInfo::hasCategoryInfo( const QString& key,
>> const QString& value ) cons
>> 
>>  bool DB::ImageInfo::hasCategoryInfo( const QString& key, const StringSet&
>> values ) const {
>> -    return Utilities::overlap( m_categoryInfomation[key], values );
>> +    return values.intersects( m_categoryInfomation[key] );
>>  }


The hasCategoryInfo() with debugging:

bool ImageInfo::hasCategoryInfo( const QString& key, const QString& value ) const
{
  QString answer;
  Q_FOREACH(QString str, m_categoryInfomation[key]) {
    answer = answer + QString::fromLatin1("|");
    answer = answer + str;
  }
  qDebug() << "hCI1" << m_fileName.relative() << key << value << answer;
    return m_categoryInfomation[key].contains(value);
}

bool DB::ImageInfo::hasCategoryInfo( const QString& key, const StringSet& values ) const
{
  QString answer, answer1;
  Q_FOREACH(QString str, m_categoryInfomation[key]) {
    answer = answer + QString::fromLatin1("|");
    answer = answer + str;
  }
  Q_FOREACH(QString str, values) {
    answer1 = answer1 + QString::fromLatin1("|");
    answer1 = answer1 + str;
  }
  qDebug() << "hCI2" << m_fileName.relative() << key << answer1 << answer;
    return values.intersects( m_categoryInfomation[key] );
}


All of the stack traces:

#7  0x00007f2a8137f493 in QMetaObject::activate(QObject*, int, int, void**) ()
    at /usr/lib64/libQt5Core.so.5
#8  0x00007f2a84bc0c25 in QAbstractItemView::activated(QModelIndex const&) ()
    at /usr/lib64/libQt5Widgets.so.5
#9  0x00007f2a84bc6ef6 in QAbstractItemView::mouseReleaseEvent(QMouseEvent*) ()
    at /usr/lib64/libQt5Widgets.so.5
#10 0x00007f2a84beb38e in QListView::mouseReleaseEvent(QMouseEvent*) ()
    at /usr/lib64/libQt5Widgets.so.5
#11 0x00007f2a849d1487 in QWidget::event(QEvent*) ()
    at /usr/lib64/libQt5Widgets.so.5
#12 0x00007f2a84aaddde in QFrame::event(QEvent*) ()
    at /usr/lib64/libQt5Widgets.so.5
#13 0x00007f2a84bcdcfb in QAbstractItemView::viewportEvent(QEvent*) ()
    at /usr/lib64/libQt5Widgets.so.5
#14 0x00007f2a81355801 in QCoreApplicationPrivate::sendThroughObjectEventFilters(QObject*, QEvent*) () at /usr/lib64/libQt5Core.so.5
#15 0x00007f2a84993ad5 in QApplicationPrivate::notify_helper(QObject*, QEvent*) () at /usr/lib64/libQt5Widgets.so.5
#16 0x00007f2a8499aedc in QApplication::notify(QObject*, QEvent*) ()
    at /usr/lib64/libQt5Widgets.so.5
#17 0x00007f2a81355935 in QCoreApplication::notifyInternal2(QObject*, QEvent*) () at /usr/lib64/libQt5Core.so.5
#18 0x00007f2a84999d59 in QApplicationPrivate::sendMouseEvent(QWidget*, QMouseEvent*, QWidget*, QWidget*, QWidget**, QPointer<QWidget>&, bool) ()
    at /usr/lib64/libQt5Widgets.so.5
#19 0x00007f2a849e9b91 in  () at /usr/lib64/libQt5Widgets.so.5
#20 0x00007f2a849ec0f3 in  () at /usr/lib64/libQt5Widgets.so.5
#21 0x00007f2a84993afc in QApplicationPrivate::notify_helper(QObject*, QEvent*) () at /usr/lib64/libQt5Widgets.so.5
#22 0x00007f2a8499a840 in QApplication::notify(QObject*, QEvent*) ()
    at /usr/lib64/libQt5Widgets.so.5
#23 0x00007f2a81355935 in QCoreApplication::notifyInternal2(QObject*, QEvent*) () at /usr/lib64/libQt5Core.so.5
#24 0x00007f2a819184ed in QGuiApplicationPrivate::processMouseEvent(QWindowSystemInterfacePrivate::MouseEvent*) () at /usr/lib64/libQt5Gui.so.5
#25 0x00007f2a8191a0a5 in QGuiApplicationPrivate::processWindowSystemEvent(QWindowSystemInterfacePrivate::WindowSystemEvent*) () at /usr/lib64/libQt5Gui.so.5
#26 0x00007f2a818f88ab in QWindowSystemInterface::sendWindowSystemEvents(QFlags<QEventLoop::ProcessEventsFlag>) () at /usr/lib64/libQt5Gui.so.5
#27 0x00007f2a70016e50 in  () at /usr/lib64/libQt5XcbQpa.so.5
#28 0x00007f2a7b501134 in g_main_context_dispatch ()
    at /usr/lib64/libglib-2.0.so.0
#29 0x00007f2a7b501388 in  () at /usr/lib64/libglib-2.0.so.0
#30 0x00007f2a7b50142c in g_main_context_iteration ()
    at /usr/lib64/libglib-2.0.so.0
#31 0x00007f2a813a611c in QEventDispatcherGlib::processEvents(QFlags<QEventLoop::ProcessEventsFlag>) () at /usr/lib64/libQt5Core.so.5
#32 0x00007f2a81353c2b in QEventLoop::exec(QFlags<QEventLoop::ProcessEventsFlag>) () at /usr/lib64/libQt5Core.so.5
#33 0x00007f2a8135c1f4 in QCoreApplication::exec() ()
    at /usr/lib64/libQt5Core.so.5
#34 0x0000000000460307 in main ()
-- 
Robert Krawitz                                     <rlk at alum.mit.edu>

***  MIT Engineers   A Proud Tradition   http://mitathletics.com  ***
Member of the League for Programming Freedom  --  http://ProgFree.org
Project lead for Gutenprint   --    http://gimp-print.sourceforge.net

"Linux doesn't dictate how I work, I dictate how Linux works."
--Eric Crampton



More information about the Kphotoalbum mailing list