<!DOCTYPE html>

<html>

  <head>


    <meta http-equiv="content-type" content="text/html; charset=UTF-8">

  </head>

  <body text="#000000" bgcolor="#FFFFFF">

    <p>Dear digiKam fans and users,</p>

    <p>I'm trying to run "Find duplicates" on a collection of about
      300,000 images. With SQLite I was able to scan the collection and
      "Update fingerprints", but it crashed during "Find duplicates".
      I then moved from SQLite to MySQL, and I'm waiting right now to
      see whether "Find duplicates" will complete. While waiting, I
      looked into the database and found the ImageHaarMatrix table.
      After seeing it, I put together <a
        href="https://github.com/kenberland/digikam-pgvector">this
        demonstration</a> of using vector search instead of comparing
      the Haar matrix of each image. Here is the benchmark summary:<br>

      <br>

      <font face="monospace">--- Benchmark Summary ---<br>

        Runs: 5<br>

        <br>

        --- Individual Run Times ---<br>

        Run 1: MySQL: 8.8436s, PostgreSQL: 0.0765s<br>

        Run 2: MySQL: 8.9818s, PostgreSQL: 0.0666s<br>

        Run 3: MySQL: 8.9786s, PostgreSQL: 0.0713s<br>

        Run 4: MySQL: 8.7938s, PostgreSQL: 0.0658s<br>

        Run 5: MySQL: 9.1870s, PostgreSQL: 0.0636s<br>

        <br>

        --- Average Times ---<br>

        MySQL (simulated search): 8.9570 seconds<br>

        PostgreSQL (pgvector search): 0.0688 seconds<br>

        <br>

        Improvement Factor: 130.25x</font></p>
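    <p>To make the idea of vector search concrete, here is a minimal
      Python sketch. It is not the code from the demonstration
      repository: the 4-dimensional vectors, the image ids, the
      function names and the <code>max_distance</code> threshold are
      all made up for illustration. It does a brute-force distance scan
      over fixed-length vectors, which is the kind of lookup that
      pgvector performs inside PostgreSQL.</p>

<pre>
```python
import math


def l2_distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def find_near_duplicates(query, vectors, max_distance):
    """Return (image_id, distance) pairs within max_distance, closest first."""
    hits = []
    for image_id, vec in vectors.items():
        d = l2_distance(query, vec)
        if max_distance >= d:
            hits.append((image_id, d))
    return sorted(hits, key=lambda hit: hit[1])


# Hypothetical signatures keyed by image id (real ones would be much longer).
vectors = {
    1: [0.0, 0.0, 0.0, 0.0],
    2: [0.1, 0.0, 0.0, 0.0],
    3: [5.0, 5.0, 5.0, 5.0],
}

# Image 1 is compared against every stored vector, including itself.
matches = find_near_duplicates(vectors[1], vectors, max_distance=0.5)
print([image_id for image_id, _ in matches])
```
</pre>

    <p>The scan above touches every vector for each query. A vector
      index avoids most of those comparisons, which is one plausible
      reason for a large speed-up.</p>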

    <p>If "Find duplicates" crashes again, I'll probably write a
      script that removes the duplicates using the pgvector information.</p>
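    <p>As a rough sketch of what such a script might do, the Python
      below groups near-duplicate pairs into clusters and lists the ids
      that would be candidates for removal. Everything in it is an
      assumption for illustration: the pairs are made-up ids, the
      function name is hypothetical, and "keep the lowest id in each
      cluster" is an arbitrary rule. It only prints candidates and
      deletes nothing.</p>

<pre>
```python
def group_duplicates(pairs):
    """Group image ids into clusters from (id_a, id_b) near-duplicate pairs."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            # The smaller id becomes the representative of the merged cluster.
            parent[max(root_a, root_b)] = min(root_a, root_b)

    clusters = {}
    for x in list(parent):
        clusters.setdefault(find(x), []).append(x)
    return {root: sorted(members) for root, members in clusters.items()}


# Hypothetical near-duplicate pairs, e.g. collected from neighbour queries.
pairs = [(10, 11), (11, 12), (20, 21)]
clusters = group_duplicates(pairs)

# Keep the representative (lowest id) of each cluster, list the rest.
to_remove = sorted(
    image_id
    for root, members in clusters.items()
    for image_id in members
    if image_id != root
)
print(to_remove)
```
</pre>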

    <p>-KB<br>

    </p>

  </body>

</html>