<html>

 <body>

  <div style="font-family: Verdana, Arial, Helvetica, Sans-Serif;">

   <table bgcolor="#f9f3c9" width="100%" cellpadding="12" style="border: 1px #c9c399 solid; border-radius: 6px; -moz-border-radius: 6px; -webkit-border-radius: 6px;">

    <tr>

     <td>

      This is an automatically generated e-mail. To reply, visit:

      <a href="https://git.reviewboard.kde.org/r/124539/">https://git.reviewboard.kde.org/r/124539/</a>

     </td>

    </tr>

   </table>

   <br />

<blockquote style="margin-left: 1em; border-left: 2px solid #d0d0d0; padding-left: 10px;">

 <p style="margin-top: 0;">On August 1st, 2015, 6:36 a.m. UTC, <b>Olivier Jean de Gaalon</b> wrote:</p>

 <blockquote style="margin-left: 1em; border-left: 2px solid #d0d0d0; padding-left: 10px;">

  <pre style="white-space: pre-wrap; white-space: -moz-pre-wrap; white-space: -pre-wrap; white-space: -o-pre-wrap; word-wrap: break-word;"><p style="padding: 0;text-rendering: inherit;margin: 0;line-height: inherit;white-space: inherit;">(actually looking at the code)</p>

<p style="padding: 0;text-rendering: inherit;margin: 0;line-height: inherit;white-space: inherit;">qHash for integral types is just a cast to uint, maybe would be clearer to do that directly.</p>

<p style="padding: 0;text-rendering: inherit;margin: 0;line-height: inherit;white-space: inherit;">Often you end up having to balance the cost of hashing vs the cost of collisions, so a cheaper hash isn't in and of itself a win.</p>

<p style="padding: 0;text-rendering: inherit;margin: 0;line-height: inherit;white-space: inherit;">My benchmarking was focused on reducing collisions, so in retrospect it's likely the current implementation is sacrificing real performance. Possibly we shouldn't be taking byte-sized chunks of /any/ data type size (though then padding doesn't help the hash much).</p>

<p style="padding: 0;text-rendering: inherit;margin: 0;line-height: inherit;white-space: inherit;">OTOH, benching the hash seems like an exercise in futility: the best hash by this measurement will be "return 0". The cost of any other hash is pure waste unless it's improving the overall performance of the system via reduced collisions. Changes to the hash should then be ignoring these benchmarks and testing if real insertion and lookup patterns are improved... unfortunately benchmarking that is quite a bit more work.</p></pre>

 </blockquote>

</blockquote>

<pre style="white-space: pre-wrap; white-space: -moz-pre-wrap; white-space: -pre-wrap; white-space: -o-pre-wrap; word-wrap: break-word;"><p style="padding: 0;text-rendering: inherit;margin: 0;line-height: inherit;white-space: inherit;">I looked at std::hash, and that one also uses the value directly. I rather use qHash or std::hash as it documents clearly what I want it to be, and if they ever change the implementation it's automagically picked up by us then as well.</p>

<p style="padding: 0;text-rendering: inherit;margin: 0;line-height: inherit;white-space: inherit;">And I looked at this code as I've seen it as a hotspots a couple of times. A hash function should be fast, but of course you are right in saying that the collision reduction is even more important. But can you explain why using the integral value directly would decrease the randomness (or however one calls it) of the hash function? Especially looking at our current code:</p>

<p style="padding: 0;text-rendering: inherit;margin: 0;line-height: inherit;white-space: inherit;"><div class="codehilite" style="background: #f8f8f8"><pre style="line-height: 125%">// any integral type gets interpreted as const char*

static unsigned int hash(const char* const data, const unsigned int size, unsigned int seed = DEFAULT_SEED)

{

    // 4 iterations for (u) int, 8 for q(u)int64

    for (unsigned int i = 0; i < size; ++i) {

        // data[i] gets casted to a (u)int again

        seed += data[i];

        seed += ( seed << 10 );

        seed ^= ( seed >> 6 );

    }

    return seed;

}

</pre></div>

</p>

<p style="padding: 0;text-rendering: inherit;margin: 0;line-height: inherit;white-space: inherit;">wouldn't we achieve a similar result, but much faster, if we simply do</p>

<p style="padding: 0;text-rendering: inherit;margin: 0;line-height: inherit;white-space: inherit;"><div class="codehilite" style="background: #f8f8f8"><pre style="line-height: 125%">// any integral type gets interpreted as const char*

static unsigned int hash(uint hash, unsigned int seed = DEFAULT_SEED)

{

    seed += hash;

    seed += ( seed << 10 );

    seed ^= ( seed >> 6 );

    return seed;

}

</pre></div>

</p>

<p style="padding: 0;text-rendering: inherit;margin: 0;line-height: inherit;white-space: inherit;">Then, when we keep in mind that DEFAULT_SEED is 2166136261u, we are extremely close to what boost::hash_combine does internally:</p>

<p style="padding: 0;text-rendering: inherit;margin: 0;line-height: inherit;white-space: inherit;"><div class="codehilite" style="background: #f8f8f8"><pre style="line-height: 125%">seed ^= hash_value(v) + 0x9e3779b9 + (seed << 6) + (seed >> 2);

</pre></div>

</p>

<p style="padding: 0;text-rendering: inherit;margin: 0;line-height: inherit;white-space: inherit;">Thus I propose I refactor KDev::hash to operate on boost::hash_combine until we get http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2014/n3876.pdf . I cannot think that the boost code is worse than the KDevHash code, quite the contrary.</p>

<p style="padding: 0;text-rendering: inherit;margin: 0;line-height: inherit;white-space: inherit;">When we do follow this approach, I also want to remove the implicit hash function that converts any T to <code style="text-rendering: inherit;color: #4444cc;padding: 0;white-space: normal;margin: 0;line-height: inherit;">const char*</code> and hashes that. Rather, I want people (i.e. me - in the first step) to write custom qHash or std::hash functions for their types.</p></pre>

<br />

<p>- Milian</p>

<br />

<p>On July 30th, 2015, 10:04 p.m. UTC, Milian Wolff wrote:</p>

<table bgcolor="#fefadf" width="100%" cellspacing="0" cellpadding="12" style="border: 1px #888a85 solid; border-radius: 6px; -moz-border-radius: 6px; -webkit-border-radius: 6px;">

 <tr>

  <td>

<div>Review request for KDevelop.</div>

<div>By Milian Wolff.</div>

<p style="color: grey;"><i>Updated July 30, 2015, 10:04 p.m.</i></p>

<div style="margin-top: 1.5em;">

 <b style="color: #575012; font-size: 10pt;">Repository: </b>

kdevplatform

</div>

<h1 style="color: #575012; font-size: 10pt; margin-top: 1.5em;">Description </h1>

 <table width="100%" bgcolor="#ffffff" cellspacing="0" cellpadding="10" style="border: 1px solid #b8b5a0">

 <tr>

  <td>

   <pre style="margin: 0; padding: 0; white-space: pre-wrap; white-space: -moz-pre-wrap; white-space: -pre-wrap; white-space: -o-pre-wrap; word-wrap: break-word;">Optimize KDevHash for integral types.

For integral types, we now reuse qHash internally and only

fall back to the O(sizeof(T)) implementation for other

types.

Before:

********* Start testing of TestKDevHash *********

Config: Using QtTest library 5.5.0, Qt 5.5.0 (x86_64-little_endian-lp64 shared (dynamic) release build; by GCC 5.1.0)

PASS   : TestKDevHash::initTestCase()

PASS   : TestKDevHash::benchHash_int()

RESULT : TestKDevHash::benchHash_int():

     0.075 msecs per iteration (total: 77, iterations: 1024)

PASS   : TestKDevHash::benchHash_uint()

RESULT : TestKDevHash::benchHash_uint():

     0.075 msecs per iteration (total: 77, iterations: 1024)

PASS   : TestKDevHash::benchHash_quint64()

RESULT : TestKDevHash::benchHash_quint64():

     0.15 msecs per iteration (total: 77, iterations: 512)

PASS   : TestKDevHash::benchHash_bool()

RESULT : TestKDevHash::benchHash_bool():

     0.018 msecs per iteration (total: 77, iterations: 4096)

PASS   : TestKDevHash::cleanupTestCase()

Totals: 6 passed, 0 failed, 0 skipped, 0 blacklisted

********* Finished testing of TestKDevHash *********

After:

********* Start testing of TestKDevHash *********

Config: Using QtTest library 5.5.0, Qt 5.5.0 (x86_64-little_endian-lp64 shared (dynamic) release build; by GCC 5.1.0)

PASS   : TestKDevHash::initTestCase()

PASS   : TestKDevHash::benchHash_int()

RESULT : TestKDevHash::benchHash_int():

     0.024 msecs per iteration (total: 51, iterations: 2048)

PASS   : TestKDevHash::benchHash_uint()

RESULT : TestKDevHash::benchHash_uint():

     0.024 msecs per iteration (total: 51, iterations: 2048)

PASS   : TestKDevHash::benchHash_quint64()

RESULT : TestKDevHash::benchHash_quint64():

     0.026 msecs per iteration (total: 54, iterations: 2048)

PASS   : TestKDevHash::benchHash_bool()

RESULT : TestKDevHash::benchHash_bool():

     0.022 msecs per iteration (total: 92, iterations: 4096)

PASS   : TestKDevHash::cleanupTestCase()

Totals: 6 passed, 0 failed, 0 skipped, 0 blacklisted

********* Finished testing of TestKDevHash *********</pre>

  </td>

 </tr>

</table>

<h1 style="color: #575012; font-size: 10pt; margin-top: 1.5em;">Diffs</b> </h1>

<ul style="margin-left: 3em; padding-left: 0;">

 <li>language/CMakeLists.txt <span style="color: grey">(2c2b0285b68d209e07af185c43dcf157507f5a54)</span></li>

 <li>language/util/kdevhash.h <span style="color: grey">(40c32935aec9c7e5552769e780c033e96cf63166)</span></li>

 <li>language/util/tests/CMakeLists.txt <span style="color: grey">(PRE-CREATION)</span></li>

 <li>language/util/tests/test_kdevhash.cpp <span style="color: grey">(PRE-CREATION)</span></li>

 <li>plugins/projectfilter/projectfilter.h <span style="color: grey">(014e5a1eabbb3a2d9e71baeee3e38643cb8649e7)</span></li>

</ul>

<p><a href="https://git.reviewboard.kde.org/r/124539/diff/" style="margin-left: 3em;">View Diff</a></p>

  </td>

 </tr>

</table>

  </div>

 </body>

</html>