Nonportable code in C++ support

Kuba Ober kuba at mareimbrium.org
Sat Feb 16 15:34:16 UTC 2008


On Friday 15 February 2008, Matthew Woehlke wrote:
> Esben Mose Hansen wrote:
> > On Friday 15 February 2008 20:14:15 Matthew Woehlke wrote:
> >> Isn't this a *really* good reason to strip the last couple bits? I.e.
> >> because otherwise with a small table you will *always* get collisions
> >> due to the last 2-4 bits always being 0?
> >
> > Most hashtable implementations use a modulo by some number of the form
> > 2^N-1 or similar, so you would not get increased collisions. This is why
> > e.g. the upcoming implementation of the hash table in the C++ std library
> > just uses a straight reinterpret_cast.
>
> Hmm, well... a particular algorithm I'm familiar with doesn't :-). And
> for good reason (keep reading...).
>
> >> Seriously, how long does a bit-shift operation take? It's measured in
> >> single-digit /cycles/ after all... It's measurable compared to the cost
> >> of calling the hash function, but not compared to much else.
> >
> > A hash function can be called very often. Just the other day, an
> > optimisation run of about 70 seconds managed to call hash<T*> 250
> > million times. If the result is improved, it might be worth spending
> > the 250 million cycles (I mean, what is that these days?), but in this
> > case it is not. It just means more complicated code with worse
> > performance for no real gain.
>
> Ok... that would make a shift here *almost* a 0.2% performance hit on a
> typical recent machine. I can't say I'm terrifically worried about that.
>
> However, you failed to take something into account. You don't just call
> hash(foo), because to get the hash table index you actually have to call
> reduce(hash(foo)). The reduce function matters... a *lot*.
>
> In order for a no-op pointer hash function to not suck, you /must/ use
> the modulo-2**n-1 reduce() function you mentioned, but idivl is hugely
> more expensive than andl, even accounting for the more complicated hash
> function a modulo-2**n hash table needs. A modulo-2**n hash table (where
> reduce() == andl) with a shrl hash function should neatly outperform* a
> modulo-2**n-1 hash table (reduce() == movl+cltd+idivl) with a no-op hash
> function.
>
> (* Did I say "outperform"? I meant "kick the pants off"... testing 2**30
> iterations of shrl+andl vs movl+cltd+idivl in a loop on a Xeon, the
> latter was around 20x slower. The times were 0m1.118s vs 0m27.393s.
> Sources attached.)
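
For reference, the two reduce() strategies being compared boil down to roughly
the following C++ sketch (the bucket counts, shift amount and function names
are made up, just to fix ideas):

#include <cstddef>
#include <cstdio>
#include <stdint.h>

// Sketch only: hypothetical bucket counts for the two reduce() styles.
const std::size_t pow2Size     = std::size_t(1) << 10;        // 2^10 buckets
const std::size_t mersenneSize = (std::size_t(1) << 10) - 1;  // 2^10 - 1 buckets

// Power-of-two table: shift off the always-zero alignment bits, then mask
// (shrl + andl on x86).
inline std::size_t reducePow2(const void* p)
{
    uintptr_t h = reinterpret_cast<uintptr_t>(p) >> 2;   // drop alignment zeros
    return std::size_t(h & (pow2Size - 1));              // mask to the table size
}

// Modulo-2^N-1 table: "no-op" hash, the remainder does the mixing (idivl).
inline std::size_t reduceMersenne(const void* p)
{
    uintptr_t h = reinterpret_cast<uintptr_t>(p);
    return std::size_t(h % mersenneSize);
}

int main()
{
    int x = 0;
    std::printf("%zu %zu\n", reducePow2(&x), reduceMersenne(&x));
    return 0;
}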

Are those tight loops realistic? What about memory accesses etc.? I presume 
that when an idivl is followed by a memory access, the cost of the access will 
somewhat amortize that of the division, and so on. I'd suggest trying an 
assembly implementation of the whole hash lookup, and running it with a cold 
cache (even /bin/cat /usr/lib/libc.a > /dev/null should do it).
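
Something along these lines would at least put a dependent memory access next
to the divide (a rough C++ sketch rather than the assembly version; the sizes
are made up, and the table is simply chosen to be much larger than the cache
so most probes miss, rather than the cache being literally cold):

#include <cstdio>
#include <ctime>
#include <vector>
#include <stdint.h>

int main()
{
    // ~16M buckets of 4 bytes each (~64 MB): much larger than the caches on
    // this class of machine, so most probes go out to memory.
    const uint32_t size = (1u << 24) - 1;          // modulo-2^N-1 style bucket count
    std::vector<uint32_t> table(size, 1);

    uint32_t h = 2463534242u;                      // xorshift32 state, stands in for hash()
    unsigned long long sum = 0;
    std::clock_t start = std::clock();
    for (int i = 0; i < 10000000; ++i) {
        h ^= h << 13; h ^= h >> 17; h ^= h << 5;   // cheap pseudo-random key
        sum += table[h % size];                    // reduce() (idivl here) + dependent load
    }
    std::clock_t stop = std::clock();
    std::printf("sum=%llu time=%.3fs\n", sum,
                double(stop - start) / CLOCKS_PER_SEC);
    return 0;
}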

Besides, to get the remainder of a division by a constant you can always do 
two multiplies and a subtract. I don't know whether that's any faster, but it 
may be worth trying, especially if the multiplies can be done in parallel (for 
the current and previous reduces) using SIMD operations.

With integer operations, on an N-bit machine, where b is a compile-time 
constant:

a mod b :=
  m  = (1 << N) / b;   (reciprocal, precomputed as a constant)
  cd = a * m;          (double-width product; c is its high half, roughly a / b)
  e  = b * c;
  a - e;
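
In C++ that comes out to something like the sketch below; the divisor B, the
rounding of the reciprocal and the single correction step are my assumptions
(the reciprocal is simply floor(2^32 / B), which can underestimate the
quotient by one), not a tuned magic-number derivation:

#include <cassert>
#include <cstdio>
#include <stdint.h>

// Remainder by a constant divisor via two multiplies and a subtract.
// B is a hypothetical table size; M is floor(2^32 / B), precomputed as a
// constant.  Because M rounds down, the estimated quotient can come out one
// too small, hence the single correction step at the end.
const uint32_t B = (1u << 10) - 1;                      // 2^10 - 1
const uint32_t M = uint32_t((uint64_t(1) << 32) / B);   // precomputed reciprocal

inline uint32_t modB(uint32_t a)
{
    uint32_t q = uint32_t((uint64_t(a) * M) >> 32);  // multiply #1: q is roughly a / B
    uint32_t r = a - q * B;                          // multiply #2 and the subtract
    if (r >= B)                                      // correct the occasional low quotient
        r -= B;
    return r;
}

int main()
{
    // Quick self-check against the compiler's own remainder.
    for (uint32_t a = 0; a < 10000000u; ++a)
        assert(modB(a) == a % B);
    std::printf("ok\n");
    return 0;
}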

Cheers, Kuba
-------------- next part --------------
A non-text attachment was scrubbed...
Name: divvsmul.cpp
Type: text/x-c++src
Size: 513 bytes
Desc: not available
URL: <http://mail.kde.org/pipermail/kdevelop-devel/attachments/20080216/a3cde6ad/attachment.cpp>

