Nonportable code in C++ support

Sat Feb 16 09:08:38 UTC 2008

On Saturday 16 February 2008 02:22:34 Matthew Woehlke wrote:

> > A hash function can be called very often. Just the other day, a 70s
> > optimisation program managed to call hash<T*> 250 million times. If the
> > result is improved, it might be worth it using the 250 million cycles (I
> > mean, what is that these days?), but in this case it is not. It just
> > means more complicated code with worse performance for no real gain.
>
> Ok... that would make a shift here *almost* a 0.2% performance hit on a
> typical recent machine. I can't say I'm terrifically worried about that.

I agree it is nitpicking. But it is still waste of clockcycles, and added 
complexity.

>
> However, you failed to take something into account. You don't just call
> hash(foo), because to get the hash table index you actually have to call
> reduce(hash(foo)). The reduce function matters... a *lot*.

Indeed it does. I usually don't write those, unlike hash functions.

>
> In order for a no-op pointer hash function to not suck, you /must/ use
> the modulo-2**n-1 reduce() function you mentioned, but idivl is hugely
> more expensive than andl, even after the more complicated hash function
> needed by a modulo-2**n hash table. A modulo-2**n hash table (where
> reduce() == andl) with a shrl hash function should neatly outperform* a
> modulo-2**n-1 hash table (reduce() == movl+cltd+idivl) with a no-op hash
> function.

True. However, most hash tables use the div anyway, for various reasons. In 
fact, looking at the std::tr1 implementation again, they actually go to the 
trouble of making the hash table grow by prime numbers. So before embarking 
on this route, I'd check whether the implementation you use divide by some 
odd number anyway.

Of course, if I wrote a specialized hash table for just pointers, I'd go for 
the function we mentioned (reinterpret_cast<int>(p)>>sizeof(size_t)).

>
> (* Did I say "outperform"? I meant "kick the pants off"... testing 2**30
> iterations of shrl+and vs movl+cltd+idivl in a loopl on a Xeon, the
> latter was around 20x slower. The times were 0m1.118s vs 0m27.393s.
> Sources attached.)

Sounds reasonable.

-- 
regards, Esben

-- 
This message has been scanned for viruses and
dangerous content by MailScanner, and is
believed to be clean.