Impasto discussion

Matthew Woehlke mw_triad at users.sourceforge.net
Wed Jul 21 01:33:11 CEST 2010


JL VT wrote:
> On Mon, Jul 19, 2010 at 7:06 PM, Matthew Woehlke
> <mw_triad at users.sourceforge.net>  wrote:
>> JL VT wrote:
>>> I'd like to perform thread necromancy now that I have a worthy update
>>> to mention:
>>>
>>> http://pentalis.org/kritablog/?p=157
>>>
>>> I decided to follow the Phong Illumination Model to set the color of
>>> the bumps, and for now I'm only using the pixels above and below, and
>>> right and left of the reference pixel to calculate its normal vector
>>> (and therefore, indirectly, "inclination/tilt").
>>>
>>> The main references being, as the blog post highlights:
>>> http://www.mini.pw.edu.pl/~kotowski/Grafika/IlluminationModel/Index.html
>>> http://www.gamedev.net/columns/hardcore/cgbumpmapping/
>>>
>>> The source of the program is:
>>> http://pentalis.org/kritablog/wp-content/uploads/2010/07/bumpy_source.txt
>>
>>>         vector_x = [1, 0, height_array[x+1][y] - height_array[x-1][y]]
>>>         vector_y = [0, 1, height_array[x][y+1] - height_array[x][y-1]]
>>
>> How do you handle edge pixels?
>>
>> At first that seemed like an over-simplification, so I decided to write
>> my own version and work out the math.
>>
>> If we construct a four-quad mesh around the pixel being evaluated and
>> calculate the normal of the center vertex, we can derive a very simple
>> formula for interior pixels (which turns out to be equivalent to the
>> above formula), as well as see how to calculate normals for edge pixels
>> (simply omit the faces that don't exist). Like so:
>>
>> // Mesh looks like this:
>> //
>> // 1--2--3
>> // |  |  |
>> // 4--0--5
>> // |  |  |
>> // 6--7--8
>> //
>> // ha0 = ha[x,y]; - ha == height_array
>> // had2 = ha[x,y-1] - ha0;
>> // had4 = ha[x-1,y] - ha0;
>> // had5 = ha[x+1,y] - ha0;
>> // had7 = ha[x,y+1] - ha0;
>> // cross: [y1*z2 - z1*y2, z1*x2 - x1*z2, x1*y2 - y1*x2]
>> // n1 = (2 - 0) × (4 - 0) = [ 0, -1, had2] × [-1,  0, had4]
>> // n3 = (5 - 0) × (2 - 0) = [ 1,  0, had5] × [ 0, -1, had2]
>> // n6 = (7 - 0) × (5 - 0) = [ 0,  1, had7] × [ 1,  0, had5]
>> // n8 = (4 - 0) × (7 - 0) = [-1,  0, had4] × [ 0,  1, had7]
>> // n1 = [-had4, -had2, -1];
>> // n3 = [ had5, -had2, -1];
>> // n6 = [ had5,  had7, -1];
>> // n8 = [-had4,  had7, -1];
>> // n = n1 + n3 + n6 + n8;
>> // n = [had5 - had4, had7 - had2, -2]; - can drop the constant scalar
>> n = [ha[x+1,y] - ha[x-1,y], ha[x,y+1] - ha[x,y-1], -2];
>> n = n / linalg.norm(n);
>>
>> I believe if you write out the cross product from your version, you'll
>> see it works out the same (allowing for a constant scalar adjustment of
>> the Z value).
>
> Well, after giving your sample code a 10 minute look, it came to my
> attention that the formula you derived with your own reasoning is in
> fact the way to express the cross product of two perpendicular vectors
> of differing z value, as illustrated by this link:
> http://www.wolframalpha.com/input/?i=%281%2C+0%2C+z1%29+x+%280%2C+1%2C+z2%29

Always good to know I can still do basic algebra :-). You could also 
look it up on Wikipedia (which will explain why some of those 
expressions have -'s in front of their terms; it has to do with whether 
the X or the Y vector is on the left of the ×, which depends on the 
winding order - see 'right hand rule' - so that you get the vector 
pointing in the correct direction).
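
To make the sign business concrete, compare the two operand orders 
(same component notation as in the quoted comments):

(1, 0, z1) × (0, 1, z2) = (-z1, -z2,  1)
(0, 1, z2) × (1, 0, z1) = ( z1,  z2, -1)

Swapping the operands negates every component, i.e. the normal flips to 
the other side of the surface; the quoted face normals use the operand 
order they do so that all four come out on the same side (negative Z).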

> The scalar adjustment may be a bit contentious, I'd have to look into
> it to be really sure that scalar adjustment should be there.

My thought regarding fiddling with the Z magnitude is: do we really want 
to say that a difference of 1 represents the same distance in Z as 1 
pixel in X or Y? That seems wrong, given that it means any non-flat 
slope is 45° or steeper, especially if (as I would still strongly 
suggest) the impasto channel is 16-bit. Letting the user adjust the 
height scaling seems, well, obvious :-).

By the way, the denormalized normal is:
n = [ha[x+1,y] - ha[x-1,y], ha[x,y+1] - ha[x,y-1], -2*k];

...where 'k' is the length (x/y) of a pixel compared to the height 
difference represented by a heightfield delta of 1. Take the reciprocal 
(1/k) to specify the height difference represented by a heightfield 
delta of 1 as a fraction of the 'width/height' of 1 pixel. So (assuming 
my math is right) the gamedev example actually assumes that a 45° slope 
on a 200px texture would be represented by a value change in the 
heightfield of only 100!
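
Put another way (if I've kept the definitions straight): a heightfield 
delta of d across a single pixel corresponds to a surface slope of 
atan(d/k), so with k == 1 the smallest non-zero integer delta is already 
a 45° slope, while a larger k lets you represent shallow slopes.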

Oh, and BTW, all of this has been assuming square pixels :-). Somewhere 
in there (not shown) a pixel aspect ratio factor would come in, if Krita 
cares about such things.

> It calculates the cross product of both vectors and then normalizes
> them, in a single function. I believe the function is as optimized as
> it can be so I decided to not venture finding alternative ways to work
> out the normal at every pixel, because the most correct
> (mathematically: elegant and minimal) way to do it is the cross
> product vector of the plane described by vector_x and vector_y
> (tangent to the height map pixel), and this function literally
> calculates both and then normalizes the vector, which is required for
> the other calculations to work quick and seamlessly.

The most correct way is to calculate the four denormalized face normals 
about the pixel in question, sum them, and normalize the result. That's 
what you would be doing if you were talking about a heightfield mesh 
(i.e. an object created in CAD or other 3D modeling software from a 
heightfield bitmap).

It so happens that the result is *almost* the same, but you should know 
what the difference is, and how the correct algorithm simplifies, so 
that you understand why the simplified algorithm is acceptable.
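
For the record, the difference is easy to see by writing both forms out 
in the notation above. Summing the four face normals gives

[2*(had5 - had4), 2*(had7 - had2), -4]

while the single cross product [1, 0, had5 - had4] × [0, 1, had7 - had2] 
gives

[-(had5 - had4), -(had7 - had2), 1]

Up to the overall sign (which side the normal points to), the only 
disagreement is a factor of 2 on Z - the constant scalar adjustment I 
mentioned - which appears because the central differences span two 
pixels while the tangent vectors are given a length of only 1 in X and Y.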

> Qt/C++ happened to perform 50+ times faster than the
> PyGame/Numpy/Python implementation (I blame Numpy for that, most of
> the overhead is in calculating the cross product, which maybe was what
> triggered your concern), opening a .bmp, calculating a 198x198
> bumpmap, and compressing it into a .png file and saving it, all after
> mere 43 ms in my computer (which is pretty average). The cross product
> calculation in its entirety took 13 ms (compared to 5.66 seconds in
> Python/Numpy).

Keep in mind that, ideally, you're trying to update this in real time. 
When you start talking about painting at 1:1 with a 100px diameter 
brush, 13ms /per update/ isn't insignificant. (That said, it's still the 
sqrt that's really killing you. In fact it might even be a worthwhile 
optimization to store the denormalized X,Y components - you don't even 
need to store the Z, since it is constant - as a normal map and offload 
the sqrt to the GPU in shader code, at least while painting, and defer 
storing the normalized normals to background processing after each 
stroke. Or maybe not; without being able to experiment with it, I can't 
say which is better.)

Better yet, use OpenCL and offload the whole thing to GPU :-).

> Certainly I haven't worked out the edge pixels, that's why the origin
> file is of size <width, height> and the resulting image is of
> dimensions <width-2, height-2>. I haven't thought of a solution for
> the edge pixels yet, but I'm aware of the problem.

See above - I showed you exactly how to deal with edge pixels :-). Just 
reevaluate the expressions I gave, omitting the faces that aren't 
present at the edges.

Recalling:
> // 1--2--3
> // |  |  |
> // 4--0--5
> // |  |  |
> // 6--7--8
> //
> // ha0 = ha[x,y]; - ha == height_array
> // had2 = ha[x,y-1] - ha0;
> // had4 = ha[x-1,y] - ha0;
> // had5 = ha[x+1,y] - ha0;
> // had7 = ha[x,y+1] - ha0;
> // n1 = [-had4, -had2, -1];
> // n3 = [ had5, -had2, -1];
> // n6 = [ had5,  had7, -1];
> // n8 = [-had4,  had7, -1];

Then the normal for pixels on the top edge (y == 0) is:
// n = normalize(n6 + n8);
// n = normalize(vector(had5 - had4, 2*had7, -2));
vector n(ha[x+1][y] - ha[x-1][y], 2*(ha[x][y+1] - ha[x][y]), -2*k);
n.normalize();

See above for definition of 'k'. Doing the others is left as an exercise 
for the reader :-), though it should be pretty easy to see what they 
would be. Note that this assumes that 1 pixel (X/Y direction) equals a 
difference of 1 in the heightfield. If you use the gamedev code, you'll 
need to adjust for their 2 (x/y) == 1 (hf), i.e. use '-k' instead of '-2*k'.
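
For instance, at the top-left corner (x == 0, y == 0) only n6 survives, 
so (same caveats, same 'k', untested sketch):

// n = normalize(n6);
// n = normalize(vector(had5, had7, -1));
vector n(ha[x+1][y] - ha[x][y], ha[x][y+1] - ha[x][y], -k);
n.normalize();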

> Well, I can't be certain that I'm saving math operations. Or maybe I
> am saving math operations but not processor operations and in the end
> things may take longer. I don't know how optimized the Qt function is
> but I bet it is very much so, because 13 ms for 40K normalized cross
> products is certainly very little overhead; so for now I'll try to not
> touch it unless it brings problems later.

I'd be quite surprised if you don't save a decent chunk of CPU 
instructions. QVector3D::crossProduct isn't optimized at all (don't 
guess, "use the source, Luke!"); certainly I don't see how the compiler 
would be able to optimize away the 'dead' terms in the multiplication. 
If you're /lucky/, all the ctors and function calls in normal(v1, v2, 
v3) have been inlined; if not then you have even more overhead from 
multiple function calls. And you'll also avoid the fuzzy length checks 
(branching is also expensive!) which aren't needed in this case.

It'll /look/ nicer, but don't kid yourself that it will be faster. The 
fastest you're likely to get (short of hand-optimized assembler, which I 
/do/ doubt would matter, never mind the portability headaches) is like so:

const qreal nz = -2/k; // k is Z scaling factor; put outside loop!
const qreal nzs = nz*nz; // also outside loop
qreal nx = ha[x+1][y] - ha[x-1][y];
qreal ny = ha[x][y+1] - ha[x][y-1];
qreal nnr = 1/qSqrt((nx*nx) + (ny*ny) + nzs);
// normalized normal is [nx*nnr, ny*nnr, nz*nnr]

You'll note I am avoiding *any* class construction until the last 
possible moment (that's assuming that you even need to store the normal 
as a class). Also there is no branching. The only expensive thing is the 
call to sqrt, which can't be avoided (and thankfully qSqrt is inline; 
the compiler *should* optimize it into the sqrt instruction rather than 
a function call, at least on x86 and other platforms that have such a 
thing).

I'm not *certain* that div,mul,mul,mul is cheaper than div,div,div so 
you might want to check into that (if it is, it's also possible the 
compiler would make that optimization, so you could probably omit it). 
You *might* even consider storing nz and nzs so that you only need to 
recalculate those if the Z scaling factor is changed, but that's 
approaching overkill.

Lastly, I hope this goes without saying, but...

Bad:
----
for (i = 0; i < width; ++i) {
   for (j = 0; j < height; ++j) {
     // conditionals to do stuff if this is an edge
     // ...
   }
}

Better:
-------
if (width == 1 && height == 1)
   return;
width_minus_1 = width-1;
height_minus_1 = height-1;
// calculate corner (0,0)
// calculate corner (width_minus_1,0)
// calculate corner (0,height_minus_1)
// calculate corner (width_minus_1,height_minus_1)
for (j = 1; j < height_minus_1; ++j) {
   // calculate edge (0, j)
   // calculate edge (width_minus_1, j)
}
for (i = 1; i < width_minus_1; ++i) {
   // calculate edge (i, 0)
   // calculate edge (i, height_minus_1)
   for (j = 1; j < height_minus_1; ++j) {
     // calculate interior (i,j)
   }
}
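
If it helps, here's a rough, untested sketch of how the interior part of 
that structure and the normal calculation above might fit together; the 
array layout, function name and output format are just placeholders, and 
the corners and edges would be filled in from the reduced expressions 
earlier:

#include <QtCore/qmath.h> // qSqrt; qreal comes along via qglobal.h

// ha: width*height heights, row-major; nmap: 3 qreals (x, y, z) per pixel
void calculateNormals(const qreal *ha, qreal *nmap,
                      int width, int height, qreal k)
{
   const qreal nz = -2/k;   // k is the Z scaling factor, as before
   const qreal nzs = nz*nz; // hoisted out of the loops
   const int width_minus_1 = width - 1;
   const int height_minus_1 = height - 1;

   // ... corners and edges omitted; see the reduced expressions above ...

   for (int j = 1; j < height_minus_1; ++j) {
     for (int i = 1; i < width_minus_1; ++i) {
       const qreal nx = ha[j*width + (i+1)] - ha[j*width + (i-1)];
       const qreal ny = ha[(j+1)*width + i] - ha[(j-1)*width + i];
       const qreal nnr = 1/qSqrt((nx*nx) + (ny*ny) + nzs);
       qreal *n = nmap + 3*(j*width + i);
       n[0] = nx*nnr;
       n[1] = ny*nnr;
       n[2] = nz*nnr;
     }
   }
}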

-- 
Matthew
Please do not quote my e-mail address unobfuscated in message bodies.
-- 
ENOWIT: .sig file temporarily unavailable


