Games should emphasize some specific (but fairly broad) Science, Technology, Engineering and Math skills. Submissions are due January 5th.

]]>The basic idea is to come up with a rational function (ratio of two polynomials) as an approximation. There’s a long history of this in and out of graphics, but it usually has some inherent problems. Consider “Fast Phong Shading”, published by Bishop and Weimer in SIGGRAPH 1986. This method attempts to avoid the vector renormalization inherent in Phong normal interpolation by using a quadratic Taylor series. Unfortunately, Taylor expansion is inherently centered around a point, in this case the center of the triangle. The approximation will have some error by the time you get to the edge of the triangle, and two triangles sharing that edge won’t necessarily have the same amount of error in their normalization. For big triangles, this can result in a visible shading discontinuity along triangle edges. Not good.

Schlick’s idea is to express what’s important about any function as *kernel conditions*, then apply those as constraints. For his Fresnel approximation, the constraints are that *F* should be 1 when *dot*(*N*,*V*)=0, *F*_{0} when *dot*(*N*,*V*)=1, and that the first several derivatives of *F* should also be 0 when *dot*(*N*,*V*)=1. In the paper, he also has similar approximations for the geometric attenuation and distribution terms of a Cook-Torrance shading model, but the method is a good one to know in general for reducing shader computation:

- Look at the function and pick the kernel conditions: value or derivatives at some critical points, desired integral over the whole domain, etc.
- Based on the way the function looks, choose a rational function with the right number of coefficients. This is still somewhat of a black art, since there will be many choices for numerator and denominator polynomial that have the same number of coefficients as you have kernel conditions. For example, for four conditions, you could choose a cubic, quadratic numerator/linear denominator, linear numerator/quadratic denominator, or 1/cubic denominator.
- Solve for each coefficient.
- Evaluate the total error, decide if it is good enough. If not, try a different rational function, or add extra kernel conditions to fix the problem.
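As a small worked example of those four steps, here is a sketch in Python that approximates the power function t^n on [0,1]. The target function, the three kernel conditions (value 0 at t=0, value 1 at t=1, slope n at t=1), and the linear-over-linear form r(t) = t/(a + b*t) are my choices for illustration, and the function names are hypothetical. The numerator already forces r(0) = 0, r(1) = 1 gives a + b = 1, and r'(1) = a/(a + b)^2 = a gives a = n.

```python
def fit_power_approx(n):
    """Solve the kernel conditions for r(t) = t / (a + b*t).

    r(1) = 1       ->  a + b = 1
    r'(1) = n      ->  a / (a + b)**2 = a = n
    r(0) = 0 holds automatically because the numerator is t.
    """
    a = float(n)
    b = 1.0 - a
    return a, b


def rational_pow(t, n):
    """Rational stand-in for t**n on [0, 1]."""
    a, b = fit_power_approx(n)
    return t / (a + b * t)


def max_abs_error(n, samples=1001):
    """Step four: measure the worst-case error over the domain."""
    worst = 0.0
    for i in range(samples):
        t = i / (samples - 1)
        worst = max(worst, abs(rational_pow(t, n) - t ** n))
    return worst


if __name__ == "__main__":
    print("max error for n = 8:", max_abs_error(8))
```

The error is far from zero in the middle of the range, which is exactly the point of the last step: whether that is good enough depends on where the result ends up in the shader, and if it is not, you go back and pick a different rational form or add conditions.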

Multivariate functions are OK, though they will potentially introduce many additional coefficients. This kind of approximation is usually best applied near the visual output end of a shader. Apply it too early in the computation, and the small errors may be magnified by the intervening shader code. Nonetheless, it can be a great way to reduce a computationally expensive shader.

]]>Quaternions are a combination of a 3D vector and a scalar. Rotation by an angle `θ` around a unit vector `v` is represented by the quaternion

q = vec4(sin(θ/2)*v, cos(θ/2))

The quaternion, `q`, will be unit length (the sum of the squares of all four components is one). You can easily look at a quaternion and tell the axis it rotates around by looking at the vector part, which won’t be unit length anymore since it is scaled by `sin(θ/2)`, but still points in the right direction. You can tell the angle by either looking at the scalar part, or the length of the scaled vector part. The inverse rotation will have the axis pointing in the opposite direction, which you can think of as either a rotation around the opposite axis, or the result of flipping the sign of `θ`.
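Here is a minimal Python sketch of that construction, so it can be run and checked on the CPU. It assumes the same layout as the shader code, with the vector part first and the scalar last, stored as an `(x, y, z, w)` tuple. The function names are mine, not from any library.

```python
import math


def quat_from_axis_angle(axis, angle):
    """Quaternion (x, y, z, w) for a rotation of `angle` radians about unit vector `axis`."""
    s = math.sin(angle / 2.0)
    return (axis[0] * s, axis[1] * s, axis[2] * s, math.cos(angle / 2.0))


def quat_inverse(q):
    """Inverse of a unit quaternion: same scalar, vector part pointing the other way."""
    return (-q[0], -q[1], -q[2], q[3])


def quat_angle(q):
    """Recover the rotation angle from the vector length and the scalar part."""
    vec_len = math.sqrt(q[0] ** 2 + q[1] ** 2 + q[2] ** 2)
    return 2.0 * math.atan2(vec_len, q[3])


if __name__ == "__main__":
    q = quat_from_axis_angle((0.0, 0.0, 1.0), math.pi / 2.0)
    print(q, sum(c * c for c in q), quat_angle(q))
```

Flipping the vector part and flipping the sign of the angle give the same quaternion, which matches the two ways of thinking about the inverse.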

One great thing about quaternions is how well they interpolate between two rotations. If you just linearly interpolate and re-normalize, you get a nice interpolation of rotation axes along a great circle, plus a smooth rotation of twists around those axes. For `t` in [0,1],

normalize( q1*(1-t) + q2*t )

In comparison, directly interpolating between rotation matrices gives you weird squishy non-rotations in the middle, and interpolating Euler angles tends to take you on odd paths, especially near the singularities.
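The interpolate-and-renormalize step is short enough to sketch directly. This is a Python version for checking on the CPU, assuming `(x, y, z, w)` tuples. The name `nlerp` is a common nickname for it, not something from the text above.

```python
import math


def nlerp(q1, q2, t):
    """Linearly interpolate two unit quaternions, then re-normalize the result."""
    mixed = [a * (1.0 - t) + b * t for a, b in zip(q1, q2)]
    length = math.sqrt(sum(c * c for c in mixed))
    return tuple(c / length for c in mixed)


if __name__ == "__main__":
    identity = (0.0, 0.0, 0.0, 1.0)
    quarter_turn_z = (0.0, 0.0, math.sin(math.pi / 4.0), math.cos(math.pi / 4.0))
    print(nlerp(identity, quarter_turn_z, 0.5))
```

Halfway between the identity and a 90 degree turn about z, this lands exactly on the 45 degree turn about z. One assumption worth stating: `q` and `-q` represent the same rotation, so if `dot(q1,q2)` is negative you would normally negate one input first to take the shorter path.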

You can do even better with the spherical linear interpolation or *slerp*, as straight linear interpolation goes faster in the middle than at the ends. There’s nothing quaternion-specific about slerp. Given two N-dimensional unit-length vectors, it’ll interpolate between them along an N-dimensional sphere. slerp between unit-length vectors `q1` and `q2` is given by

φ = acos(dot(q1,q2))

slerp(q1,q2,t) = q1*sin(φ*(1-t))/sin(φ) + q2*sin(φ*t)/sin(φ)

That looks quite a bit like the linear interpolation, but with different mixing factors for `q1` and `q2`. If the linear interpolate and renormalize isn’t good enough, but slerp’s trig functions are too slow, there are several approximations that land somewhere in between. My favorite is the one that does normal linear interpolation but replaces `t` with a polynomial to adjust the interpolation speed.
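For reference, here is slerp as a small Python sketch. Both weights are divided by sin(φ). The clamp on the dot product and the early-out for nearly identical inputs are my additions to keep the sketch numerically safe. They are not part of the formula itself.

```python
import math


def slerp(q1, q2, t):
    """Spherical linear interpolation between two unit-length vectors of any dimension."""
    d = sum(a * b for a, b in zip(q1, q2))
    d = max(-1.0, min(1.0, d))          # guard acos against rounding just outside [-1, 1]
    phi = math.acos(d)
    s = math.sin(phi)
    if s < 1e-6:
        # Inputs are (nearly) parallel; the weights are ill-conditioned, so return q1.
        # (Exactly opposite inputs also land here and have no unique answer.)
        return tuple(q1)
    w1 = math.sin(phi * (1.0 - t)) / s
    w2 = math.sin(phi * t) / s
    return tuple(a * w1 + b * w2 for a, b in zip(q1, q2))


if __name__ == "__main__":
    identity = (0.0, 0.0, 0.0, 1.0)
    quarter_turn_z = (0.0, 0.0, math.sin(math.pi / 4.0), math.cos(math.pi / 4.0))
    print(slerp(identity, quarter_turn_z, 1.0 / 3.0))
```

A third of the way from the identity to a 90 degree turn about z comes out as a 30 degree turn about z, which is the constant-speed behavior that plain linear interpolation lacks.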

There are three main things to know for using quaternions. First, to rotate a single point `p` by quaternion `q`:

pRot = p + 2*cross(q.xyz, q.w*p + cross(q.xyz, p))
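The same formula as a Python sketch, handy for checking a shader’s output on the CPU. It assumes `q` is a unit quaternion stored as `(x, y, z, w)`, and the helper names are mine.

```python
def cross(a, b):
    """3D cross product."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def rotate(q, p):
    """Rotate point p by unit quaternion q: p + 2*cross(q.xyz, q.w*p + cross(q.xyz, p))."""
    qv, w = q[:3], q[3]
    inner = cross(qv, p)
    t = tuple(w * pc + ic for pc, ic in zip(p, inner))
    outer = cross(qv, t)
    return tuple(pc + 2.0 * oc for pc, oc in zip(p, outer))


if __name__ == "__main__":
    # (0.5, 0.5, 0.5, 0.5) is a 120 degree turn about the (1,1,1) diagonal.
    print(rotate((0.5, 0.5, 0.5, 0.5), (1.0, 0.0, 0.0)))
```

That quaternion cycles the axes, so the x axis comes out as the y axis.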

The second big operation is converting a quaternion rotation into a 3×3 rotation matrix. That’s

    vec3 Q = 2.*q.xyz;
    // each row below is one column of the matrix (GLSL matrices are column-major)
    qMat = mat3(
        1 - Q.y*q.y - Q.z*q.z, Q.x*q.y + Q.z*q.w, Q.x*q.z - Q.y*q.w,
        Q.x*q.y - Q.z*q.w, 1 - Q.x*q.x - Q.z*q.z, Q.y*q.z + Q.x*q.w,
        Q.x*q.z + Q.y*q.w, Q.y*q.z - Q.x*q.w, 1 - Q.x*q.x - Q.y*q.y);
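A Python sketch of the same conversion, keeping GLSL’s column-major layout so the entries line up one-for-one with the shader code. The tuple-of-columns representation and the function names are assumptions of this sketch.

```python
def quat_to_mat3(q):
    """Return the rotation matrix as three columns, in the order GLSL's mat3() takes them."""
    x, y, z, w = q
    X, Y, Z = 2.0 * x, 2.0 * y, 2.0 * z   # doubled vector part, Q in the shader code
    return (
        (1.0 - Y * y - Z * z, X * y + Z * w, X * z - Y * w),   # column 0
        (X * y - Z * w, 1.0 - X * x - Z * z, Y * z + X * w),   # column 1
        (X * z + Y * w, Y * z - X * w, 1.0 - X * x - Y * y),   # column 2
    )


def mat3_mul_vec(m, p):
    """Column-major matrix times vector: a weighted sum of the columns."""
    return tuple(m[0][i] * p[0] + m[1][i] * p[1] + m[2][i] * p[2] for i in range(3))


if __name__ == "__main__":
    m = quat_to_mat3((0.5, 0.5, 0.5, 0.5))   # 120 degrees about the (1,1,1) diagonal
    print(mat3_mul_vec(m, (1.0, 0.0, 0.0)))
```

Because the matrix is built once and then reused, this is the form to check when the same rotation is applied to several points.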

The final big one is how to combine two quaternion rotations into a new one. That’s just

q = vec4(q1.w*q2.xyz + q2.w*q1.xyz + cross(q1.xyz,q2.xyz), q1.w*q2.w - dot(q1.xyz,q2.xyz));
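Here is the product as a Python sketch, using the same vector-first, scalar-last layout as the quaternion construction earlier, so the result is `(x, y, z, w)`. The helper names are mine. The thing to watch is the order: with the rotation formula above, the product of `q1` and `q2` applies `q2` first and `q1` second.

```python
def cross(a, b):
    """3D cross product."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def quat_mul(q1, q2):
    """Combine two rotations; the result rotates by q2 first, then by q1."""
    v1, w1 = q1[:3], q1[3]
    v2, w2 = q2[:3], q2[3]
    c = cross(v1, v2)
    vec = tuple(w1 * b + w2 * a + ci for a, b, ci in zip(v1, v2, c))
    scalar = w1 * w2 - sum(a * b for a, b in zip(v1, v2))
    return vec + (scalar,)


if __name__ == "__main__":
    s = 0.5 ** 0.5
    quarter_turn_x = (s, 0.0, 0.0, s)
    quarter_turn_z = (0.0, 0.0, s, s)
    print(quat_mul(quarter_turn_z, quarter_turn_x))
    print(quat_mul(quarter_turn_x, quarter_turn_z))
```

The two printed results differ, because rotations do not commute. A quarter turn about x followed by a quarter turn about z is the 120 degree turn about the (1,1,1) diagonal.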

So when do we use each? In raw operations, the direct quaternion rotation is 18 multiplies and 12 adds. GPU operations are at least somewhat dependent on the GPU, but assuming up to a four-element multiply-and-add or dot product is a single operation, it’s probably about 6 GPU instructions. That’s two for the inner cross product, one more to add in `q.w*p`, two more for the outer cross product, and one more to multiply by 2 and add to `p`.

The matrix construction, assuming the common multiplies are factored out, is 12 multiplies and 12 adds, while the matrix multiply to actually rotate with it would be 9 multiplies and 6 adds, for a total of 21 multiplies and 18 adds (clearly more expensive). In GPU terms, it’s about 7 GPU instructions to create the matrix and 3 to use it (still more expensive). On the other hand, to apply the same rotation to two points is 36 multiplies and 24 adds using the direct rotation, but only 30 multiplies and 24 adds with the matrix multiply since you can use the same rotation matrix twice. So, to transform a single point, it’s best to use direct quaternion rotation, but for two or more (or even a point and normal), converting to a 3×3 matrix form is a big win.

Finally, combining two quaternions is 16 multiplies and 12 adds, or about 6 GPU operations. In comparison, combining two rotation matrices takes 27 multiplies and 18 adds or about 9 GPU instructions.

So… If you are doing lots of work with rotations, or need to do any interpolation between transformations at all, quaternions are the tool for you. If you are transforming more than one point by the resulting rotation, you’re better off converting the quaternion to a matrix to use it, but given that quaternions interpolate so much better and are so much cheaper to combine together, it’s still often worth it.

Perhaps in a later post I’ll go through the math behind quaternions, and how to get a quaternion given a rotation matrix, but for now, this is just enough quaternion to be dangerous.

]]>I’ve posted the entire project online at http://www.workly.com/starryeye/se10.zip

]]>Congratulations to all on a show well-done!

]]>The most important directions for Fresnel reflectance are the surface normal, *N*, the direction you see the surface from, *V*, and the direction the reflected light is coming from, *L* (all unit-length vectors). Since it’s dealing with reflected rays, *N* should be halfway between *V* and *L*, so *N* = *normalize*(*V*+*L*) and *dot*(*N*,*V*) = *dot*(*N*,*L*). I’m ultimately going to be applying it to an environment map, so I’ll stick with the *dot*(*N*,*V*) version.

There are also constants that control the strength of the effect: *n*_{1} and *n*_{2}, the indices of refraction of each material, or sometimes rewritten in terms of *n*, the ratio of the two indices of refraction. Indices of refraction for common materials are pretty easy to find in a Physics text or online: vacuum is 1, air is pretty close to 1, water is about 1.33, glass is about 1.5.

Fresnel reflectance has different terms for incoming light polarized parallel to the surface than for light that’s not parallel to the surface. I’ll add another direction *T*, for the refracted light, since it makes the equations easier, though you can always rewrite the refracted direction in terms of the reflected one.

The polarization dependence is handy if you’re using a polarizing filter to enhance or diminish the reflections in a photograph, but most graphics assumes unpolarized light, which is an equal mix of both terms. Cook and Torrance came up with the combined form that was used in graphics for many years:
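For reference, the combined form as it is usually written from the Cook and Torrance paper. This is a sketch in my notation: it assumes *c* = *dot*(*N*,*V*) and *n* is the ratio of the two indices of refraction.

```latex
F = \frac{1}{2}\,\frac{(g-c)^2}{(g+c)^2}
    \left(1 + \frac{\bigl(c\,(g+c)-1\bigr)^2}{\bigl(c\,(g-c)+1\bigr)^2}\right),
\qquad c = \mathit{dot}(N,V), \quad g^2 = n^2 + c^2 - 1
```

At *c* = 1 this collapses to ((*n*-1)/(*n*+1))^2, and at *c* = 0 it reaches 1.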

If you don’t have the index of refraction, it’s easier to measure the reflectance at normal incidence (looking head-on where it is smallest). From Cook and Torrance’s paper, that’s
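Written out, that relation between the index of refraction and the reflectance at normal incidence is usually given as follows, assuming *F*_{0} is that reflectance and *n* is the ratio of the indices:

```latex
n = \frac{1+\sqrt{F_0}}{1-\sqrt{F_0}}
\quad\Longleftrightarrow\quad
F_0 = \left(\frac{n-1}{n+1}\right)^2
```

For glass at *n* = 1.5, this gives *F*_{0} = 0.04.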

But… almost everyone these days uses Schlick’s approximation for Fresnel. There’s actually lots of good stuff on approximating functions in Schlick’s paper, but only the Fresnel approximation seems to have really stuck:
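In its usual form, with *F*_{0} the reflectance at normal incidence, Schlick’s approximation is:

```latex
F \approx F_0 + (1 - F_0)\,\bigl(1 - \mathit{dot}(N,V)\bigr)^5
```

It equals *F*_{0} looking head-on, reaches 1 at grazing angles, and its first four derivatives vanish at *dot*(*N*,*V*) = 1.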

For a little more intuitive control, you can write this in terms of *F*_{0} at normal incidence and *F*_{90} at the edge of the object:
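A sketch of that two-parameter version in Python, next to the exact unpolarized Fresnel term so the approximation error can be measured. It assumes the usual form *F* = *F*_{0} + (*F*_{90} - *F*_{0})(1 - *dot*(*N*,*V*))^5, and it uses the standard Cook-Torrance expression with *c* = *dot*(*N*,*V*) for the exact term. The function names are mine.

```python
import math


def fresnel_exact(c, n):
    """Unpolarized Fresnel reflectance, Cook-Torrance form. c = dot(N,V), n = index ratio."""
    g = math.sqrt(n * n + c * c - 1.0)
    a = (g - c) / (g + c)
    b = (c * (g + c) - 1.0) / (c * (g - c) + 1.0)
    return 0.5 * a * a * (1.0 + b * b)


def f0_from_ior(n):
    """Reflectance at normal incidence for index ratio n."""
    return ((n - 1.0) / (n + 1.0)) ** 2


def fresnel_schlick(c, f0, f90=1.0):
    """Schlick's approximation, blending from f0 head-on to f90 at the edge."""
    return f0 + (f90 - f0) * (1.0 - c) ** 5


def max_schlick_error(n, samples=1001):
    """Largest absolute difference between exact and approximate over c in [0, 1]."""
    f0 = f0_from_ior(n)
    return max(abs(fresnel_exact(i / (samples - 1), n)
                   - fresnel_schlick(i / (samples - 1), f0))
               for i in range(samples))


if __name__ == "__main__":
    print("F0 for glass:", f0_from_ior(1.5))
    print("max error for glass:", max_schlick_error(1.5))
```

With *F*_{90} left at 1, this is the plain Schlick form. Lowering it tones down the edge reflections without touching the head-on value.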

Image d above is what it looks like for a high dynamic range environment map. I’ve also included a regular Blinn-Phong layer with a light source positioned at the brightest point in the environment texture. F is just the blend factor between the two.

It’s important to use a high-dynamic range texture for this, because ordinary 8-bit textures can’t distinguish between “the sky is bright”, maxed out at 255, and “the sun is 10,000 times brighter”, also maxed out at 255. Multiply by 0.04, and you get about 10 in both cases. But if the image was exposed so the sky really is 255, the sun should be around 2,550,000. When multiplied by 0.04, that’s still 102,000 (or really bright). If we don’t keep the full dynamic range of the environment map, we get the somewhat disappointing result in image c.
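The arithmetic is easy to check with a few lines of Python. This is only a sketch of the clamping effect. The 0.04 factor is the head-on reflectance used above, and the function name is hypothetical.

```python
def reflected(radiance, f0=0.04, clamp_to_8bit=False):
    """Scale an environment sample by the reflectance, optionally after 8-bit clamping."""
    if clamp_to_8bit:
        radiance = min(radiance, 255.0)   # an 8-bit texture cannot store anything brighter
    return radiance * f0


if __name__ == "__main__":
    sky = 255.0
    sun = 10_000 * sky
    print("8-bit:", reflected(sky, clamp_to_8bit=True), reflected(sun, clamp_to_8bit=True))
    print("HDR:  ", reflected(sky), reflected(sun))
```

With clamping, the sun and the sky reflect identically. Without it, the sun’s reflection is still far past white even after the 0.04 scale.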

Oh, and images a and b are what you get for a couple of choices of constant blend fraction. The glancing reflections around the edges of the model really add an extra dimension of shininess.

]]>Often you want a stream of random numbers in a shader. On the CPU, you usually have one stream of random numbers. Each time you ask, you get a new number. You might ask for new numbers many times for different purposes, but it doesn’t really matter if those requests are all mixed up. On the GPU, you often want a bunch of independent streams corresponding to objects, characters, or grid cells in space. You want any GPU thread that asks for the random numbers associated with a particular stream to always get the same answers as a different thread asking for numbers from the same stream.

Several recent papers have used some kind of cryptographic hash for this. Cryptographic hashes are designed for things like signing messages, so the same input should always give the same result. That output should be pretty random (or someone might be able to crack the hash and sign fake messages, add viruses, or other nefarious things). We don’t care about the cryptographic security, but the other features are great for generating streams of random numbers: you put in the stream ID and a sequence number, and you get a random number. Since it’s a hash, every time you ask for the 5th number in the 1200th stream you get exactly the same answer.

The first graphics paper I know of to use this idea was my “Modified Noise for Evaluation on Graphics Hardware” from Graphics Hardware 2005. I used a modification of the Blum-Blum-Shub algorithm that pretty much destroyed all of its randomness, but made OK looking noise. In many ways, I was inspired by SGI’s lavarand, which encrypted an image of a bunch of lava lights to create random numbers (cool in its own right, though it doesn’t really have the parallelization or repeatability). Tzeng and Wei (in “Parallel White Noise Generation on a GPU via Cryptographic Hash”, I3D 2008) used MD5, which was way more random than my stuff, but kind of slow. Our new paper uses the Tiny Encryption Algorithm (or TEA). TEA is actually a cipher rather than a hash, but for small input, it’s all the same. It repeats the same core mixing function for some number of rounds (64, when you’re using it for encryption). We show that lots of graphics tasks work well with just two rounds, but the more rounds you do the more random the results. After 8 rounds, it is random enough to pass both the DIEHARD and NIST randomness test suites.

For N rounds, it looks something like this:

    uvec2 v = uvec2(stream, sequence);
    uint s = 0x9E3779B9u;
    for (int i = 0; i < N; ++i) {
        v.x += ((v.y<<4u)+0xA341316Cu)^(v.y+s)^((v.y>>5u)+0xC8013EA4u);
        v.y += ((v.x<<4u)+0xAD90777Du)^(v.x+s)^((v.x>>5u)+0x7E95761Eu);
        s += 0x9E3779B9u;
    }
    return v;

The `s` value is specified in the TEA algorithm, but the other hex constants are an encryption key, so could conceivably be changed to provide even more streams of random numbers.

For a streamlined two round version, I’d use something like this:

    v.x += ((v.y<<4u)+0xA341316Cu)^(v.y+0x9E3779B9u)^((v.y>>5u)+0xC8013EA4u);
    v.y += ((v.x<<4u)+0xAD90777Du)^(v.x+0x9E3779B9u)^((v.x>>5u)+0x7E95761Eu);
    v.x += ((v.y<<4u)+0xA341316Cu)^(v.y+0x3C6EF372u)^((v.y>>5u)+0xC8013EA4u);
    v.y += ((v.x<<4u)+0xAD90777Du)^(v.x+0x3C6EF372u)^((v.x>>5u)+0x7E95761Eu);
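If you want to check shader output on the CPU, the same mixing function ports to Python with explicit 32-bit masks standing in for `uint` wraparound. This is a sketch, not code from the paper. The names `tea` and `to_unit_float` are hypothetical, and the key is the same set of hex constants used in the shader code.

```python
MASK32 = 0xFFFFFFFF
DELTA = 0x9E3779B9
KEY = (0xA341316C, 0xC8013EA4, 0xAD90777D, 0x7E95761E)


def tea(stream, sequence, rounds=2, key=KEY):
    """Hash (stream, sequence) into two 32-bit values using `rounds` TEA rounds."""
    v0, v1 = stream & MASK32, sequence & MASK32
    s = DELTA
    for _ in range(rounds):
        # Masking after each update emulates 32-bit unsigned overflow.
        v0 = (v0 + ((((v1 << 4) & MASK32) + key[0]) ^ (v1 + s) ^ ((v1 >> 5) + key[1]))) & MASK32
        v1 = (v1 + ((((v0 << 4) & MASK32) + key[2]) ^ (v0 + s) ^ ((v0 >> 5) + key[3]))) & MASK32
        s = (s + DELTA) & MASK32
    return v0, v1


def to_unit_float(x):
    """Map a 32-bit integer to a float in [0, 1)."""
    return x / 2.0 ** 32


if __name__ == "__main__":
    # The 5th number in the 1200th stream is the same every time it is requested.
    a, b = tea(1200, 5)
    print(to_unit_float(a), to_unit_float(b))
```

The second-round constant in the unrolled version, `0x3C6EF372`, is just `0x9E3779B9` doubled and wrapped to 32 bits, which is what the running `s` holds on the second pass through the loop.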

In fact, the pictures in my “distributing stuff” post were a bunch of points placed in a vertex shader using exactly that code. I happened to use OpenGL’s GLSL for that, but the Direct3D HLSL version looks almost identical, with `uint2` replacing the `uvec2`.