Thanks, DemoCoder, that was a very informative PDF. I'm definitely looking forward to VS 3.0, so I hope ATI didn't skip it.
I'm not sure why they talked so much about that Gaussian approximation of normals. Approximating a sum of Gaussians as another Gaussian won't do that much for quality. A 2D lookup in a 128x128 texture will certainly cause plenty of thrashing as well, gobbling up cache space, as N dot H changes at high spatial frequency. I guess 128x128 is probably a bit extreme in one axis at least, as N dot N shouldn't vary a whole lot.
I think PTMs (polynomial texture maps) solve this problem in a much better way. They should be faster, can approximate self-shadowing and interreflections, and have no mipmap problems. I guess this discussion should go in the coding forum.
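For anyone who hasn't looked at PTMs, here is a rough sketch of the per-texel evaluation as I understand it: each texel stores six coefficients of a biquadratic polynomial in (lu, lv), the light direction projected into the local texture plane. The function name and the coefficient ordering are just my choices for illustration, not anything from the PDF, and a real implementation would do this in the pixel shader rather than in Python.

```python
def ptm_luminance(coeffs, lu, lv):
    """Evaluate a biquadratic PTM polynomial for a single texel.

    coeffs: six per-texel coefficients (a0..a5); the ordering here is an
            assumption made for this sketch.
    lu, lv: normalized light direction projected onto the texture plane.
    """
    a0, a1, a2, a3, a4, a5 = coeffs
    return (a0 * lu * lu      # quadratic term in lu
            + a1 * lv * lv    # quadratic term in lv
            + a2 * lu * lv    # cross term
            + a3 * lu         # linear term in lu
            + a4 * lv         # linear term in lv
            + a5)             # constant term


# A texel whose response does not depend on the light direction at all.
flat_texel = (0.0, 0.0, 0.0, 0.0, 0.0, 0.5)
print(ptm_luminance(flat_texel, 0.3, -0.2))
```

Since the result is linear in the coefficients, filtering the coefficients and then evaluating gives the same answer as evaluating and then filtering, which is why I'd expect mipmapping to behave.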
I'm rather skeptical about NV40 having so many (16) full pixel pipelines. I mean, they have FP texture filtering (with AF), FP blending, supposedly far better per-clock pixel shading, VS 3.0, and PS 3.0, and NV35 took 130 million transistors for a 4x2 config. They're supposed to get all that with only 200 million transistors? I'd be mighty impressed. Seems like they'd have to start from scratch (which is possible if a parallel team has been working on this since GF3 or GF4), and get amazing design efficiency.
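To put a number on that, here is a back-of-the-envelope calculation using only the figures above. It assumes, very crudely, that the whole transistor budget divides evenly across the pixel pipelines, which is obviously not true (vertex shaders, memory controller and so on all take their share), so treat it as a ratio to think about rather than a real estimate. The function name is just something I made up for the sketch.

```python
def transistors_per_pipeline(total_transistors, pipelines):
    """Naive budget: total transistor count divided by pixel pipeline count."""
    return total_transistors / pipelines


nv35 = transistors_per_pipeline(130e6, 4)    # NV35: 130M transistors, 4 pipelines
nv40 = transistors_per_pipeline(200e6, 16)   # rumored NV40: 200M, 16 pipelines

print(f"NV35: {nv35 / 1e6:.1f}M per pipeline")
print(f"NV40: {nv40 / 1e6:.1f}M per pipeline")
print(f"NV35 / NV40 ratio: {nv35 / nv40:.1f}x")
```

By this crude measure each NV40 pipeline would get well under half the transistors of an NV35 pipeline while doing more, which is where my skepticism comes from.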