darkblu said:
back to topic after a seaside weekend.
I hope it was relaxing! That reminds me it's been four years or so since I last saw the sea, even though it's only 50 km away... Bah, got to study for re-examinations.
darkblu said:
the idea behind THURP is to have a classic, multi-pass-based rasterizer which (a) generally targets max precision, yet (b) has an execution flow which does not prevent the lib from subsequently getting re-targeted to real-time performance w/o the need for radical changes. otherwise many a calculation in THURP is performance-suboptimal as of present; the mere fact that all calculations are carried out in floats regardless of the targeted dynamic range is indicative, i'd presume.
But for maximum precision and consistency we already have the reference rasterizer, don't we? And there are other libraries for really 'scientific' rendering. So although software rendering is always a cool project, I don't fully understand the purpose of THURP yet. I chose to leave the highest-quality rendering to specialized software and hardware, and to focus on supporting as many features as possible at acceptable precision and performance. Image quality matters to me, but if I have an approximation that doesn't cause artifacts, I'm happy with it. What are your future plans for THURP?
darkblu said:
although you're right in saying that getting the derivatives analytically is faster compared to what THURP presently does, in the general case getting them as differences from the neighbours is a) more robust (it can be applied to any sort of quantity, including dependent texturing), and b) can be really fast if the y-direction derivatives are implemented by the scanline scheme i mentioned in the prev post. after all, you can hardly beat a single subtraction per derivative, can you? now, the fact that THURP's mipmapping code does not presently implement such a scheme does not imply it isn't intended for later on.
What worried me most performance-wise were not the subtractions, but the perspective divides. This is one of the slowest parts of texturing, even with SSE optimizations, and with your method you do it three times per pixel. Of course, the method used by hardware with 2x2 blocks is not necessarily slower, and it's certainly more robust than the analytical method.
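To make the cost concrete, here's a minimal sketch (all names hypothetical, not from THURP) of computing the mipmap LOD for one pixel of a 2x2 block from forward differences of the perspective-corrected coordinates. You can see the one divide per pixel, i.e. three for the pixel plus its x- and y-neighbours:

```cpp
#include <algorithm>
#include <cmath>

// Vertex-interpolated attributes for one pixel: u/w, v/w and 1/w.
struct Interp { float uw, vw, rw; };

// Mipmap LOD for the top-left pixel of a 2x2 quad, via forward
// differences of the perspective-corrected texture coordinates.
float quadLod(const Interp& p, const Interp& px, const Interp& py,
              float texWidth, float texHeight)
{
    float u  = p.uw  / p.rw,  v  = p.vw  / p.rw;   // divide 1
    float ux = px.uw / px.rw, vx = px.vw / px.rw;  // divide 2
    float uy = py.uw / py.rw, vy = py.vw / py.rw;  // divide 3

    float dudx = (ux - u) * texWidth,  dvdx = (vx - v) * texHeight;
    float dudy = (uy - u) * texWidth,  dvdy = (vy - v) * texHeight;

    // rho = max length of the two screen-space gradient vectors.
    float rho = std::sqrt(std::max(dudx * dudx + dvdx * dvdx,
                                   dudy * dudy + dvdy * dvdy));
    return std::log2(rho);  // clamp to the valid level range before use
}
```

With a reciprocal approximation (e.g. `rcpps`) those three divides can be cheapened, at some precision cost.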
So, I'm really very interested in the dsx/dsy method, but it's impractical with my current idea for the ps 3.0 design. Dynamic flow control makes it nearly impossible and I have no idea how to deal with it. Does anyone know exactly how the hardware gets the mipmap LOD for conditionally executed texld instructions? Anything the hardware can do, I can do too, even if I have to take a different approach...
I have one new idea. I can render 2x2 blocks, but process every pixel sequentially. At every dsx/dsy instruction (or tex instruction, for that matter), I can split the shader. Temporary registers get stored in memory, together with the texture coordinates. This is easy with my built-in automatic register allocator. When all four pixels have finished the first part of the shader, and the texture coordinates are known, I can continue with the second part, and so on. The only possible performance losses are the extra pixels being computed at the edges, and the saving/restoring of the registers. But if I'm to believe Dio, the latter won't make a difference.
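The splitting idea above can be sketched roughly like this (a toy stand-in, not the real shader pipeline; all names and the ALU work are made up). The shader is cut into parts at the fetch, each part runs over all four quad pixels spilling its temporaries to a per-pixel context, and by the time part 2 runs, every pixel's coordinate is known, so each derivative is one subtraction against a quad neighbour:

```cpp
// Hypothetical per-pixel context: spilled temporaries plus the texture
// coordinate produced by the first shader part.
struct PixelCtx {
    float r0, r1;   // spilled temporary registers
    float u, v;     // texture coordinate at the split point
};

// Part 1 of the split shader: everything up to the tex instruction.
void shaderPart1(PixelCtx& c, float x, float y)
{
    c.r0 = x * 0.5f;         // arbitrary stand-in ALU work
    c.r1 = y * 0.5f;
    c.u = c.r0; c.v = c.r1;  // coordinate for the pending fetch
}

// Part 2: all four coordinates are known, so derivatives are a single
// subtraction per direction, then the fetch and the rest of the shader.
float shaderPart2(const PixelCtx q[4], int i)
{
    float dudx = q[1].u - q[0].u;  // x-neighbour difference
    float dvdy = q[2].v - q[0].v;  // y-neighbour difference
    // ... select mip level from dudx/dvdy, sample, continue shading ...
    return q[i].r0 + q[i].r1 + dudx + dvdy;  // stand-in for the result
}

float shadeQuad(float x, float y)
{
    PixelCtx q[4];
    const float ox[4] = {0, 1, 0, 1}, oy[4] = {0, 0, 1, 1};
    for (int i = 0; i < 4; i++)    // part 1, pixel by pixel
        shaderPart1(q[i], x + ox[i], y + oy[i]);
    float sum = 0;
    for (int i = 0; i < 4; i++)    // part 2, pixel by pixel
        sum += shaderPart2(q, i);
    return sum;
}
```

The context struct plays the role of the spilled registers; with an automatic register allocator the set of live values at each split point is already known.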
darkblu said:
interpolating rho is not an option for the mainstream version of THURP. it might be an option for a more performance-targeted branch, though. anyway, interpolation-based optimizations are not on the present task list (covering the functionality to perform per-pixel phong is).
That brings me to another deficiency of THURP. Your rendSpan.cpp file is huge and nearly unmanageable. Every time you add a new feature you'll have to rename your functions, and the file will double in size. The latter problem can be solved elegantly by using templates:
Code:
template<bool phongEnabled>
void renderSpan()
{
    // general setup
    if(phongEnabled)
    {
        // Phong implementation
    }
    else
    {
        // Gouraud
    }
    // generic code
}
You get the idea... But it doesn't solve the problem of typing out all the function names, and you'll need a gigantic switch statement (even bigger than the one you're already using). And even if that were manageable with macros and such, adding more rendering options will grow your executable exponentially, not to mention the compilation time.
Run-time compilation looks quite similar to the templated code, but is fully flexible. In my case I use my own run-time assembler, SoftWire, but you can just as well let the C++ compiler do the job. In that case it's also referred to as 'code stitching'. This is rather easy to implement using 'naked' functions (__declspec(naked) in Visual C++). Just remember to use only static data, so you can relocate code without much trouble. SoftWire is more efficient because it can do automatic register allocation and peephole optimizations (and I'm planning a scheduler), but for THURP a stitcher would be perfect...
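As a rough, portable approximation of the idea (a real stitcher copies the machine-code bytes of naked functions into an executable buffer and so avoids even the call overhead; this sketch only shows the composition logic, and all names are hypothetical):

```cpp
#include <vector>

// State carried across the stitched stages of one span.
struct SpanState { float color; };

using Stage = void (*)(SpanState&);

// Stand-in stage bodies; a real stitcher would splice their code bytes.
void gouraudStage(SpanState& s) { s.color *= 0.5f; }
void phongStage(SpanState& s)   { s.color += 1.0f; }
void fogStage(SpanState& s)     { s.color *= 0.9f; }

// "Stitch" a pipeline from the enabled features, once per state change,
// instead of instantiating every combination at compile time.
std::vector<Stage> stitchPipeline(bool phong, bool fog)
{
    std::vector<Stage> pipe;
    pipe.push_back(phong ? phongStage : gouraudStage);
    if (fog) pipe.push_back(fogStage);
    return pipe;
}

void runSpan(const std::vector<Stage>& pipe, SpanState& s)
{
    for (Stage stage : pipe) stage(s);
}
```

The payoff is that adding a feature adds one stage function, not a doubling of precompiled variants; the set of combinations is assembled at run time.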