- Stenciling is fast. Unlike hardware, there is not really a pipeline with a fixed fillrate. With MMX, I can fill the stencil buffer at gigahertz speeds, so to speak. The newest hardware probably still beats it, but either way, it's damn fast for software.
Well, NVIDIA has a solution for that in their GPUs as well: their pipelines can do twice the work when there are no colour operations, only z/stencil.
Also, in practical situations you generally only stencil based on certain conditions, e.g. the z-test or alpha test, and that alone already slows a CPU down a lot.
So the advantage is purely theoretical, I'd say.
- Sampling multiple textures is fast. I once experimented with a shader that sampled several hundred textures. Performance dropped by only a relatively small factor. This shows that modern CPUs are used more efficiently if you give them more work: more independent work and fewer jumps. There's also a lot of setup work that doesn't have to be redone for each additional sample.
Again, this is probably a purely theoretical advantage. You rarely need more than 2 or 3 textures, especially if you are doing more advanced shading (more arithmetic, fewer textures). So perhaps if you throw enough textures at it, software rendering will actually be faster than hardware, but there probably aren't any practical cases for using that many textures.
- Shading is fast. That's right. SSE can compute most arithmetic shading operations faster than a single hardware shader pipeline, and all that in full 32-bit precision. Of course it's no match once you compare it to hardware with four or more pipelines. But either way, operations between colors and other vectors are very fast compared to texture operations and the extra work they require. If only CPUs had instructions to accelerate sampling operations...
This is probably again only a theoretical advantage. Most hardware can do reasonably complex per-pixel shading in realtime today (at least the GF3 and up). Perhaps if you make it more complex, software will be faster, but will it look any better, and is it still realtime?
To conclude: if the question is "what can you do with a software renderer for games?", these aren't the answers, since these advantages cannot be exploited in a way that gives you a renderer fast enough for realtime use. All you are basically saying is "hardware cannot do these things in realtime either", which is not really interesting.
But if you are talking about offline rendering, then yes, software has definite advantages over hardware at this time. You can do raytracing more efficiently than on current hardware, you can implement a REYES renderer for high-quality antialiased rendering, you can use complex 2D, 3D, 4D, ... procedural textures, etc.
Anyway, I have to disagree with Scali once again (nothing personal): shadows, reflections and per-pixel lighting are possible with acceptable performance, though with considerable effort. Granted, hardware is way better at all of this.
As soon as you get my program running on your renderer, we can see how acceptable that performance will be. (How far along are you? It is an old DX8 thing without shaders, so it should almost run on your existing DLL already.)
And note that this is just one optimized demo; it doesn't make a game yet.
So even if you got 15 fps here, that doesn't mean you can actually make an entire game like this.