Mintmaster
Veteran
Since when is saying you don't understand equivalent to saying you can't understand or won't understand even if I explain it? And why would I bother explaining it if I felt the latter?

It's kind of arrogant to think that I and a lot of other people don't understand something. It's quite possible we're ignorant about something, but that doesn't mean we wouldn't understand it if it were properly explained.
Of course you are going to think that way if you ignore all the data flow that distributes the information required for rasterization into the shader units. With software rasterization you can keep data movement to a minimum because it can be done more locally.
Nothing about rasterization can be kept local. First of all, going from vertices to quads necessitates a complete redistribution of data by its very nature, so you can't eliminate that data flow. Then you have early Z/stencil testing to cull quads before they even enter the shader pipeline. That involves specialized functionality which would have to be duplicated in each shader unit if the rasterizer were eliminated, along with duplicated data flow because the culling would no longer be centralized. Load balancing also becomes far more complicated when pixel generation isn't centralized.
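To make the centralization point concrete, here is a toy Python sketch (entirely illustrative, with made-up names and a dictionary standing in for a depth buffer): one centralized loop walks a triangle in 2x2 quads and early-Z-culls pixels before anything is handed to a shader unit. Replicating this loop per shader unit would mean duplicating both the traversal logic and the depth-buffer traffic.

```python
# Toy software rasterizer sketch. Integer pixel coordinates, a single depth
# value per triangle, and a dict as the depth buffer -- all simplifications.

def edge(ax, ay, bx, by, px, py):
    # Signed-area edge test: >= 0 means (px, py) is on the inside of a->b
    # for a counter-clockwise triangle.
    return (bx - ax) * (py - ay) - (by - ay) * (px - ax)

def rasterize_quads(tri, depth, tri_z):
    """Walk the triangle's bounding box in 2x2 quads; early-Z-cull pixels
    centrally so only surviving quads would ever reach shading."""
    (x0, y0), (x1, y1), (x2, y2) = tri
    xmin, xmax = min(x0, x1, x2), max(x0, x1, x2)
    ymin, ymax = min(y0, y1, y2), max(y0, y1, y2)
    quads = []
    for qy in range(ymin & ~1, ymax + 1, 2):       # step quad by quad
        for qx in range(xmin & ~1, xmax + 1, 2):
            pixels = []
            for py in (qy, qy + 1):
                for px in (qx, qx + 1):
                    inside = (edge(x0, y0, x1, y1, px, py) >= 0 and
                              edge(x1, y1, x2, y2, px, py) >= 0 and
                              edge(x2, y2, x0, y0, px, py) >= 0)
                    # Early Z: keep the pixel only if the triangle is nearer
                    # than what the depth buffer already holds.
                    if inside and tri_z < depth.get((px, py), float("inf")):
                        pixels.append((px, py))
            if pixels:
                quads.append((qx, qy, pixels))     # only these get shaded
    return quads
```

The culling and the quad generation live in one place; that is the "rasterizer as dispatcher" view in miniature.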
I think the best way to see why centralized, FF rasterization is here to stay is to just think of the rasterizer as part of the dispatcher, which is needed to oversee all the computation units. Rasterization is not a workload that needs to be distributed, but rather a way of distributing the real workload.
Never say never. Fixed-function alpha testing and fog also didn't need any flexibility in moving data around. Yet they are gone now.
Alpha testing is not gone. It's simply been exposed as a shader function now (texkill/clip) instead of a state change in the API. Alpha test has always been synonymous with per-pixel culling, and GPUs will always have specialized hardware to handle that differently from Z-culling. There's nothing more generic about it now either, aside from aesthetics. If Microsoft took away clip() but gave the alpha test state change back, all you'd have to do is add a boolean to your program and modify the alpha value at the end to match your test.

It's generic because the test has become arbitrary. The actual killing of the 'pixel' is now also used for other functionality.
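The equivalence being argued here (a boolean plus an alpha tweak recovers the old state-change behaviour) can be sketched in a few lines. This is illustrative Python standing in for a pixel shader, with a made-up ALPHA_REF; the kill-if-negative convention mirrors how clip()/texkill behaves:

```python
ALPHA_REF = 0.5  # the reference value the old API alpha-test state would hold

def pixel_shader(color, alpha_test_enabled=True):
    """Emulate fixed-function alpha test as a shader-side kill, the way
    texkill/clip exposes it (illustrative sketch, not a real API)."""
    r, g, b, a = color
    if alpha_test_enabled and a - ALPHA_REF < 0:
        return None          # clip(): kill the pixel, skip all later stages
    return (r, g, b, a)
```

Flipping the boolean off gives you back plain pass-through shading, which is exactly the "add a boolean to your program" point above.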
Fog is a pretty meaningless example, because nobody ever said that fog will never be generalized.
For texturing, remember that I said filtering, not address calculations. Simply getting 4-8 samples per clock into each shader unit instead of one filtered sample (which is required to get close to FF performance) will cost more die space than the FF filtering unit itself.
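The cost being described is easy to see if you write out what "generic" filtering in the shader actually does. A minimal Python sketch of bilinear filtering on a 2D scalar texture (illustrative only; a real shader would work on vec4 texels and handle wrap modes):

```python
def bilinear(tex, u, v):
    """Bilinear filtering done in shader code: four separate texel reads
    plus three lerps replace the single filtered sample a fixed-function
    unit would return."""
    h, w = len(tex), len(tex[0])
    x, y = u * (w - 1), v * (h - 1)
    x0, y0 = int(x), int(y)
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    fx, fy = x - x0, y - y0
    # Four memory reads per filtered sample...
    t00, t10 = tex[y0][x0], tex[y0][x1]
    t01, t11 = tex[y1][x0], tex[y1][x1]
    # ...and three lerps to combine them.
    top = t00 + (t10 - t00) * fx
    bot = t01 + (t11 - t01) * fx
    return top + (bot - top) * fy
```

Every filtered fetch turns into four reads and a small ALU tree; sustaining that at one sample per clock per unit is where the extra die space goes.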
Unless you think devs will one day prefer 1/4 performance for the majority of texels in order to have filtering flexibility, or forgo filtering for the vast majority of memory reads (i.e. >98%) by the shader, there won't be an impetus to remove FF filtering from a GPU. I emphasize the G, because I'm talking about a processor used primarily for graphics, and filtering hasn't really changed for many decades. If HPC miraculously finds an application that allows them to displace GPUs, that's a different story.

No matter how inefficient something may seem to hardware designers, there's no other choice but to follow the demands of the software developers. If they want more generic memory accesses, texture samplers will eventually disappear. We can disagree on whether this is what they want, but I don't think fixed-function hardware can turn it around (in the long run).
Now, if we move to path tracing and it needs 50x the memory accesses that final-pixel texturing does, I guess it's possible that devs will do this, but I don't think that's reasonable.
No, it didn't disappear. It was expanded into the programmable unit. What you see with the API is different from what hardware developers implement. ATI never really had fixed-function TnL. R100's vertex unit was just short of a DX8 vertex shader. I wouldn't be surprised if GF1/2 were the same. GF3 didn't remove DOT3 from GF1/2 and replace it with a pixel shader. Both had the same register combiner architecture, of which DOT3 was a small part. GF3 just added dependent texturing (finally catching up to Matrox G400 pixel ability) and DX8 exposed it all differently.

That was the whole point! It's a perfect example. Fixed-function hardware disappears because the usage evolves in favor of something more programmable.
Fantastic argument. "Nobody is implementing bubble sort hardware in a CPU. Therefore all fixed function hardware is on the way out..."

Now ask yourself why nobody is planning to implement those features in hardware...
I wasn't talking about the software challenge. I was talking about the additional hardware needed to make software rasterization possible. You unfortunately glossed over the paragraph I wrote about the paper on software rasterization.

There's definitely a huge software challenge ahead of us. But that doesn't make it a bad idea.