Re: Instructions
Xmas said:
2 KiB is really a bit small for a chip that supports 6 textures per pass.
btw, GFFX stores PS code in video memory AFAIK. There are no jumps, so access is predictable.
Hmm, well yeah, with 6 textures at 32 bits per texel, it is too small.
2048/6 = 341
But then, there are 4 pipes...
That's 85 bytes per pipe, per texture... Assuming 32-bit textures, that's 21 pixels.
Now, that seems too little.
Triangles are NOT processed one full line at a time. Several pixels are processed on one line, then you move to the next line and process the same span there. Then you move to the right and again treat a part of the two lines.
That's because, otherwise, you'd need a texture cache able to fit two full lines. With this scheme, you save a lot of transistors and barely lose any memory bandwidth.
But still, that would mean only 10 pixels on a line being processed at the same time ( the 21 pixels split over the two lines ). That seems insufficient.
But then again, in most cases, you won't use 6 textures.
So, well, roughly 30 pixels on a line when using 2 textures seems sufficient. It should give sufficient efficiency.
And in the case of not using 32-bit textures but something like low quality DXTC at 8 bits per texel, it would be roughly 120 pixels. That's nearly too much!
Many games use 3 or 4 textures, so using DXTC and that, it should be "okay".
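For anyone who wants to redo the arithmetic above, here is a rough Python sketch of the same back-of-the-envelope model. The function name and the `lines=2` parameter are made up for illustration, and the even split of the cache across textures and pipes is just the assumption used in this post, not anything known about the real hardware.

```python
def texels_per_line(cache_bytes, textures, pipes, bytes_per_texel, lines=2):
    """Texels of each texture that fit per pipe, per scanline.

    Assumes (as in the post) that the cache is split evenly across
    the bound textures and the pipes, and that two scanlines are
    walked together. None of this is confirmed hardware behaviour.
    """
    per_texture = cache_bytes // textures       # e.g. 2048 // 6 = 341
    per_pipe = per_texture // pipes             # e.g. 341 // 4 = 85
    texels = per_pipe // bytes_per_texel        # e.g. 85 // 4 = 21
    return texels // lines                      # e.g. 21 // 2 = 10


# (textures, bytes per texel) combinations discussed in the post
for textures, bytes_per_texel in [(6, 4), (2, 4), (2, 1), (4, 1)]:
    n = texels_per_line(2048, textures, 4, bytes_per_texel)
    print(f"{textures} textures, {bytes_per_texel} B/texel: {n} pixels per line")
```

With a 2 KiB cache this gives 10 pixels per line for 6 textures at 32 bits, 32 for 2 textures at 32 bits, and 128 for 2 textures at 8 bits, which is in line with the rounded 10, 30 and 120 figures above.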
Something I wonder, too, is whether the hardware can automatically determine how many pixels on a line to process to get maximum efficiency for a given texture cache size. That would be a lot more important. And if it can't, which is actually quite likely, it might get really bad ( near zero ) efficiency when using 6 textures...
But even forgetting that problem, it would be "okay" - not much better.
If I didn't do any of my calculations wrong, 4KB for the four pipes might very well be fine.
But then again, more could always give a slight boost to performance. The real question is whether that boost is sufficient to justify the transistor count increase.
Another interesting factor is the decreasing size of triangles. I don't think texture cache efficiency is good ( if not automatically nil ) when keeping texture info from another triangle. So, with the decreasing size of polygons, could something like 20 pixels/line be sufficient in most situations?
Uttar
EDIT: Sounds like you are right: the GFFX *does* store all of its instructions in Video Memory. Sounds like that's a good reason for NV31 & NV34 to support 1024 instructions too.
This would indeed prevent Dynamic Branching from working effectively in the PS, I guess. But could Static Branching still work well in the PS using that? I'd guess it could, but I might be wrong.
But GFFX temp registers are still stored in cache. As are several other things used in shaders. And those things are more expensive than on the R300, because they're FP32 ( yes, although FP32 performance is bad and nVidia is trying to make DX9 drivers use FP16 everywhere, it sounds like they made everything with FP32 in mind - performance probably isn't on par with their expectations... )
EDIT 2: After rethinking about it, I just don't understand how putting all of that in video memory makes sense...
Let's imagine each instruction is 45 bits, just like in the case of the VS according to the B3D article. Or rather, let's imagine it is 40 bits, just to be conservative.
Imagine an average of 20 instructions/pixel, and 1600x1200. All that at 60FPS.
That's about 11.5 GB/s at 40 bits per instruction ( about 13 GB/s at 45 bits ), so roughly 12GB/s...
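Here is that estimate as a quick Python sketch, in case I slipped somewhere. The function name is hypothetical, and the model assumes the worst case implied above: every pixel fetches every instruction from video memory, with no instruction caching and no overdraw.

```python
def instruction_fetch_bandwidth(width, height, instructions_per_pixel,
                                fps, bits_per_instruction):
    """Bytes per second needed to fetch shader instructions.

    Worst-case assumption: each pixel re-reads each instruction
    from video memory (no caching, no overdraw).
    """
    pixels_per_second = width * height * fps
    bits_per_second = (pixels_per_second
                       * instructions_per_pixel
                       * bits_per_instruction)
    return bits_per_second / 8


# 40 bits is the conservative guess, 45 bits is the VS figure quoted above
for bits in (40, 45):
    gb_per_s = instruction_fetch_bandwidth(1600, 1200, 20, 60, bits) / 1e9
    print(f"{bits}-bit instructions: {gb_per_s:.2f} GB/s")
```

This prints 11.52 GB/s for 40-bit instructions and 12.96 GB/s for 45-bit ones, so the ballpark holds under these assumptions.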
Now, I just don't quite understand how that makes sense. There's gotta be a misunderstanding somewhere. Unless nVidia found a way to defy mathematics, too!
Woah, that's gotta need serious driver tuning.