Chalnoth said:That's what ATI wants you to think, for sure. We'll have to wait for the NV40 to see how well nVidia has handled it, though.
991060 said:In a recent leaked pdf, ATi suggested not using dynamic branching before R5XX, does that mean all of the problems you mentioned will be resolved in R5XX?
sireric said:I'm not going to comment on anything but this thread.
What I stated above is a generalized description of shader branching issues, which is applicable to most/all VPUs/GPUs that implement DX9 level of gfx processing.
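(Editorial aside: one concrete instance of the branching issue described above. This is an illustrative model only, not any vendor's actual hardware design: on DX9-era SIMD hardware, a "branch" over a 2x2 pixel quad is commonly handled by predication, so every pixel pays for both sides of the branch.)

```python
# Hypothetical cost model (assumed, for illustration): predicated branching
# executes both sides for every pixel and selects the result, versus an
# ideal scalar machine where each pixel takes only its own path.

def predicated_branch(conditions, then_cost, else_cost):
    """Cost when both sides always execute (predication)."""
    # Every pixel pays for both paths, regardless of its condition.
    return len(conditions) * (then_cost + else_cost)

def true_branch(conditions, then_cost, else_cost):
    """Ideal cost if each pixel could branch independently."""
    return sum(then_cost if c else else_cost for c in conditions)

quad = [True, False, False, True]  # a 2x2 quad with a divergent condition
print(predicated_branch(quad, then_cost=10, else_cost=2))  # 48 cycles
print(true_branch(quad, then_cost=10, else_cost=2))        # 24 cycles
```

The gap widens as the two paths get more asymmetric, which is one reason long shaders with skippable work make dynamic branching attractive in the first place.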
DaveBaumann said:I don't really think you can sum the developments up like that Joe, otherwise ATI would have just made a very fast DX8 processor now.
IMO, ATI are very driven by what they consider to be inflection points at the moment, and the other factor is that their development cycle just isn't in step with NVIDIA's.
krychek said:Great thread! Now someone please explain why a texture fetch has such a high latency. Is it because the texture cache reads are costly, or because the interpolation and other calculations are costly, or both?

Interpolation is fast, probably a handful of cycles; it's the memory reads that are costly. No amount of cache can hide the fact that you need to read texture data from video memory frequently. And instead of doing speculative reads to have the data ready beforehand, GPU designers prefer a long quad FIFO where quads are parked until the requested texture data is (almost) certainly available, whether it was in the cache or not ("design for the cache miss"). The texture cache is there to save bandwidth, not latency.
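(Editorial aside: the latency-hiding argument above can be put in back-of-the-envelope terms. The numbers below are made up for illustration; the point is only that a deep quad FIFO trades area for latency tolerance.)

```python
# Assumed model: while one quad waits on a memory read, the pipeline
# executes ALU work from other in-flight quads. The FIFO must be deep
# enough that the ALUs never run dry during a worst-case (miss) read.

def quads_needed_to_hide_latency(mem_latency_cycles, alu_cycles_per_quad):
    """Quads that must be in flight so the ALUs never stall."""
    # ceil(latency / work per quad) other quads, plus the waiting quad itself
    return -(-mem_latency_cycles // alu_cycles_per_quad) + 1

# e.g. a 200-cycle memory read and 8 ALU cycles of work per quad:
print(quads_needed_to_hide_latency(200, 8))  # 26 quads in flight
```

This also hints at why branching interacts badly with texturing: divergence shrinks the pool of uniform work available to fill those waiting cycles.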
Joe DeFuria said:DaveBaumann said:I don't really think you can sum the developments up like that Joe, otherwise ATI would have just made a very fast DX8 processor now.
Why? Are PS 2.0 shaders impractical, performance-wise, on the R300?
991060 said:In a recent leaked pdf, ATi suggested not using dynamic branching before R5XX, does that mean all of the problems you mentioned will be resolved in R5XX? Since I don't see any fundamental solution to the conflict between long shaders and limited resources, if possible, I'd like to hear how you guys at ATi found a way around it.
DemoCoder said:If anything, unified shaders and more complex resource management could amplify problems, not solve them.
DaveBaumann said:DemoCoder said:If anything, unified shaders and more complex resource management could amplify problems, not solve them.
Could. Unless you effectively analyse the issues and build accordingly.
DemoCoder said:How is that different from any other issue? I could say "PS3.0 branching could amplify performance issues, unless you effectively analyse the issues and build accordingly." So why am I to believe that the R500 will do it correctly, but the NV40 won't?
DaveBaumann said:However, I'm fairly sure that NV40's VS branching will be better than its PS branching - at least a unified shader allows branch (prediction) logic to be equally as good (or bad) for all operations, and potentially lets you dedicate more die area to a single unit as opposed to separate logic for pixel and vertex shaders.

The primary difference here, of course, is that pixel shaders are optimized for texture accesses, whereas vertex shaders are not. In a unified shader you will still have this problem, since texture addressing will always be hard on branching, and the deeper pipelining that texture addressing requires will hurt vertex shader branching too. Of course, I'm sure that by that time architectures will have more rigorous optimizations for branching, but I don't think unified pipes will help pixel shader branching at all.