AnarchX said:
All this means is that it will be DX10.X where X is currently unknown or unannounced, and not DX11.
AnarchX said:
My sources and other rumors on the net indicate that both chips will be aimed at midrange and dual-GPU boards as interim solutions in the high end, so that both companies can concentrate more on the DX10.1 solutions in H1 2008.
I don't disagree with most of what you said there, with one (big) exception: What makes you think those are interim solutions? Because I'm pretty sure they are not.
psurge said:
I dunno, we've been hearing forever about how shaders that can do things like read the current framebuffer value and then write back are hard to implement with good performance... I guess I'd like to understand in a bit more detail why this is suddenly a good idea/workable from a HW perspective.
Well, I'm not completely sure myself!
NVIDIA had a patent on this for a long time, it was filed around 2003, fwiw... The basic idea is to have a tiling architecture with a fixed number of tiles being worked on at the same time, and you can reserve/unreserve tiles with coverage masks etc. - it's not perfect, but I'm not sure you can do much better than that.
It does get more and more expensive the longer the M in RMW (read-modify-write) takes, though, since that means more simultaneous tiles and more potential data being blocked by tiles that have already been reserved. But I doubt this is a massive problem in practice.
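To make the reserve/unreserve idea a bit more concrete, here is a toy software model of it. This is only a sketch under my own assumptions, not what the patent or any real hardware does: the class name `TileReservationTable`, its methods, and the use of an integer bitmask for per-pixel coverage are all made up for illustration.

```python
class TileReservationTable:
    """Toy model: a fixed number of in-flight tiles, each with a coverage bitmask.

    Hypothetical illustration only; names and structure are assumptions.
    """

    def __init__(self, max_tiles):
        self.max_tiles = max_tiles
        self.reserved = {}  # tile_id -> bitmask of pixels currently locked

    def try_reserve(self, tile_id, mask):
        """Lock the pixels in `mask` for a read-modify-write; False means stall."""
        current = self.reserved.get(tile_id, 0)
        if current & mask:
            return False  # some of these pixels are still mid-RMW
        if tile_id not in self.reserved and len(self.reserved) >= self.max_tiles:
            return False  # every tile slot is already in use
        self.reserved[tile_id] = current | mask
        return True

    def release(self, tile_id, mask):
        """Unlock the pixels in `mask` once their write-back has completed."""
        remaining = self.reserved.get(tile_id, 0) & ~mask
        if remaining:
            self.reserved[tile_id] = remaining
        else:
            self.reserved.pop(tile_id, None)  # free the slot entirely
```

In this model, the longer the modify step takes, the longer entries stay in `reserved`, so more `try_reserve` calls fail either on overlapping coverage or on a full table. That is the cost described above.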
3dcgi said:
Why do you assume moving triangle setup to the shader will result in a performance improvement? It's likely that there are other data paths outside of the setup engine that limit performance to its current rate.
Indeed, but I think those parts are potentially much less expensive to design for higher throughput than triangle setup - so there would still be hard limits, but they should be significantly higher.
AnarchX said:
But I do not expect more than a checklist feature.
It will not be a 'checklist' feature, as it won't be on GeForce checklists and Tesla users couldn't care less about checklist features.
Throughput is obviously going to be a few to several times lower than FP32, but this is to be expected. Consider the DP CELL: it achieves 'only' 100 GFlops in DP mode. If FP64 on G92 is 3-5x slower than FP32, it can still easily beat that.
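The arithmetic behind that comparison is simple enough to spell out. The sketch below only uses the 100 GFlops figure and the 3-5x range from this post; the function names are mine, and the FP32 peak is left as an input because no figure for it is given here.

```python
CELL_DP_GFLOPS = 100.0  # DP CELL figure quoted in this post


def fp64_estimate(fp32_peak_gflops, slowdown):
    """Estimated FP64 throughput if FP64 runs `slowdown` times slower than FP32."""
    return fp32_peak_gflops / slowdown


def required_fp32_peak(slowdown, target_gflops=CELL_DP_GFLOPS):
    """FP32 peak needed for the FP64 estimate to match `target_gflops`."""
    return slowdown * target_gflops


for slowdown in (3, 5):
    needed = required_fp32_peak(slowdown)
    print(f"{slowdown}x slower: FP32 peak must exceed {needed:.0f} GFlops")
```

So the claim holds across the whole 3-5x range as long as the FP32 peak is above 500 GFlops, and at 3x anything above 300 GFlops is enough.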