trinibwoy said: Given the embarrassing performance advantage the R520 has over the G70, should we expect dynamic branching to be a prominent feature of the next 3DMark?
DemoCoder said: How about a test that requires FP16 blending, but uses ping-pong pixel shader workarounds for other cards?
trinibwoy said: Given the embarrassing performance advantage the R520 has over the G70, should we expect dynamic branching to be a prominent feature of the next 3DMark?
I think there's going to be a lot of dynamic branching and vertex shading (the two big advantages of R520).
trinibwoy said: Given the embarrassing performance advantage the R520 has over the G70, should we expect dynamic branching to be a prominent feature of the next 3DMark?
Could someone point me to the technical explanation and benchmarks of this R520 branching advantage over G70? I'm too lazy to go look for it myself, but I'm interested in how this is possible and how big the difference is in practice.
Nick said: Could someone point me to the technical explanation and benchmarks of this R520 branching advantage over G70? I'm too lazy to go look for it myself, but I'm interested in how this is possible and how big the difference is in practice.
Aren't all the quads on G70 working on the same command?
Nah. Each quad pipe can be scheduled independently and can hold multiple triangles simultaneously, without being constrained by screen-space tiling.
Nick said: Anyway, unless there will be a 3DMark2006, NVIDIA has plenty of time to improve branching performance, if it's really worth it. G70 already performs way better than NV40, and it seems only an incremental change to (further?) decrease batch sizes. Or am I wrong?
One of NVIDIA's primary changes in the G70 architecture is to make each of the quads run independently of the others.
Bob said: G70 has no 2D footprint for batch sizes, beyond the quad granularity. So indeed it is possible to construct a contrived case where G70 is ~2x faster at branching than R520.
And how would you construct this case? (GPU tweaking at the driver level, or just submitting some 'special' geometry batch?)
And how would you construct this case? (GPU tweaking at the driver level, or just submitting some 'special' geometry batch?)
Like I said, contrived and unrealistic. You'd need to build a specially made app to hit this case.
Bob said: Like I said, contrived and unrealistic. You'd need to build a specially made app to hit this case.
OK, but this is not so interesting. I mean, we're interested in real-world performance, and even if we don't have much info about the NV40/G70 fragment processor architecture, it seems dynamic branching performance is not that good, and all we have to explain this lack of performance is the big-batches story.
nAo said: OK, but this is not so interesting. I mean, we're interested in real-world performance, and even if we don't have much info about the NV40/G70 fragment processor architecture, it seems dynamic branching performance is not that good, and all we have to explain this lack of performance is the big-batches story.
It would be nice to have some additional detail..
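For what it's worth, the "big batches" explanation can at least be illustrated with a toy model. The sketch below is not a description of how R520 or G70 actually schedule pixels. It assumes a 1D run of pixels, batches that execute in lockstep (so a batch whose pixels disagree on a branch pays for both paths), and made-up costs of 1 for the cheap path and 10 for the expensive one. The function name `frame_cost` and the batch sizes 16 and 1024 are illustrative assumptions, not vendor figures.

```python
def frame_cost(takes_expensive, batch_size, cheap=1, expensive=10):
    """Total per-pixel instruction slots spent on a run of pixels.

    takes_expensive: list of bools, one per pixel (True = expensive branch).
    Assumption: every pixel in a batch runs in lockstep, so a batch pays
    for each branch path that at least one of its pixels takes.
    """
    total = 0
    for start in range(0, len(takes_expensive), batch_size):
        batch = takes_expensive[start:start + batch_size]
        per_pixel = 0
        if any(batch):        # someone in the batch needs the expensive path
            per_pixel += expensive
        if not all(batch):    # someone in the batch needs the cheap path
            per_pixel += cheap
        total += per_pixel * len(batch)
    return total


# Hypothetical scene: 4096 pixels, with a 64-pixel stretch taking the expensive path.
pixels = [1000 <= i < 1064 for i in range(4096)]

ideal = frame_cost(pixels, 1)  # per-pixel branching, no wasted work
for size in (16, 1024):        # illustrative batch sizes only
    cost = frame_cost(pixels, size)
    print(f"batch={size:5d}  cost={cost:6d}  vs ideal x{cost / ideal:.2f}")
```

In this made-up scene the small batch wastes only a few percent over per-pixel branching, while the large batch ends up several times more expensive, because both large batches touched by the 64-pixel stretch pay for both paths. Real hardware adds latency hiding, 2D batch footprints and scheduling effects that this model ignores, so it only shows the direction of the effect, not its size.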