Dynamic branching in 3dmarkxx?

trinibwoy

Given the embarrassing performance advantage the R520 has over the G70, should we expect dynamic branching to be a prominent feature of the next 3dmark?
 
trinibwoy said:
Given the embarrassing performance advantage the R520 has over the G70, should we expect dynamic branching to be a prominent feature of the next 3dmark?

Great idea. Let's put vertex texture fetch in too. :LOL:

Edit: Actually, that might not be a bad idea. If I understand the conversation between Wavey and Demirug correctly (probability: ~10%), then ATI could fix the issue in their driver to make it transparent to devs -- they just need to be "inspired". That would inspire them. :p
 
DemoCoder said:
How about a test that requires FP16 blending, but uses ping-pong pixel shader workarounds for other cards. :)

Depending on how it's used, it probably wouldn't be quite as spectacularly bad as you might imagine. I recently added ping-ponging to the demo I'm building, to do HDR on PS 2.0, and on an X850 it took the frame rate down from about 50fps to about 35 (at 640x480 :devilish: :oops: ), i.e. going from 8-bit with blend to 16-bit with ping-pong. On a GeForce 6, however, using ping-pong instead of FP blending absolutely slaughtered performance, whereas FP blending over 8-bit was about the same ratio as on the Radeon. That was with around 80 passes, though, which is getting towards the higher end of what I'm meddling with.

I'll have a demo of it eventually :p - and yes, it will have a benchmark option.
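In case it helps anyone picture the two approaches, here's a minimal CPU-side sketch of blending versus ping-ponging (not my actual demo code, no D3D calls; the pass count and per-pass contribution are just placeholders). Both produce the same result, but ping-ponging turns every blend into a full-screen texture read plus a render-target swap, which is why losing FP blending hurts so much.

```cpp
// CPU-side model of the two accumulation strategies.
// "Blend" = additive blending into a single render target.
// "Ping-pong" = read the previous result as a texture, add in the shader,
// write to the other target, swap each pass (needed when the target format
// can't be blended, e.g. FP16 on PS 2.0-class hardware).
#include <cstdio>
#include <utility>
#include <vector>

using Target = std::vector<float>;  // one float per pixel, standing in for a render target

int main() {
    const int pixels = 640 * 480;
    const int passes = 80;          // roughly the pass count mentioned above

    // Strategy A: hardware additive blend into a single target.
    Target blendTarget(pixels, 0.0f);
    for (int p = 0; p < passes; ++p)
        for (int i = 0; i < pixels; ++i)
            blendTarget[i] += 0.01f;   // pass contribution, added by the blender

    // Strategy B: ping-pong between two targets, accumulation done in the shader.
    Target src(pixels, 0.0f), dst(pixels, 0.0f);
    for (int p = 0; p < passes; ++p) {
        for (int i = 0; i < pixels; ++i)
            dst[i] = src[i] + 0.01f;   // shader reads previous result, adds contribution
        std::swap(src, dst);           // the write target becomes next pass's texture
    }

    std::printf("blend result %.2f, ping-pong result %.2f\n",
                blendTarget[0], src[0]);  // identical results, very different costs
}
```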
 
trinibwoy said:
Given the embarrassing performance advantage the R520 has over the G70, should we expect dynamic branching to be a prominent feature of the next 3dmark?
I think there's going to be a lot of dynamic branching and vertex shading (the two big advantages of R520).
 
trinibwoy said:
Given the embarrassing performance advantage the R520 has over the G70, should we expect dynamic branching to be a prominent feature of the next 3dmark?
Could someone point me to the technical explanation and benchmarks of this R520 branching advantage over G70? I'm too lazy to go look for it myself, but I'm interested in how this is possible and how big the difference is in practice. ;)
 
Nick said:
Could someone point me to the technical explanation and benchmarks of this R520 branching advantage over G70? I'm too lazy to go look for it myself, but I'm interested in how this is possible and how big the difference is in practice. ;)

The main reasons are the much smaller batch size (compared to the G70) and a dedicated Branch Execution Unit.
 
Thanks!

I see G70 wins in most other shader tests though. And looking at the release dates of NV40, G70 and R520, I don't find it too embarrassing. I almost expected it to be 2 or 3 times slower. With a good overclock you can nearly close the gap. Don't know the overclocking potential of R520 though...

Anyway, unless there's going to be a 3DMark2006, NVIDIA has plenty of time to improve branching performance, if it's really worth it. G70 already performs way better than NV40, and it seems like only an incremental change to (further?) decrease batch sizes. Or am I wrong?

Either way, good to see ATI take the lead in several shader tests again!
 
ATi's batch is at most 64 pixels, based on earlier discussions. Nvidia reduced their batch size on the G70, but it's still in the high hundreds, right? Don't know how much the G7x architecture would need to be changed to support ATi's level of dynamic branching performance.
 
The batch size for R5xx is 4x4 (16) pixels, and the batch size for G70 is 64x16 (1024) pixels, according to testing.
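Under the usual simplification that a whole batch has to execute every branch path that any of its pixels takes, a quick toy model shows why those numbers matter. The batch shapes below are the ones quoted above; the circular workload and per-path instruction counts are made up purely for illustration.

```cpp
// Toy divergence model: every pixel in a batch executes every branch path
// taken by any pixel in that batch.
#include <cstdio>

const int W = 256, H = 256;
const int CHEAP = 10, EXPENSIVE = 100;  // per-pixel instruction counts for the two paths

// Branch condition: "expensive" path inside a circle (say, lit pixels), cheap outside.
bool expensivePixel(int x, int y) {
    int dx = x - W / 2, dy = y - H / 2;
    return dx * dx + dy * dy < 80 * 80;
}

// Total cost when the screen is processed in bw x bh rectangular batches.
long long cost(int bw, int bh) {
    long long total = 0;
    for (int by = 0; by < H; by += bh)
        for (int bx = 0; bx < W; bx += bw) {
            bool anyExpensive = false, anyCheap = false;
            for (int y = by; y < by + bh; ++y)
                for (int x = bx; x < bx + bw; ++x) {
                    if (expensivePixel(x, y)) anyExpensive = true;
                    else anyCheap = true;
                }
            total += (long long)bw * bh *
                     ((anyExpensive ? EXPENSIVE : 0) + (anyCheap ? CHEAP : 0));
        }
    return total;
}

int main() {
    long long small = cost(4, 4);    // R5xx-style 16-pixel batch
    long long large = cost(64, 16);  // G70-style 1024-pixel batch, as reported above
    std::printf("4x4: %lld  64x16: %lld  ratio %.2f\n",
                small, large, (double)large / small);
}
```

The actual numbers don't mean anything; the point is just that the penalty grows as the batch gets bigger relative to the size of the coherent regions in the scene.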
 
G70 has no 2D footprint for batch sizes, beyond the quad granularity. So indeed it is possible to construct a contrived case where G70 is ~2x faster at branching than R520.
 
Well, the 2D footprint for R5xx is 2x2 - the batches are actually 4x(2x2). Aren't all the quads on G70 working on the same command?
 
Aren't all the quads on G70 working on the same command?
Nah. Each quad pipe can be scheduled independently, and can hold multiple triangles simultaneously, without being constrained by screen-space tiling.
 
Nick said:
Anyway, unless there's going to be a 3DMark2006, NVIDIA has plenty of time to improve branching performance, if it's really worth it. G70 already performs way better than NV40, and it seems like only an incremental change to (further?) decrease batch sizes. Or am I wrong?
One of NVidia's primary changes in the G70 architecture is to make each of the quads run independently of the others.

NV40 shares a batch across all quads.

On that basis NVidia has nowhere to go in terms of dynamic branching performance, other than adding quads - and I can't see that working at all well.

But I've seen an NVidia patent for fully decoupled texturing and another for advanced scheduling - so between the two I expect G80 to tackle dynamic branching head-on.

Jawed
 
Bob said:
G70 has no 2D footprint for batch sizes, beyond the quad granularity. So indeed it is possible to construct a contrived case where G70 is ~2x faster at branching than R520.
And how would you construct this case? (GPU tweaking at driver level or just submitting some 'special' geometry batch?)
It would be interesting to know if G70 can be tweaked to improve dynamic branching performance using smaller batches (obviously I believe this would not be free from a latency-hiding standpoint...)

ciao,
Marco
 
And how would you construct this case? (GPU tweaking at driver level or just submitting some 'special' geometry batch?)
Like I said, contrived and unrealistic. You'd need to build up a specially-made app to hit this case.
 
Bob said:
Like I said, contrived and unrealistic. You'd need to build up a specially-made app to hit this case.
Ok... but this is not so interesting. I mean, we're interested in real-world performance, and even if we don't have much info about the NV40/G70 fragment processor architecture, it seems dynamic branching performance is not that good, and all we had/have to explain this lack of performance is the big-batches story.
It would be nice to have some additional detail... ;)
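For what it's worth, here's one made-up pattern that would behave the way Bob describes under a simple cost model. This is just my guess at the kind of thing he means, and it assumes, purely for illustration, that the big batch is a run of 2x2 quads along a quad row rather than a rectangular tile: make the branch flip every two rows, so a quad (and any run of quads within one quad row) never diverges, while a 4x4 tile always straddles the flip.

```cpp
// Hypothetical "contrived case": the branch flips every two rows. Cost model
// as before: a batch executes every path taken by any of its pixels.
#include <cstdio>

const int W = 512, H = 64;
const int CHEAP = 10, EXPENSIVE = 100;

bool expensivePixel(int /*x*/, int y) { return (y / 2) % 2 == 0; }  // flips every 2 rows

long long cost(int bw, int bh) {
    long long total = 0;
    for (int by = 0; by < H; by += bh)
        for (int bx = 0; bx < W; bx += bw) {
            bool anyExpensive = false, anyCheap = false;
            for (int y = by; y < by + bh; ++y)
                for (int x = bx; x < bx + bw; ++x) {
                    if (expensivePixel(x, y)) anyExpensive = true;
                    else anyCheap = true;
                }
            total += (long long)bw * bh *
                     ((anyExpensive ? EXPENSIVE : 0) + (anyCheap ? CHEAP : 0));
        }
    return total;
}

int main() {
    // 4x4 tiles always mix both paths; a 512x2 batch (a run of 256 quads along
    // one quad row) never does.
    std::printf("4x4 tiles: %lld, 512x2 quad runs: %lld\n", cost(4, 4), cost(512, 2));
}
```

With those numbers the tiled batches cost about twice as much, which is where a ~2x advantage for a batch with no 2D footprint could come from. Whether that resembles what the hardware actually does is exactly the detail we're missing. ;)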
 
IMO any new 3DMark should include 4x MSAA as standard. Ditto AF.

Games look significantly better with both, and high-end gamers playing cutting-edge games on bleeding-edge hardware will want AA and AF.

3DMark, from what I gather, is constructed to give a generic snapshot of future game tech and how it will work on GPUs. That being the case, running with AA/AF would be a good barometer of how people interested in GPU performance would run games if they could.

Not making AA standard is kind of like turning down texture quality or lighting quality.

So I say allow 3DMark to run without AA, but make it the standard default configuration.
 
nAo said:
Ok... but this is not so interesting. I mean, we're interested in real-world performance, and even if we don't have much info about the NV40/G70 fragment processor architecture, it seems dynamic branching performance is not that good, and all we had/have to explain this lack of performance is the big-batches story.
It would be nice to have some additional detail... ;)

For a "minor" refresh" could they just drop the batch-size to 16*32 or 16*16 if this is indeed where the bottleneck is with the current architecture?

And the NV40 had a batch size of 4096 and the G70 1024 - is that understood correctly?
 