Dynamic branching in 3dmarkxx ?

trinibwoy · Oct 8, 2005

Given the embarrassing performance advantage the R520 has over the G70, should we expect dynamic branching to be a prominent feature of the next 3dmark?

Geo · Oct 8, 2005

trinibwoy said:
Given the embarrassing performance advantage the R520 has over the G70, should we expect dynamic branching to be a prominent feature of the next 3dmark?

Great idea. Let's put vertex texture fetch in too.

Edit: Actually, that might not be a bad idea. If I understand the conversation between Wavey and Demirug correctly (probability: ~10%), then ATI could fix the issue in their driver to make it transparent to devs. They just need to be "inspired". That would inspire them.

DemoCoder · Oct 8, 2005

How about a test that requires FP16 blending, but uses ping-pong pixel shader workarounds for other cards.

Graham · Oct 8, 2005

DemoCoder said:
How about a test that requires FP16 blending, but uses ping-pong pixel shader workarounds for other cards.

Depending on how it's used, it probably wouldn't be quite as spectacularly bad as you might imagine. I recently added ping-ponging to the demo I'm building, to do HDR on PS 2.0, and on an X850 it took the frame rate down from about 50fps to about 35 (at 640x480), i.e. going from 8-bit with blend to 16-bit with ping-pong. However, on GeForce 6, using ping-pong over FP blend absolutely slaughtered performance, whereas FP over 8-bit was about the same ratio as the Radeon. This is with around 80 passes, however, which is getting to the higher end of what I'm meddling with.

I'll have a demo of it eventually, and yes, it will have a benchmark option.
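A minimal sketch of the ping-pong idea, assuming a CPU stand-in for the GPU: plain Python lists play the role of render targets, and each inner loop stands in for one pixel shader pass. The names `ping_pong_blend` and `additive_blend_reference` are hypothetical and not taken from the demo mentioned above. Where hardware blending accumulates into one buffer, ping-pong reads the previous result from one buffer as a texture, writes the sum to the other, and swaps them after every pass.

```python
def additive_blend_reference(passes, size):
    """What hardware additive blending would produce: one buffer, += per pass."""
    framebuffer = [0.0] * size
    for contribution in passes:
        for i in range(size):
            framebuffer[i] += contribution[i]
    return framebuffer


def ping_pong_blend(passes, size):
    """Emulate additive blending with two buffers that swap roles each pass."""
    src = [0.0] * size  # read as a texture during the pass
    dst = [0.0] * size  # bound as the render target during the pass
    for contribution in passes:
        # The "shader" adds this pass's output to the previous result.
        for i in range(size):
            dst[i] = src[i] + contribution[i]
        # Swap: the freshly written buffer becomes the next pass's input.
        src, dst = dst, src
    return src


passes = [[0.5, 1.0], [2.0, 4.0], [1.5, 0.25]]
print(ping_pong_blend(passes, 2))           # [4.0, 5.25]
print(additive_blend_reference(passes, 2))  # [4.0, 5.25]
```

The extra cost comes from the read of the whole previous buffer plus the render target switch on every pass, which is why the pass count (around 80 above) matters so much.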

pat777 · Oct 8, 2005

trinibwoy said:
Given the embarrassing performance advantage the R520 has over the G70, should we expect dynamic branching to be a prominent feature of the next 3dmark?

I think there's going to be a lot of dynamic branching and vertex shading (the 2 big advantages of R520).

Nick · Oct 8, 2005

trinibwoy said:
Given the embarrassing performance advantage the R520 has over the G70, should we expect dynamic branching to be a prominent feature of the next 3dmark?

Could someone point me to the technical explanation and benchmarks of this R520 branching advantage over G70? I'm too lazy to go look for it myself, but I'm interested in how this is possible and how big the difference is in practice.

Tim · Oct 8, 2005

Nick said:
Could someone point me to the technical explanation and benchmarks of this R520 branching advantage over G70? I'm too lazy to go look for it myself, but I'm interested in how this is possible and how big the difference is in practice.

The main reasons are the much smaller batch size (compared to the G70) and the fact that it has a dedicated Branch Execution Unit.

Geo · Oct 8, 2005

Nick said:
Could someone point me to the technical explanation and benchmarks of this R520 branching advantage over G70? I'm too lazy to go look for it myself, but I'm interested in how this is possible and how big the difference is in practice.

http://www.xbitlabs.com/images/video/radeon-x1000/x1800/Xbitmark_x18.gif

Bottom three.

Now, remember, you owe me one "I'm too lazy. . .".

Nick · Oct 8, 2005

Thanks!

I see G70 wins in most other shader tests though. And looking at the release dates of NV40, G70 and R520, I don't find it too embarrassing. I almost expected 2 or 3 times slower. With a good overclock you can nearly close the gap. Don't know the overclocking potential of R520 though...

Anyway, unless there will be a 3DMark2006, NVIDIA has plenty of time to improve branching performance, if it's really worth it. G70 already performs way better than NV40 and it seems only an incremental change to (further?) decrease batch sizes. Or am I wrong?

Either way, good to see ATI take the lead in several shader tests again!

trinibwoy · Oct 8, 2005

ATi's batch is at most 64 pixels, based on earlier discussions. Nvidia reduced their batch size on the G70, but it's still in the high hundreds, right? Don't know how much the G7x architecture would need to be changed to support ATi's level of dynamic branching performance.

Dave Baumann · Oct 8, 2005

The batch sizes for R5xx are 4x4 (16) pixels; the batch sizes for G70 are 64x16, according to testing.
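To get a feel for why batch size matters here, a toy cost model under loud assumptions: batches are treated as rectangular screen tiles (Bob's reply below disputes this for G70), a batch whose pixels disagree on the branch pays for both sides, and the costs 1 and 10 are made-up units. `branch_cost` and the stripe mask are hypothetical, not anything from a real driver or benchmark.

```python
def branch_cost(mask, tile_w, tile_h, cheap=1, expensive=10):
    """Total shading cost when every tile_w x tile_h batch runs in lockstep.

    mask[y][x] is True where the pixel takes the expensive side of the branch.
    Assumption: a divergent batch executes both sides for all of its pixels.
    """
    height = len(mask)
    width = len(mask[0])
    total = 0
    for ty in range(0, height, tile_h):
        for tx in range(0, width, tile_w):
            pixels = [
                mask[y][x]
                for y in range(ty, min(ty + tile_h, height))
                for x in range(tx, min(tx + tile_w, width))
            ]
            if all(pixels):
                per_pixel = expensive          # whole batch takes the branch
            elif any(pixels):
                per_pixel = cheap + expensive  # divergent: both sides run
            else:
                per_pixel = cheap              # whole batch skips the branch
            total += per_pixel * len(pixels)
    return total


# 64x64 image where only a 4-pixel-wide vertical stripe needs the expensive path.
stripe = [[x < 4 for x in range(64)] for _ in range(64)]

print(branch_cost(stripe, 4, 4))    # 6400: small batches skip the cheap areas
print(branch_cost(stripe, 64, 16))  # 45056: every large batch is divergent
```

In this contrived case the 4x4 batches hit the ideal per-pixel cost while the 64x16 batches pay roughly 7x more. Real shaders, real geometry, and the scheduling details discussed in the following posts would change the numbers.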

Bob · Oct 8, 2005

G70 has no 2D footprint for batch sizes, beyond the quad granularity. So indeed it is possible to construct a contrived case where G70 is ~2x faster at branching than R520.

Dave Baumann · Oct 8, 2005

Well, the 2D footprint for R5xx is 2x2; the batches are actually 4x(2x2). Aren't all the quads on G70 working on the same command?

Bob · Oct 8, 2005

Aren't all the quads on G70 working on the same command?

Nah. Each quad pipe can be scheduled independently, and can hold multiple triangles simultaneously, without being constrained by screen-space tiling.

Jawed · Oct 8, 2005

Nick said:
Anyway, unless there will be a 3DMark2006, NVIDIA has plenty of time to improve branching performance, if it's really worth it. G70 already performs way better than NV40 and it seems only an incremental change to (further?) decrease batch sizes. Or am I wrong?

One of NVidia's primary changes in the G70 architecture is to make each of the quads run independently of the others.

NV40 shares a batch across all quads.

On that basis NVidia has nowhere to go in terms of dynamic branching performance, other than adding quads - and I can't see that working at all well.

But I've seen an NVidia patent for fully decoupled texturing and another for advanced scheduling - so between the two I expect G80 to tackle dynamic branching head-on.

Jawed

nAo · Oct 8, 2005

Bob said:
G70 has no 2D footprint for batch sizes, beyond the quad granularity. So indeed it is possible to construct a contrived case where G70 is ~2x faster at branching than R520.

And how would you construct this case? (GPU tweaking at driver level or just submitting some 'special' geometry batch?)
It would be interesting to know if G70 can be tweaked to improve dynamic branching performance using smaller batches (obviously I believe this would not be free from a latency-hiding standpoint).

ciao,
Marco

Bob · Oct 8, 2005

And how would you construct this case? (GPU tweaking at driver level or just submitting some 'special' geometry batch?)

Like I said, contrived and unrealistic. You'd need to build up a specially-made app to hit this case.

nAo · Oct 8, 2005

Bob said:
Like I said, contrived and unrealistic. You'd need to build up a specially-made app to hit this case.

OK, but this is not so interesting. I mean, we're interested in real-world performance, and even if we don't have much info about the NV40/G70 fragment processor architecture, it seems dynamic branching performance is not that good, and all we have to explain this lack of performance is the big-batches story.
It would be nice to have some additional detail.

Acert93 · Oct 9, 2005

IMO any new 3DMark should include 4x MSAA standard. Ditto AF.

Games look significantly better with both, and high-end gamers playing cutting-edge games on bleeding-edge hardware will want AA and AF.

3DMark, from what I gather, is constructed to give a generic snapshot of future game tech and how it will work on GPUs. That being the case, running with AA/AF would be a good barometer of how people interested in GPU performance would run games if they could.

Not making AA standard is kind of like turning down texture quality or lighting quality.

So I say allow 3DMark to run without AA, but make AA the standard default configuration.

overclocked · Oct 9, 2005

nAo said:
OK, but this is not so interesting. I mean, we're interested in real-world performance, and even if we don't have much info about the NV40/G70 fragment processor architecture, it seems dynamic branching performance is not that good, and all we have to explain this lack of performance is the big-batches story.
It would be nice to have some additional detail.

For a "minor" refresh, could they just drop the batch size to 16*32 or 16*16, if this is indeed where the bottleneck is with the current architecture?

And the NV40s had a batch size of 4096 and the G70 1024; is that understood correctly?
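Putting the batch sizes mentioned in the thread side by side as plain arithmetic. The 4x4 and 64x16 figures are Dave Baumann's, 16x32 and 16x16 are the hypothetical refresh sizes asked about above, and the 4096 figure for NV40 is left out because nobody in the thread confirms it. The dictionary and function names are just for illustration.

```python
def batch_pixels(width, height):
    """Number of pixels in a width x height batch."""
    return width * height


BATCHES = {
    "R5xx 4x4": (4, 4),
    "G70 64x16": (64, 16),
    "hypothetical 16x32": (16, 32),
    "hypothetical 16x16": (16, 16),
}


def relative_to_r5xx(name):
    """How many R5xx-sized batches fit into the named batch."""
    w, h = BATCHES[name]
    return batch_pixels(w, h) // batch_pixels(*BATCHES["R5xx 4x4"])


for name, (w, h) in BATCHES.items():
    print(name, batch_pixels(w, h), "pixels,",
          relative_to_r5xx(name), "x the R5xx batch")
```

Even the smaller hypothetical size, 16x16, would still be 16 times the R5xx batch, so a refresh along those lines would narrow the gap without closing it.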
