Running the Toy Shop demo on the 360

Jaws – when was a MADD equated to being the definition of a FLOP? If the scalar units are doing any ops single cycle then that doesn’t equate.

Bill – the efficiency claims for R520 are for the Pixel Shader only. It’s a non-unified architecture, hence it can suffer from the same load-balancing issues as any other non-unified architecture; Xenos’s claims are across the entire operation of the chip.

Azrael – one of the reasons for the performance differences could be how the two parts handle dynamic branching. Xenos does have more shader power, but its branching granularity is slightly coarser (64 pixels to R520’s 16).
 
When comparing shader engines' peak math values, you must not forget their relative efficiency:

Old-tech static shaders (G70, R480 and so on): 40-70%, depending on the situation
New-tech static shaders (R520): 95%
New-tech unified shaders (Xenos): 95%
 
Hardknock said:
And here's why this news is so important:

http://www.beyond3d.com/forum/showthread.php?t=24254&page=4

This demo would not run at a playable framerate on the G70 series of cards. The R520 runs it 2x to 3x faster than a G70 depending on the situation! And Xenos is even more powerful than that!!
As has been explained, this 2-3x advantage applies only to branch-heavy shaders. Can anyone give a rough idea of how Xenos' shader branching is implemented compared to R520? Is it likely to handle branch-heavy shaders as well?
 
Shifty - try reading a few posts up.

BTW, although there is a difference between Xenos and R520, bear in mind that branching tests indicate G70 uses batch/branch sizes on the order of 1024 pixels for normal branching operations.
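To see why batch size matters for dynamic branching, here is a rough Python sketch. It is a toy model, not how any of these chips is documented to schedule work: it assumes every pixel in a batch executes in lockstep, so a batch whose pixels disagree on a branch pays for both paths. It also assumes batches are square, aligned screen tiles (4x4 for R520's 16 pixels, 8x8 for Xenos' 64, 32x32 for the ~1024-pixel G70 figure). The circular branch region and the 40/4 cycle path costs are made-up values.

```python
# Toy model of branch granularity (hypothetical, for illustration only).
# Assumption: pixels in a batch run in lockstep, so a batch pays for a
# branch path if ANY of its pixels takes that path.

def branch_cost(width, height, tile, takes_branch, cost_if, cost_else):
    """Total ALU cycles when shading in aligned tile x tile batches."""
    total = 0
    for ty in range(0, height, tile):
        for tx in range(0, width, tile):
            flags = [takes_branch(x, y)
                     for y in range(ty, min(ty + tile, height))
                     for x in range(tx, min(tx + tile, width))]
            batch_cost = 0
            if any(flags):          # at least one pixel takes the "if" path
                batch_cost += cost_if
            if not all(flags):      # at least one pixel takes the "else" path
                batch_cost += cost_else
            total += batch_cost * len(flags)
    return total

def ideal_cost(width, height, takes_branch, cost_if, cost_else):
    """Cost if every pixel could branch independently (batch size 1)."""
    return sum(cost_if if takes_branch(x, y) else cost_else
               for y in range(height) for x in range(width))

# Hypothetical scene: an expensive path taken inside a circle.
W = H = 256
def in_circle(x, y):
    return (x - 128) ** 2 + (y - 128) ** 2 < 80 ** 2

ideal = ideal_cost(W, H, in_circle, 40, 4)
for side, label in [(4, "16-pixel batches (R520)"),
                    (8, "64-pixel batches (Xenos)"),
                    (32, "1024-pixel batches (G70 estimate)")]:
    cost = branch_cost(W, H, side, in_circle, 40, 4)
    print(f"{label}: {cost / ideal:.3f}x the ideal cost")
```

Because the tiles nest (each 4x4 tile sits inside an 8x8 tile, which sits inside a 32x32 tile), the overhead can only grow with batch size in this model; how much it grows depends entirely on how spatially coherent the branch condition is.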
 
Dave Baumann said:
Jaws – when was a MADD equated to being the definition of a FLOP? If the scalar units are doing any ops single cycle then that doesn’t equate.
...

For as long as I can remember. E.g. a PowerPC G5 has a MADD-capable FPU, so a multiply-add counts as 2 FLOPs, and MULs and ADDs count as 1 FLOP each, etc. Likewise, the VMX unit, being vec4 and MADD-capable, counts as 8 FLOPs, and a VS unit is 10 FLOPs: 8 from the vec4 MADD and 2 from the scalar MADD.

i.e. with MADDs,

vec4 ~ 8, vec3 ~ 6, vec2 ~ 4, scalar ~ 2

and without,

vec4 ~ 4, vec3 ~ 3, vec2 ~ 2, scalar ~ 1
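The counting convention above can be written out as a small Python helper. It only encodes the rule stated in this post (2 FLOPs per component for a MADD, 1 for a MUL or ADD); it says nothing about whether a given unit can actually issue a MADD every cycle, which is the point Dave raises.

```python
# FLOP-counting convention from the post above:
# a MADD counts as 2 FLOPs per component, a MUL or ADD as 1.

def flops(width: int, madd: bool = True) -> int:
    """FLOPs per clock for one instruction on a `width`-component vector."""
    return width * (2 if madd else 1)

NAMES = {4: "vec4", 3: "vec3", 2: "vec2", 1: "scalar"}
for width, name in NAMES.items():
    print(f"{name}: {flops(width)} with MADD, {flops(width, madd=False)} without")

vmx = flops(4)                    # vec4, MADD-capable VMX unit
vs_unit = flops(4) + flops(1)     # vec4 MADD plus scalar MADD per clock
print(f"VMX unit: {vmx} FLOPs/clock, VS unit: {vs_unit} FLOPs/clock")
```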
 
SynapticSignal said:
When comparing shader engines' peak math values, you must not forget their relative efficiency:

Old-tech static shaders (G70, R480 and so on): 40-70%, depending on the situation
New-tech static shaders (R520): 95%
New-tech unified shaders (Xenos): 95%

Those are ATI PR numbers that apply to ATI products.
 
I don't see anything particularly surprising here, but it's nice to see it clarified by a reliable source.

Xenos and R520 are roughly even in terms of overall performance, but each has slight advantages over the other in different areas (Xenos in shaders, R520 in "DX7-type" raw pixel-pushing power).

Xenos obviously has the feature advantage, although just how great (and relevant) that advantage is, I'm still unsure.

Xenos would clearly be inferior in terms of shader performance (and hence just about everything else) to a pair of R520s in Xfire.

If Xenos has a "slight" shader power advantage over R520, then R580, if it holds true to the 16:1:3:1 speculation, should crush it in that area.
 
Jaws said:
Those are ATI PR numbers that apply to ATI products.

No, the 40-70% efficiency of old static shaders is a number that MS haters and ATI haters don't want to believe.


Jaws, you seem truly biased. Is that just my impression, or what?
 
Jawed said:
http://www.beyond3d.com/forum/showpost.php?p=294006&postcount=99

NV40 is averaging 2.2 instructions per clock, far below its peak capability of 5 instructions per clock.

Note the 35% penalty for running FP32, 1.6 instructions per clock.

It'd be interesting to see how G70 deals with this. I expect the penalty for FP32 execution would be much the same, since the register bandwidth problem isn't fixed in G70.

Jawed

The NV40 would be 4 IPC, IIRC, as unlike the G70 it doesn't have the fifth (16-bit normalise) instruction, but it would be interesting to see the differences in efficiency with some tests.
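The IPC figures being traded here are easiest to compare as utilization, i.e. achieved IPC divided by peak IPC. Below is a minimal Python sketch using only the numbers quoted in this exchange (2.2 and 1.6 average IPC, and a peak of either 5 or 4); it doesn't settle which peak is the right one. Note that the quoted 35% FP32 penalty falls between the roughly 27% drop in throughput (1 - 1.6/2.2) and the 37.5% by which the FP16 run outpaces the FP32 one (2.2/1.6 - 1).

```python
# Utilization = achieved IPC / peak IPC, using the NV40 figures quoted above.

def utilization(achieved_ipc: float, peak_ipc: float) -> float:
    return achieved_ipc / peak_ipc

fp16_ipc = 2.2   # quoted average IPC for the test shader
fp32_ipc = 1.6   # quoted average IPC for the same shader at FP32

for peak in (5, 4):   # 5 per Jawed's count, 4 FP32 IPC per Jaws' count
    print(f"peak {peak}: FP16 run {utilization(fp16_ipc, peak):.0%}, "
          f"FP32 run {utilization(fp32_ipc, peak):.0%}")

# The same FP32 penalty expressed two ways
print(f"FP32 throughput is {1 - fp32_ipc / fp16_ipc:.1%} lower than FP16")
print(f"FP16 throughput is {fp16_ipc / fp32_ipc - 1:.1%} higher than FP32")
```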

You've probably seen this G70 vs R520 article:

http://www.driverheaven.net/articles/efficiency/testing2.htm

Though I don't necessarily agree with its conclusion, it does show that the G70's IPC isn't exactly going to waste...
 
SynapticSignal said:
No, the 40-70% efficiency of old static shaders is a number that MS haters and ATI haters don't want to believe.

Sorry, but I don't buy efficiency numbers pulled out of a hat without any detailed explanation. That applies to Nvidia, ATI or anyone.

SynapticSignal said:
Jaws, you seem truly biased. Is that just my impression, or what?

Sorry, your impression is wrong. Quit with the troll posts, especially when you've been here ONE day!
 
I don't understand why you continually quote peak FLOP numbers when you are obviously smart enough to know full well that they don't have any real relationship to real-world performance.

From what I understand, ATI's chips regularly have lower FLOP counts than NVidia's, yet they are still able to compete with and sometimes surpass NVidia's chips.

Shouldn't this thread be about real-world benchmarks between the 1800XT and an overclocked G70 with a simulated 128-bit bus, not peak FLOP numbers?
 
scooby_dooby said:
Shouldn't this thread be about real-world benchmarks between the 1800XT and an overclocked G70 with a simulated 128-bit bus, not peak FLOP numbers?

Only if it were moved to the PC forums.

Xenos is not an 1800XT. There are some major architectural differences that can make for some really drastic performance differences, and 1800XT benchmarks have no relevance to Xenos performance.
 
scooby_dooby said:
I don't understand why you continually quote peak FLOP numbers when you are obviously smart enough to know full well that they don't have any real relationship to real-world performance.

From what I understand, ATI's chips regularly have lower FLOP counts than NVidia's, yet they are still able to compete with and sometimes surpass NVidia's chips.

Shouldn't this thread be about real-world benchmarks between the 1800XT and an overclocked G70 with a simulated 128-bit bus, not peak FLOP numbers?

Well, if you read what I posted, you'd realise that's exactly why I posted the DriverHeaven link above, as there is a correlation with benchmarks.
 
> "Unified shaders in itself doesn't inherently mean more powerful."

I could not agree more. There is so much complexity in modern-day rendering that unifying your vertex and pixel shaders does not automatically mean you run faster.
 
Powderkeg said:
Only if it were moved to the PC forums.

Xenos is not an 1800XT. There are some major architectural differences that can make for some really drastic performance differences, and 1800XT benchmarks have no relevance to Xenos performance.
Xenos has more shader power than the 1800XT; ATI has said so. RSX is a modified G70, and by the numbers we've seen it looks to be overclocked to 550 MHz. Therefore, by comparing the 1800XT to an overclocked G70, we can get some idea of the capabilities of Xenos compared to RSX.

Not only that, but shouldn't this also give us insight into how well the 48 USAs compare to conventional pipes in shading power?
 
Jaws said:
Sorry, but I don't buy efficiency numbers pulled out of a hat without any detailed explanation. That applies to Nvidia, ATI or anyone.

To understand the numbers, logic alone is enough.
In a classic static shader architecture (SSA), in situations where pixel computation is the bottleneck, the vertex units stall; in situations where vertex computation is the bottleneck, the pixel units stall.

The hit can range from 30% to 70% of the units and time, depending on how much the shaders stall.

E.g. with an SSA [6 VS, 24 PS], if the 6 VS are not enough to process the data, the 24 PS can stall until the VS finish their work (a 70% hit).

With a unified shader architecture (USA) [30 US], if 6 VS are not enough, 6, 18 or all 24 of the remaining shaders will be added to the first 6, so there's no stall and full efficiency (near 100%, with 0-5% spent deciding how many shaders to convert to vertex computing).

It's so EASY to understand.
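The stall argument above can be made concrete with a toy Python model. It is a sketch under simplifying assumptions, not a description of real hardware: a frame is split into hypothetical phases with fixed amounts of vertex and pixel work, the slower pool of the static design sets the pace of each phase, and the unified design splits all 30 units perfectly except for an `overhead` fraction standing in for the 0-5% arbitration cost mentioned above. The workload numbers are invented for illustration.

```python
from dataclasses import dataclass

@dataclass
class Phase:
    vertex_work: float  # ALU cycles of vertex shading (hypothetical units)
    pixel_work: float   # ALU cycles of pixel shading (hypothetical units)

def static_time(phases, vs_units=6, ps_units=24):
    """SSA: each pool runs only its own work; the slower pool sets the pace."""
    return sum(max(p.vertex_work / vs_units, p.pixel_work / ps_units)
               for p in phases)

def unified_time(phases, units=30, overhead=0.0):
    """USA: any unit runs either kind of work; `overhead` is the fraction
    of capacity assumed lost to deciding how to split the units."""
    return sum((p.vertex_work + p.pixel_work) / (units * (1.0 - overhead))
               for p in phases)

def utilization(phases, units, time):
    """Fraction of unit-cycles spent doing useful shading work."""
    work = sum(p.vertex_work + p.pixel_work for p in phases)
    return work / (units * time)

# Invented frame: balanced 1:4, vertex-heavy, then pixel-heavy
frame = [Phase(600, 2400), Phase(1200, 600), Phase(60, 3000)]

t_ssa = static_time(frame)
t_usa = unified_time(frame, overhead=0.05)
print(f"SSA [6 VS, 24 PS]: {utilization(frame, 30, t_ssa):.0%} of units busy")
print(f"USA [30 US]:       {utilization(frame, 30, t_usa):.0%} of units busy")
```

With these invented numbers the static [6 VS, 24 PS] split keeps roughly 62% of its units busy, while the unified pool is 95% busy by construction of the 5% overhead. A workload that matched the 1:4 vertex-to-pixel ratio in every phase would keep the static design fully busy too, so the real-world gap depends on the workload.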

Sorry, your impression is wrong. Quit with the troll posts, especially when you've been here ONE day!

So here you're the king of truth just because you have a high post count and I've only been posting here since today?
If this is your logic, then I know why you don't understand the USA.

And if someone says that you seem very biased, he is trolling?

Please grow up, and let's end the discussion here; I'm not interested in personal attacks or "word fights".
 
Jawed said:
Wrong.

http://download.nvidia.com/developer/GPU_Gems_2/GPU_Gems2_ch30.pdf

17th page of the PDF. But I was counting the RCP, not the FP16 NRM, anyway ;)

Well, that's fine; that's precisely why I stated "IIRC", as it was my understanding that it was introduced with G70. My point, though, was 4 FP32 IPC peak, not 5, and that still stands.

Jawed said:
And I pointed out the 35% performance hit (in this shader) of FP32 quite deliberately. You can't count FP32 capabilities as though they are equivalent to FP16 capabilities.

Jawed

I wasn't. It's still a total of 4 FP32 IPC, peak, not 5.
 