ATI on Xenos

Shifty Geezer said:
What percentage of current GPUs is sitting around idle on average at the moment, then?

nVidia have claimed they've looked at unified shaders, but they felt the loss of efficiency compared to customised shaders meant it couldn't outperform a conventional architecture. I find that hard to swallow unless unified shaders are about x% slower than customised shaders, where x% is the percentage of time that shaders sit idle on conventional GPUs.

I presume your Xenos article will hit on this.


I thought the inefficiencies ATi talk about (for current PC cards) stem from the VS sitting idle while the PS are doing a bunch of work, or vice versa. As a whole it's inefficient, but if you compare a customised PS pipe to a unified pipe, the PS pipe is more efficient.
 
Yes, I appreciate that. ATI's case is that while shaders are sat doing nothing, they're inefficient. Let's say only 60% of the GPU is in use on average; that's 40% of potential processing time being wasted.

For a unified shader system where all ALUs are working all the time (or as good as damn it), for this system to be less capable than conventional fixed-function pipes, these shaders would have to run at less than 60% of the performance of the fixed pipes.
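A quick back-of-the-envelope version of that break-even point, if it helps - the pipe counts and utilisation figures below are purely illustrative assumptions, not measured numbers:

```python
# Break-even sketch: how slow can a unified pipe be (per pipe) and still match
# a conventional design that idles part of the time? All numbers are made up
# for illustration.

fixed_pipes = 24            # conventional split VS/PS design
fixed_utilisation = 0.60    # assumed average fraction of pipes doing useful work
unified_pipes = 24
unified_utilisation = 1.0   # the near-100%-busy ideal claimed for unified shaders

break_even = (fixed_pipes * fixed_utilisation) / (unified_pipes * unified_utilisation)
print(break_even)  # 0.6 -> a unified pipe could be 40% slower and still tie
```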

Are unified shaders that slow? If not, why is nVidia not pursuing a unified shader architecture, saying they've evaluated it and seen it doesn't deliver? Did they just miss the trick, and implement a poor unified shader model that severely retards performance? :?

I'm trying to understand why nVidia haven't gone with unified shaders. It sounds like a smart system to me. Maybe the load balancing is difficult and fails to maintain that theoretical 100% load?
 
Shifty Geezer said:
I'm trying to understand why nVidia haven't gone with unified shaders. It sounds like a smart system to me. Maybe the load balancing is difficult and fails to maintain that theoretical 100% load?
ATi zigs; nVidia zags.
 
I'm trying to understand why nVidia haven't gone with unified shaders. It sounds like a smart system to me. Maybe the load balancing is difficult and fails to maintain that theoretical 100% load?

It costs transistors, transistors you could be using for more pixel pipelines or vertex shaders. NVidia also argues that the workload done by a vertex shader is significantly different than that done by a pixel shader, and that you can optimise for the loads better with segmented shaders.

IME GPUs spend a large portion of the time with either their vertex shaders or pixel shaders idle. I'm really intrigued to see what sort of relative performance a unified architecture has.

Since we don't have useful benchmarks from parts to compare with unified shaders, we'll have to wait and see who's right.

In the meantime ATI is going to claim it's the bestest thing eva and NVidia is going to downplay it because they don't have it.
 
Inane_Dork said:
Shifty Geezer said:
I'm trying to understand why nVidia haven't gone with unified shaders. It sounds like a smart system to me. Maybe the load balancing is difficult and fails to maintain that theoretical 100% load?
ATi zigs; nVidia zags.

"In our previous Differing Philosophies Emerge Between ATI and NVIDIA report we looked at comments from NVIDIA’s Chief Scientist, David Kirk, that mentioned his dislike of the idea of a unified Shader pipeline at the hardware level due to the differing demands and workloads between Pixel Shaders and Vertex Shaders. In the commentary David Kirk specifically singled out texturing as an example of the differing pipelines, and in replying to these comments ATI’s Eric Demers agreed that there are different demands on the pipelines but suggested that "if one were able to figure out a way to unify the shaders such that nothing extra is required to fulfil all the requirements of the different shaders, while being able to share all their commonality, that would be a great solution." "

^ I think ATI got to the patent before Nvidia did
 
DaveBaumann said:
Jaws said:
So I can't see the 48 ALU clusters (48 Vec4 + 48 Scalar) ALL working on fragments or vertices per cycle. Unless I've missed something, 32 ALUs, peak, would work on fragments and 16 ALUs on vertices and vice versa...

Why not? There will be occasions where it will be working solely on pixels, occasions where it'll be working on both, and occasions where it'll only be working on vertices (an easy example of the last one is a Z-only render pass - all 48 ALUs will be calculating the geometry in order to populate the Z buffer).

Okay, I think you've answered one of my queries, which was how far 'ahead' the auto-load balancing mechanism works.

My assumption was that it looks at it on a 'per cycle' basis (hence the 32 fragments per cycle) and distributes the workload between the 3 SIMD engines. The 'sweet spot' would be 2 SIMD engines on fragments and 1 on vertices per cycle because vertices/fragments are dependent. Because of this dependency, they don't 'outstrip' each other.

So does the auto-load balancing actually look further ahead, to a per-frame basis or beyond, and can that be user-defined?
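To make the 'sweet spot' idea concrete, here's a toy sketch of how an arbiter might split the three SIMD arrays per issue slot. This is just an illustrative model, not ATI's actual scheduling logic, and the queue-depth heuristic is entirely assumed:

```python
# Toy per-slot arbitration across Xenos's three SIMD arrays (illustrative only).
def allocate_arrays(vertex_batches_pending, pixel_batches_pending, num_arrays=3):
    """Split the SIMD arrays between vertex and pixel work for this issue slot."""
    if pixel_batches_pending == 0:          # e.g. a Z-only pass: geometry only
        return {"vertex": num_arrays, "pixel": 0}
    if vertex_batches_pending == 0:
        return {"vertex": 0, "pixel": num_arrays}
    total = vertex_batches_pending + pixel_batches_pending
    vertex_share = round(num_arrays * vertex_batches_pending / total)
    vertex_share = min(max(vertex_share, 1), num_arrays - 1)  # keep both sides fed
    return {"vertex": vertex_share, "pixel": num_arrays - vertex_share}

# Typical frame: pixel work dominates, so you land near the 1:2 split.
print(allocate_arrays(vertex_batches_pending=10, pixel_batches_pending=20))
# {'vertex': 1, 'pixel': 2}
```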
 
ERP said:
It costs transistors, transistors you could be using for more pixel pipelines or vertex shaders. NVidia also argues that the workload done by a vertex shader is significantly different than that done by a pixel shader, and that you can optimise for the loads better with segmented shaders.

IME GPUs spend a large portion of the time with either their vertex shaders or pixel shaders idle. I'm really intrigued to see what sort of relative performance a unified architecture has.

It's the entire transistor count situation that actually fascinates me about this whole thing. According to ATI, R500 is the 'equivalent' of a conventional 32-pipe card; yet dumping the eDRAM, its transistor count hovers barely above ~200 million. The RSX on the other hand, with its supposedly ~300 million transistors, is, going by the conventional wisdom, just a modified conventional 24-pipe chip. Even if we take the R500's pipe analogy to include vertex shaders, that still puts it at roughly the same 24 pipes as G70/RSX, but at a 33% transistor discount.

So either unified shaders are in fact the lord of transistor efficiency, something special is going on inside RSX, or R500 has some drawback we're not aware of yet.

Something has to give though, because the discrepancy in transistors for the actual GPUs just seems too pronounced.
 
Coola said:
Inane_Dork said:
Shifty Geezer said:
I'm trying to understand why nVidia haven't gone with unified shaders. It sounds like a smart system to me. Maybe the load balancing is difficult and fails to maintain that theoretical 100% load?
ATi zigs; nVidia zags.

"In our previous Differing Philosophies Emerge Between ATI and NVIDIA report we looked at comments from NVIDIA’s Chief Scientist, David Kirk, that mentioned his dislike of the idea of a unified Shader pipeline at the hardware level due to the differing demands and workloads between Pixel Shaders and Vertex Shaders. In the commentary David Kirk specifically singled out texturing as an example of the differing pipelines, and in replying to these comments ATI’s Eric Demers agreed that there are different demands on the pipelines but suggested that "if one were able to figure out a way to unify the shaders such that nothing extra is required to fulfil all the requirements of the different shaders, while being able to share all their commonality, that would be a great solution." "

^ I think ATI got to the patent before Nvidia did

http://www.beyond3d.com/forum/viewtopic.php?p=491586#491586

System and method for reserving and managing memory spaces in a memory resource

NVIDIA also have similar types of patents that can be adapted to load-balancing threading systems with or without unified execution units...
 
xbdestroya said:
ERP said:
It costs transistors, transistors you could be using for more pixel pipelines or vertex shaders. NVidia also argues that the workload done by a vertex shader is significantly different than that done by a pixel shader, and that you can optimise for the loads better with segmented shaders.

IME GPUs spend a large portion of the time with either their vertex shaders or pixel shaders idle. I'm really intrigued to see what sort of relative performance a unified architecture has.

It's the entire transistor count situation that actually fascinates me about this whole thing. According to ATI, R500 is the 'equivalent' of a conventional 32-pipe card; yet dumping the eDRAM, its transistor count hovers barely above ~200 million. The RSX on the other hand, with its supposedly ~300 million transistors, is, going by the conventional wisdom, just a modified conventional 24-pipe chip. Even if we take the R500's pipe analogy to include vertex shaders, that still puts it at roughly the same 24 pipes as G70/RSX, but at a 33% transistor discount.

So either unified shaders are in fact the lord of transistor efficiency, something special is going on inside RSX, or R500 has some drawback we're not aware of yet.

Something has to give though, because the discrepancy in transistors for the actual GPUs just seems too pronounced.


You have to count the blending and AA logic in the eDRAM if you want to compare transistor counts.

Personally I'd ignore all the 'X pipes' talk; it's all just marketing crap.
 
ERP said:
xbdestroya said:
ERP said:
It costs transistors, transistors you could be using for more pixel pipelines or vertex shaders. NVidia also argues that the workload done by a vertex shader is significantly different than that done by a pixel shader, and that you can optimise for the loads better with segmented shaders.

IME GPUs spend a large portion of the time with either their vertex shaders or pixel shaders idle. I'm really intrigued to see what sort of relative performance a unified architecture has.

It's the entire transistor count situation that actually fascinates me about this whole thing. According to ATI, R500 is the 'equivalent' of a conventional 32-pipe card; yet dumping the eDRAM, its transistor count hovers barely above ~200 million. The RSX on the other hand, with its supposedly ~300 million transistors, is, going by the conventional wisdom, just a modified conventional 24-pipe chip. Even if we take the R500's pipe analogy to include vertex shaders, that still puts it at roughly the same 24 pipes as G70/RSX, but at a 33% transistor discount.

So either unified shaders are in fact the lord of transistor efficiency, something special is going on inside RSX, or R500 has some drawback we're not aware of yet.

Something has to give though, because the discrepancy in transistors for the actual GPUs just seems too pronounced.


You have to count the blending and AA logic in the eDRAM if you want to compare transistor counts.

Personally I'd ignore all the 'X pipes' talk; it's all just marketing crap.

RSX ~ 300 million transistors, 550 MHz.

Xenos ~ 332 mil, 500 MHz.
- 232 mil, shader module.
- 100 mil, eDRAM module (80 mil of that is the 10 MB eDRAM array itself).

Excluding eDRAM transistors,

Xenos ~ 252 mil, RSX ~ 300 mil.
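For what it's worth, re-doing that arithmetic with the figures as posted (all numbers in millions, none of them mine):

```python
rsx_total      = 300   # RSX, as quoted above
xenos_shader   = 232   # Xenos shader/parent die
xenos_daughter = 100   # eDRAM die: blend/AA logic plus the eDRAM array
edram_array    = 80    # the 10 MB of eDRAM cells themselves

xenos_logic_only = xenos_shader + (xenos_daughter - edram_array)
print(xenos_logic_only)                                 # 252
print(round((1 - xenos_logic_only / rsx_total) * 100))  # ~16% fewer than RSX
```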
 
So either unified shaders are in fact the lord of transistor efficiency, something special is going on inside RSX, or R500 has some drawback we're not aware of yet.
There are things other than shader ALUs taking up transistors.
 
The R500 doesn't support FP32 HDR, so there can be some transistor differences there. Dunno how much, of course. I also don't believe the R500 has the encoding stuff in it like the G70 (unless they removed it for being broken in the NV40s).
 
Fafalada said:
So either unified shaders are in fact the lord of transistor efficiency, something special is going on inside RSX, or R500 has some drawback we're not aware of yet.
There are things other than shader ALUs taking up transistors.

Sure, but that's a lot of transistors, you must agree.

Either way I guess we'll have more of a sense of what's going on when G70 comes out and we gain insight into how many transistors it has. Obviously a large or minor discrepancy between the two would adjust speculation accordingly.
 
ERP said:
Since we don't have useful benchmarks from parts to compare with unified shaders, we'll have to wait and see who's right.

In the meantime ATI is going to claim it's the bestest thing eva and NVidia is going to downplay it because they don't have it.
We don't have benchmarks, but presumably nVidia does. At least, research estimates suggesting that more fixed pipes beat unified pipes.

So maybe the unified shaders ARE much slower than the specialised ones? Or maybe ATi figured it out when nVidia couldn't? Or even simple maths... 36 fixed pipes vs. 24 unified pipes. At 60% utilisation that's roughly 22 fixed pipes' worth of work getting done, close to the 24 unified pipes, and the technology's known and easier to work with?

How long to Dave's Xenos article? :D
 
DaveBaumann said:
The real question is "why" it has a lot of new things. If they didn't perceive a fundamental issue with the current pipeline, then why make as radical a switch as they have?

I wasn't questioning the reasoning behind having so many "new things", just pointing out that the resource requirement with so many "new things" is likely higher.

I'm now a little more confused than ever about the "granularity" of the architecture, so to speak :LOL:

Re. "do per ALU performance/efficiency losses outweigh efficiency gains elsewhere", I'm not sure if utilisation gains on a higher level and efficiency losses on a per unit level are directly comparable. Aka 60% of a loss on one level balanced out by 60% of a gain on another. It's probably more complex than that? But fundamentally I think it's fair to characterise the whole thing as a tradeoff of efficiency/performance on one level for flexibility/utilisation on another.
 
xbdestroya said:
Sure, but that's a lot of transistors, you must agree.
Well my point is that you don't know how many transistors are used for shader ALUs in either chip. And sure, it's a sizeable difference, but there's very little we know about the chip so far. Just an example: what if RSX comes with a particularly large cache - 1MB of SRAM with associated logic could run you over 80M transistors (just an example, I'm not saying this is even remotely likely to happen).
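Quick sanity check on that ballpark, assuming standard 6T SRAM cells (the overheads beyond the cells are just hand-waving):

```python
bits = 1024 * 1024 * 8      # 1 MB of data storage
cells = bits * 6            # ~50M transistors for the storage cells alone
print(cells)                # 50331648
# Tags, decoders, sense amps, redundancy and port logic come on top of that,
# which is how "associated logic" could push the total towards 80M+.
```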

jvd said:
I also don't believe the r500 has the encoding stuff in it like
I would think neither does RSX, unless Sony&NVidia had monkeys working on the chip.
 
Shifty Geezer said:
We don't have benchmarks, but presumably nVidia does. At least, research estimates suggesting that more fixed pipes beat unified pipes.
And ATi's research must have indicated the opposite.

We could keep going around this merry-go-round for some time. :p
 
Exactly. So how come they came to different opinions? I can only conclude the actual difference in the performance of the architectures doesn't amount to much!
 
I would think neither does RSX, unless Sony&NVidia had monkeys working on the chip.

Do the people who made the GeForce FX count? ;) Anyway, I don't know how heavily modified the RSX will be, so it may have things that are redundant because they were integrated into the chip and couldn't be, umm, un-integrated.
 