How does one 7900 vertex shader compare to a Xenos alu in terms of sops?

MBDF · Apr 29, 2006

To do equivalent general vertex work, how many ALUs must be used to equal 8 vertex shaders? 8? 16?

Mintmaster · Apr 30, 2006

I think the general consensus is that Xenos' ALU structure is similar to that of ATI's vertex shader ALU in their other cards (R3xx-R5xx), which in turn performs similarly to NVidia's vertex shader (NV4x/G7x), which in turn is similar to RSX. So I think they're about equal.

Now I have a feeling you'll conclude that Xenos has 40 ALUs left for pixel shading when comparing with RSX, as many other people seem to do, but that's not the right way of looking at things.

Vertex work is very "clumpy". At any given instant on non-unified architectures, you're generally either vertex limited or pixel limited, and you rarely have both working anywhere near their peak at the same time. Lots of the time you're transforming vertices that yield no pixels at all, so your pixel shaders are waiting for a vertex with a decent number of pixels in it. With big triangles it takes many cycles to do all the pixels, so your vertex shader is waiting around because there's no room to put any more transformed vertices.
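To put rough numbers on the "clumpy" point, here is a toy model in Python. Everything in it is an assumption for illustration: the three workload phases are made up, every unit is assumed to retire one op per cycle, and the split and unified designs are given the same total unit count (8 + 24) so that only the scheduling differs. It is not a model of RSX or Xenos.

```python
def frame_time(workload, vertex_units, pixel_units, unified=False):
    """Return cycles needed to run every phase of a workload.

    workload: list of (vertex_ops, pixel_ops) phases.
    Split design: each phase takes as long as its slower side,
    while the other side sits idle.
    Unified design: all units share the combined work of the phase.
    """
    total_units = vertex_units + pixel_units
    cycles = 0.0
    for vertex_ops, pixel_ops in workload:
        if unified:
            cycles += (vertex_ops + pixel_ops) / total_units
        else:
            cycles += max(vertex_ops / vertex_units, pixel_ops / pixel_units)
    return cycles


# Hypothetical phases, chosen only to show the effect:
WORKLOAD = [
    (800, 0),    # vertices that yield no pixels (pixel units idle)
    (8, 4800),   # a few big triangles (vertex units idle)
    (80, 240),   # work that happens to match the 8:24 unit ratio
]

split_cycles = frame_time(WORKLOAD, vertex_units=8, pixel_units=24)
unified_cycles = frame_time(WORKLOAD, vertex_units=8, pixel_units=24, unified=True)
print(split_cycles, unified_cycles)
```

In the balanced phase the two designs tie; the gap comes entirely from the lopsided phases. Whether real workloads look more like the first two phases or the third is exactly the open question.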

The main reason this doesn't bother the IHVs is that vertex shaders are compact and cheap, since they don't texture (or if they do, they're not built to do it fast). If they sit idle, no big deal.

MBDF · Apr 30, 2006

Thanks... Are there any downsides to Xenos's unified architecture... or are they minimal?

Mintmaster · Apr 30, 2006

Downsides compared to what?

From what we've heard a unified design takes up more die space than one that isn't unified. The performance could be better, worse, or the same, but it depends on what the workload is. Consequently, performance per mm2 of die size is just as much in the air.

So your question is essentially unanswerable.

MBDF · Apr 30, 2006

Mintmaster said:
Downsides compared to what?

From what we've heard a unified design takes up more die space than one that isn't unified. The performance could be better, worse, or the same, but it depends on what the workload is. Consequently, performance per mm2 of die size is just as much in the air.

So your question is essentially unanswerable.

What are the downsides compared to the traditional design, dedicated vertex and pixel shaders?

Shifty Geezer · Apr 30, 2006

nVidia would say that performance of the US isn't as effective as specialised pipes, but there's no confirmation on that yet. The main downside that I know of is consumption of transistors on US management hardware for scheduling tasks etc., which could be used in processing hardware instead or adding features. nVidia have also said they've optimized their pixel pipes to run the most common shader programs, which could give them an advantage over generalized pipes. Don't know what those optimizations are though or how effective they are.

Personally, from a theoretical POV, US looks a smarter solution, and with graphics cards heading in that direction, whatever advantages fixed function units have are pretty negligible next to the benefits of not having so much idle hardware, particularly in the PC space where the games can't be balanced over a specific GPU configuration.

MBDF · Apr 30, 2006

Thanks for the info.

Edge · Apr 30, 2006

Shifty Geezer said:
US looks a smarter solution

Smarter in terms of certain workloads, but not smarter in terms of die size, and thus cost. Unified shaders are only now coming into existence because the technology exists to create those huge chips.

Dave Baumann · Apr 30, 2006

Mintmaster said:
From what we've heard a unified design takes up more die space than one that isn't unified.

Except that doesn't really mesh neatly with the notion of implementing them in very die size/power/performance critical implementations such as handheld devices, yet PowerVR SGX is unified and all indications are that ATI is taking a Xenos like architecture to handhelds this or next year as well. The strongest proponent of this line of argumentation is the company that doesn't yet have a unified design...

ERP · Apr 30, 2006

MBDF said:
Thanks... Are there any downsides to Xenos's unified architecture... or are they minimal?

There are downsides, but it's hard to gauge exactly what they cost, because you can only really measure them in the context of the implementation.

Basically you have to make a choice, and what is better for vertex shaders may not be better for pixel shaders. NV currently use MIMD vertex shaders and SIMD pixel shaders, for example, and they would have to reconcile the two in a unified design.

Xenos is a lot like using pixel shaders for everything; because the Xenos batch size is so small anyway, it's not a huge penalty. NV by comparison has extremely large batch sizes in their current architecture, which likely makes their pixel shaders an extremely poor candidate for general vertex shading.
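A rough way to see why batch size matters, sketched in Python. The batch sizes 64 and 1024 below are illustrative placeholders rather than figures from this thread, and the lockstep-batch model (every element of a batch runs the same instruction, so a partly filled batch still costs a full one) is a simplification.

```python
import math


def lane_utilization(work_items, batch_size):
    """Fraction of lanes doing useful work when work_items are packed
    into fixed-size batches; the last batch is padded with idle lanes."""
    batches = math.ceil(work_items / batch_size)
    return work_items / (batches * batch_size)


def branch_cost(taken_flags, cost_if, cost_else):
    """Cycles for one lockstep batch at a dynamic branch.

    taken_flags holds one bool per element in the batch. The batch pays
    for each side of the branch that at least one element needs.
    """
    cost = 0
    if any(taken_flags):
        cost += cost_if
    if not all(taken_flags):
        cost += cost_else
    return cost


# A hypothetical small draw of 100 vertices:
small_batch = lane_utilization(100, batch_size=64)
large_batch = lane_utilization(100, batch_size=1024)

# One divergent element makes the whole batch pay for both sides:
coherent = branch_cost([True] * 64, cost_if=10, cost_else=4)
divergent = branch_cost([True] + [False] * 63, cost_if=10, cost_else=4)
print(small_batch, large_batch, coherent, divergent)
```

The larger the batch, the more likely it is to be partly empty on small vertex loads and to contain at least one element that diverges at a branch, which is the penalty being described for running vertex work on large-batch pixel hardware.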

The entire frontend design of Xenos is very different from current NV and, I assume, earlier ATI chips. Some things like VTF are cheap because of the unified design; others are potentially expensive.

A unified design is a trade-off like any other. Clearly ATI think it's a good one in the relatively short term, because they've stated that R600 is using a lot of the Xenos technology. NV obviously currently don't think so.

GB123 · Apr 30, 2006

ERP said:
NV obviously currently don't think so.

Could that be because they aren't in a position to follow ATI, for the reasons you mentioned above about large batch sizes?

It seems to me that if Nvidia were to go this route they would have to change so much of their design that old software would run poorly, or they would have to compensate with more shader units than ATI, thus increasing die size and overall cost to get the same kind of performance.

Maybe I misunderstood what you were saying, or what you didn't say.

How does the Xenos batch size compare to other ATI GPUs?

Edge · Apr 30, 2006

Dave Baumann said:
Except that doesn't really mesh neatly with the notion of implementing them in very die size/power/performance critical implementations such as handheld devices, yet PowerVR SGX is unified and all indications are that ATI is taking a Xenos like architecture to handhelds this or next year as well. The strongest proponent of this line of argumentation is the company that doesn't yet have a unified design...

What is the die size for the performance of those parts? You can hardly claim the benefits of something that does not exist yet, and cannot be used for comparison purposes. Saying they are going to use unified parts for that sector is not good enough, if performance is lacking due to a lower number of execution units, and corresponding data lines and associated registers.

While we are at it, how can anyone claim the superiority of Xenos, if no benchmarking metrics, or even game to game comparisons can accurately be made in the console sector? Yes, I realize that's what's being discussed here, but the end result from what I see is similar performance, with each part having different strength attributes for different circumstances.

Xenos is hardly a huge win because of unified shaders over a discrete part like Nvidia's 7900 series, and we all know that the 7900 series is meeting excellent die size and power issue requirements for the console space.

GB123 · Apr 30, 2006

Edge said:
What is the die size for the performance of those parts? You can hardly claim the benefits of something that does not exist yet, and cannot be used for comparison purposes. Saying they are going to use unified parts for that sector is not good enough, if performance is lacking due to a lower number of execution units, and corresponding data lines and associated registers.

While we are at it, how can anyone claim the superiority of Xenos, if no benchmarking metrics, or even game to game comparisons can accurately be made in the console sector? Yes, I realize that's what's being discussed here, but the end result from what I see is similar performance, with each part having different strength attributes for different circumstances.

Xenos is hardly a huge win because of unified shaders over a discrete part like Nvidia's 7900 series, and we all know that the 7900 series is meeting excellent die size and power issue requirements for the console space.

Nobody is claiming one is better than the other; it's a matter of which is more flexible.

Carl B · Apr 30, 2006

Dave Baumann said:
Except that doesn't really mesh neatly with the notion of implementing them in very die size/power/performance critical implementations such as handheld devices, yet PowerVR SGX is unified and all indications are that ATI is taking a Xenos like architecture to handhelds this or next year as well. The strongest proponent of this line of argumentation is the company that doesn't yet have a unified design...

Dave I hear what you're saying here and it's completely logical to point these facts out, but at the same time it kind of puts a white elephant in the room in that ironically there're probably few people more qualified to answer the question indirectly posed than you yourself.

And I mean if there are NDA issues that prevent you from answering I understand, but I just have to ask: what's your own estimate on the transistor budget allocated on Xenos to control/management logic? SGX and R600 and the rest of it aside, I imagine you must have a sense of what these transistor allocations are within Xenos.

Edge · Apr 30, 2006

GB123 said:
Nobody is claiming one is better than the other; it's a matter of which is more flexible.

Well the part that is the most flexible is the better one. You want a GPU that lends itself to providing the highest frame-rate depending on the mix of vertex, texture and pixel shader ops.

Dave Baumann · Apr 30, 2006

Edge said:
What is the die size for the performance of those parts? You can hardly claim the benefits of something that does not exist yet, and cannot be used for comparison purposes. Saying they are going to use unified parts for that sector is not good enough, if performance is lacking due to a lower number of execution units, and corresponding data lines and associated registers.

These parts are low on execution units because of the die size/cost/power metrics they have to meet, ergo it's completely counterproductive to waste transistors on control if you could just end up with more units in place of those extra controls. To make any sense in this market, the cost of implementing the unified architecture has to provide more benefit than it takes away - and it's hardly as though this market is crying out for complex shader architectures yet.

xbdestroya said:
Dave I hear what you're saying here and it's completely logical to point these facts out, but at the same time it kind of puts a white elephant in the room in that ironically there're probably few people more qualified to answer the question indirectly posed than you yourself.

And I mean if there are NDA issues that prevent you from answering I understand, but I just have to ask: what's your own estimate on the transistor budget allocated on Xenos to control/management logic? SGX and R600 and the rest of it aside, I imagine you must have a sense of what these transistor allocations are within Xenos.

It's impossible to estimate these things. Not only that, but you're getting things fed through for marketing that obviously has a particular agenda.

However, the more I look at it the more I believe the notion of actual unification is secondary in terms of costs - it pretty much does the same things as a traditional architecture and it follows the same path, except that the shader elements move to the same hardware element but then diverge again when they pop out the back. What's more important with an architecture like Xenos is actually the command control - i.e. batch handling/juggling/sizes. Xenos here bears many similarities with R520/580's architecture. It's impossible to tell how much of the difference comes from a unified shader architecture being unified, and how much from just having that level of batch handling capability.

One thing that I do know is that there are deep divisions in ATI as to whether the R520 architecture should have gone unified or not.

Carl B · Apr 30, 2006

Dave Baumann said:
It's impossible to estimate these things. Not only that, but you're getting things fed through for marketing that obviously has a particular agenda.

However, the more I look at it the more I believe the notion of actual unification is secondary in terms of costs - it pretty much does the same things as a traditional architecture and it follows the same path, except that the shader elements move to the same hardware element but then diverge again when they pop out the back. What's more important with an architecture like Xenos is actually the command control - i.e. batch handling/juggling/sizes. Xenos here bears many similarities with R520/580's architecture. It's impossible to tell how much of the difference comes from a unified shader architecture being unified, and how much from just having that level of batch handling capability.

One thing that I do know is that there are deep divisions in ATI as to whether the R520 architecture should have gone unified or not.

The bolded portion of your reply was more what I was speaking to with my comment on 'control/management' logic; wondering how you thought the dispatch logic might compare transistor-wise to something like the R580. But now that I think of it knowing explicitly R580's situation wouldn't really give any direct insights into Xenos anyway since at the end of the day, they're still more different than they are similar.

Thanks for the answer though, and that last comment of yours is very interesting. Raises a number of questions itself, and clearly DX10 or no, implies there must be a faction within ATI that feels the future is now for unified.

Dave Baumann · Apr 30, 2006

xbdestroya said:
The bolded portion of your reply was more what I was speaking to with my comment on 'control/management' logic; wondering how you thought the dispatch logic might compare transistor-wise to something like the R580. But now that I think of it knowing explicitly R580's situation wouldn't really give any direct insights into Xenos since at the end of the day, they're still more different than they are similar.

No, I think they are more similar than they are different - they both handle many threads in flight at any point in time that can either be executed or slept dependent on whether data is ready, in order to (a.) handle latencies well whilst still (b.) providing low enough granularity to allow for good dynamic branching and small triangle sizes (and not impact vertex performance much in the case of Xenos).
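The "many threads in flight, slept until data is ready" idea can be sketched with a toy scheduler in Python. This is not how either chip actually schedules; it assumes a single ALU, threads that each alternate a fixed burst of ALU work with a fixed-latency fetch, and made-up numbers (4 ALU cycles, 100 cycles of latency).

```python
def alu_utilization(num_threads, alu_cycles, fetch_latency, total_cycles=10_400):
    """Fraction of cycles a single ALU is busy.

    Each thread repeats: alu_cycles of ALU work, then a fetch that
    puts it to sleep for fetch_latency cycles. Every cycle the
    scheduler runs the first thread whose data is ready, if any.
    """
    ready_at = [0] * num_threads            # cycle when each thread can run again
    remaining = [alu_cycles] * num_threads  # ALU cycles left in the current burst
    busy = 0
    for cycle in range(total_cycles):
        for t in range(num_threads):
            if ready_at[t] <= cycle:
                remaining[t] -= 1
                busy += 1
                if remaining[t] == 0:
                    # Burst finished: issue the fetch and sleep until it returns.
                    ready_at[t] = cycle + 1 + fetch_latency
                    remaining[t] = alu_cycles
                break
    return busy / total_cycles


for threads in (1, 13, 26):
    print(threads, alu_utilization(threads, alu_cycles=4, fetch_latency=100))
```

In this model the ALU stays fully busy once there are about 1 + fetch_latency / alu_cycles threads to switch between. Small batches make it cheaper to keep that many independent threads around, which is the link to point (b.).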

Carl B · Apr 30, 2006

Dave Baumann said:
No, I think they are more similar than they are different - they both handle many threads in flight at any point in time that can either be executed or slept dependent on whether data is ready, in order to (a.) handle latencies well whilst still (b.) providing low enough granularity to allow for good dynamic branching and small triangle sizes (and not impact vertex performance much in the case of Xenos).

That almost leads me back to my original question then, but if we don't know the transistor cost for dispatch we just don't know. I see what you're saying though with the thread and 'array' similarities between Xenos and R580. I guess quantifying what the 'cost' is of going unified transistor-wise needs to be assessed across a number of chip aspects. You have to understand when I first asked though, I was just wondering if you flat out knew, not because I felt we could unravel the puzzle here ourselves. In that context, your previous post on the SGX lends the more insight; I was just worried you were being cagey!

Dave Baumann · May 1, 2006

xbdestroya said:
That almost leads me back to my original question then, but if we don't know the transistor cost for dispatch we just don't know.

Again, I wonder if the actual "unified" control element is that costly at all - in fact, rather than separate command processors for Pixel Shaders and Vertex Shaders, a unified architecture has a single command processor that covers both shader types. Xenos has control elements per shader array, but then R580 has control elements for each of its 4 arrays of 12 pixel shaders.

xbdestroya said:
In that context, your previous post on the SGX lends the more insight

Of interest, PowerVR's site indicates that the lowest performance version of SGX can fit into a 90nm die size of less than 2x2mm! That's obviously not any kind of comparison, as it has far less in there and it will have smaller control elements because it has less to control.
