#1 |
|
Junior Member
|
To do equivalent general vertex work, how many ALUs must be used to equal 8 vertex shaders? 8? 16?
|
|
|
|
|
|
#2 |
|
Senior Member
Join Date: Mar 2002
Posts: 3,786
|
I think the general consensus is that Xenos' ALU structure is similar to that of ATI's vertex shader ALU in their other cards (R3xx-R5xx), which in turn performs similarly to NVidia's vertex shader (NV4x/G7x), which in turn is similar to RSX. So I think they're about equal.
Now I have a feeling you'll conclude that Xenos has 40 ALUs left for pixel shading when comparing with RSX, as many other people seem to do, but that's not the right way of looking at things. Vertex work is very "clumpy". At any given instant on non-unified architectures, you're generally either vertex limited or pixel limited, and you rarely have both working anywhere near their peak at the same time. Much of the time you're transforming vertices that yield no pixels at all, so your pixel shaders are waiting for a vertex with a decent number of pixels in it. With big triangles it takes many cycles to do all the pixels, so your vertex shader is waiting around because there's no room to put any more transformed vertices. The main reason this doesn't bother the IHVs is that vertex shaders are compact and cheap since they don't texture (or if they do, they're not built to do it fast). If they sit idle, no big deal.
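To put rough numbers on the "clumpy" argument, here is a minimal toy model in Python. Everything in it is an assumption made for illustration: the workload figures, the 8 + 24 unit split, the idea that a vertex unit and a pixel unit do the same amount of work per cycle, and perfect scheduling on the unified side. It is a sketch of the idle-hardware effect, not a model of Xenos or RSX.

```python
def frame_time_split(work, vertex_units, pixel_units):
    """Cycles with a fixed vertex/pixel split: each phase waits on its slower side."""
    return sum(max(v / vertex_units, p / pixel_units) for v, p in work)


def frame_time_unified(work, total_units):
    """Cycles when any unit can take either kind of work (ideal scheduling assumed)."""
    return sum((v + p) / total_units for v, p in work)


# Hypothetical workload: (vertex_ops, pixel_ops) for two phases of a frame.
# Phase 1: lots of small or culled triangles. Phase 2: a few large triangles.
work = [(8000, 2400), (800, 24000)]

split = frame_time_split(work, vertex_units=8, pixel_units=24)
unified = frame_time_unified(work, total_units=32)
print(f"split: {split:.0f} cycles, unified: {unified:.0f} cycles")
```

With these made-up inputs the fixed split needs 2000 cycles and the unified pool 1100, because in each phase one side of the split design sits mostly idle. The model ignores the scheduling and control hardware a unified design needs, which is where the die-size cost discussed later in the thread comes in.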
|
|
|
|
|
#3 |
|
Junior Member
|
Thanks... Are there any downsides to Xenos's unified architecture... or are they minimal?
|
|
|
|
|
|
#4 |
|
Senior Member
Join Date: Mar 2002
Posts: 3,786
|
Downsides compared to what?
From what we've heard, a unified design takes up more die space than one that isn't unified. The performance could be better, worse, or the same, depending on the workload. Consequently, performance per mm² of die size is just as much up in the air. So your question is essentially unanswerable.
|
|
|
|
|
#5 | |
|
Junior Member
|
Quote:
|
|
|
|
|
|
|
#6 |
|
uber-Troll!
Join Date: Dec 2004
Location: Under my bridge
Posts: 26,455
|
nVidia would say that unified shaders (US) aren't as effective as specialised pipes, but there's no confirmation on that yet. The main downside that I know of is the transistors spent on US management hardware for scheduling tasks etc., which could instead be used for processing hardware or added features. nVidia have also said they've optimized their pixel pipes to run the most common shader programs, which could give them an advantage over generalized pipes. I don't know what those optimizations are, though, or how effective they are.
Personally, from a theoretical POV, US looks like a smarter solution, and with graphics cards heading in that direction, whatever advantages fixed-function units have are pretty negligible compared to the benefits of not having so much idle hardware, particularly in the PC space where games can't be balanced around a specific GPU configuration.
__________________
Shifty Geezer ... Tolerance for internet moronism is exhausted. Anyone talking about people's attitudes in the Console fora, rather than games and technology, will feel my wrath. Read the FAQ to remind yourself how to behave and avoid unsightly incidents. |
|
|
|
|
|
#7 |
|
Junior Member
|
Thanks for the info.
|
|
|
|
|
|
#8 | |
|
Join Date: Apr 2002
Posts: 613
|
Quote:
__________________
SN Systems (Middleware): "Since these different parts (SPEs) can all access their own memory at full speed simultaneously, it should give the PS3 a significant performance advantage." Edge Magazine August 2005 issue |
|
|
|
|
|
|
#9 | |
|
Gamerscore Wh...
Join Date: Jan 2002
Posts: 12,989
|
Quote:
|
|
|
|
|
|
|
#10 | |
|
Moderator
Join Date: Feb 2002
Location: Redmond, WA
Posts: 3,322
|
Quote:
Basically you have to make a choice, and what is better for vertex shaders may not be better for pixel shaders. NV currently use MIMD vertex shaders and SIMD pixel shaders, for example, and they would have to unify this in a unified design. Xenos is a lot like using pixel shaders for everything; because the Xenos batch size is so small anyway, it's not a huge penalty. NV by comparison has extremely large batch sizes in their current architecture, which likely makes their pixel shaders an extremely poor candidate for general vertex shading. The entire front-end design of Xenos is very different from current NV and, I assume, earlier ATI chips. Some things like VTF (vertex texture fetch) are cheap because of the unified design; others are potentially expensive. A unified design is a trade-off like any other. Clearly ATI think it's a good one in the relatively short term, because they've stated that R600 is using a lot of the Xenos technology. NV obviously don't currently think so.
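One way to see why batch size matters when SIMD hardware runs branchy vertex code is a small probability sketch in Python. The assumptions are mine, not the poster's: every element in a batch takes a branch independently with the same probability, a batch whose elements disagree pays for both sides of the branch, and the batch sizes and instruction costs below are hypothetical. A batch size of 1 stands in for MIMD behaviour.

```python
def expected_branch_cost(batch_size, p_taken, cost_taken, cost_not_taken):
    """Expected cycles per batch for one if/else under lock-step execution.

    Assumes each element takes the branch independently with probability
    p_taken; a batch with mixed outcomes executes both sides.
    """
    all_taken = p_taken ** batch_size
    none_taken = (1.0 - p_taken) ** batch_size
    mixed = 1.0 - all_taken - none_taken
    return (all_taken * cost_taken
            + none_taken * cost_not_taken
            + mixed * (cost_taken + cost_not_taken))


# Hypothetical shader: the expensive path costs 40 cycles, the cheap path 4,
# and 10% of elements need the expensive path.
for batch_size in (1, 16, 64, 1024):
    cost = expected_branch_cost(batch_size, 0.1, 40, 4)
    print(f"batch size {batch_size:>4}: {cost:.1f} cycles per element")
```

Under these assumptions a batch of one pays only the average of the two paths, while very large batches almost always pay for both. Real workloads are spatially coherent, so independence overstates the penalty, but the trend with batch size is the point.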
|
|
|
|
|
|
#11 | |
|
Member
Join Date: Aug 2005
Posts: 214
|
Quote:
It seems to me that if Nvidia were to go this route they would have to change so much of their design that old software would run poorly, or they would have to compensate with more shader units than ATI, thus increasing die size and overall cost to get the same kind of performance. Maybe I misunderstood what you were saying, or what you didn't say. How does the Xenos batch size compare to other ATI GPUs?
Last edited by GB123; 30-Apr-2006 at 20:57.
|
|
|
|
|
|
#12 | |
|
Join Date: Apr 2002
Posts: 613
|
Quote:
While we are at it, how can anyone claim the superiority of Xenos if no benchmarking metrics, or even game-to-game comparisons, can accurately be made in the console sector? Yes, I realize that's what's being discussed here, but the end result from what I see is similar performance, with each part having different strengths in different circumstances. Xenos is hardly a huge win because of unified shaders over a discrete part like Nvidia's 7900 series, and we all know that the 7900 series meets die size and power requirements for the console space very well.
|
|
|
|
|
|
#13 | |
|
Member
Join Date: Aug 2005
Posts: 214
|
Quote:
Nobody is claiming one is better than the other; it's a matter of which is more flexible.
|
|
|
|
|
|
#14 | |
|
Friends call me xbd
Join Date: Feb 2005
Posts: 6,309
|
Quote:
And I mean if there are NDA issues that prevent you from answering I understand, but I just have to ask: what's your own estimate on the transistor budget allocated on Xenos to control/management logic? SGX and R600 and the rest of it aside, I imagine you must have a sense of what these transistor allocations are within Xenos.
__________________
Somebody set up us the bomb. |
|
|
|
|
|
|
#15 | |
|
Join Date: Apr 2002
Posts: 613
|
Quote:
|
|
|
|
|
|
#16 | ||
|
Gamerscore Wh...
Join Date: Jan 2002
Posts: 12,989
|
Quote:
Quote:
However, the more I look at it, the more I believe the notion of actual unification is secondary in terms of costs. It pretty much does the same things as a traditional architecture and follows the same path, except that the shader elements move to the same hardware element but then diverge again when they pop out the back. What's more important with an architecture like Xenos is actually the command control, i.e. batch handling/juggling/sizes. Here Xenos bears many similarities to R520/580's architecture. It's impossible to tell how much of the difference comes from a unified shader architecture being unified, and how much from just having that level of batch handling capability. One thing that I do know is that there are deep divisions in ATI as to whether the R520 architecture should have gone unified or not.
||
|
|
|
|
|
#17 | |
|
Friends call me xbd
Join Date: Feb 2005
Posts: 6,309
|
Quote:
Thanks for the answer though, and that last comment of yours is very interesting. It raises a number of questions itself and clearly, DX10 or no, implies there must be a faction within ATI that feels the future is now for unified.
Last edited by Carl B; 30-Apr-2006 at 22:29.
|
|
|
|
|
|
#18 | |
|
Gamerscore Wh...
Join Date: Jan 2002
Posts: 12,989
|
Quote:
|
|
|
|
|
|
|
#19 | |
|
Friends call me xbd
Join Date: Feb 2005
Posts: 6,309
|
Quote:
Last edited by Carl B; 30-Apr-2006 at 22:58.
|
|
|
|
|
|
#20 | ||
|
Gamerscore Wh...
Join Date: Jan 2002
Posts: 12,989
|
Quote:
Quote:
|
||
|
|
|
|
|
#21 | |
|
Off-season
Join Date: Feb 2002
Location: On the pursuit of happiness
Posts: 3,019
|
Quote:
__________________
Binary prefixes for bits and bytes |
|
|
|
|
|
|
#22 | |
|
Senior Member
Join Date: Jun 2005
Location: Bridgewater, NJ
Posts: 3,313
|
Quote:
|
|
|
|
|
|
|
#23 | |||
|
Senior Member
Join Date: Mar 2002
Posts: 3,786
|
Quote:
Quote:
Quote:
To me, the middle ground that R580 took doesn't seem to make a lot of sense. If your batches can be so small, and your shader units can switch instructions so quickly, then you've already covered the most difficult part of going unified. Either stick with the efficient large batch size approach from before, or take advantage of the work done with Xenos. Right now, it seems like R580 performs at best equal to a theoretical, similarly sized Xenos-based design. I know there are a lot of factors to take into account, but that's my guess. Maybe they were just being cautious because Xenos may have some unknown performance quirks or bugs that would hurt them in the open PC market.
|||
|
|
|
|
|
#24 | ||
|
Senior Member
Join Date: Mar 2002
Posts: 3,786
|
Quote:
The reason it's "only now coming into existence" is that when this chip comes out, nothing else will be unified, so all of its advantages will be ignored. Then all the engineering effort to do this will be wasted. It's not like feature transitions in the past, where you had nearly immediate benefits. R3xx sold well because it blew away previous gens in DX8 performance. Moreover, DX9 features can be implemented in a way that makes DX8 fallbacks easy. Unified shaders give you fast vertex texturing and enormous vertex shading capability, but if much of your game's target market doesn't have a US, they need a radically different fallback.
Quote:
Anyway, the point Dave is trying to make is that both ATI and PVR have made non-unified designs before. If they're both choosing to move in this direction, then obviously they feel it will save cost and/or improve performance. |
||
|
|
|
|
|
#25 | ||
|
Senior Member
Join Date: Mar 2002
Posts: 3,786
|
Quote:
We know that for polygons with complex vertex shaders, having more vertex shading units helps linearly. Look at tests from 3DMark between different chips. Look also at how resolution makes little difference in these tests, so reduced pixel shading resources are not an issue. Unless ATI screwed up in Xenos, it can perform a 48-cycle vertex shader at 500 Mverts per second. RSX will do 92M.
We know that shader pipes in GPUs have achieved near 100% of their texel rate for 5+ years now. Again, unless ATI screwed up, Xenos will do the same while vertex texturing. So a vertex shader with 16 texture accesses will run at 500 Mverts per second. Empirically, G7x has taken up to 200 cycles per VTF (though supposedly 20 cycles is the theoretical performance). A vertex shader with only one texture access could perform as poorly as 22M per second.
This covers the main advantages of unified shading. Other differences between the chips include bandwidth, which I'm not going to rehash. There's also dynamic branching. Given Dave's hints that R5xx's scheduler and DB system came from Xenos, we can expect similar performance here. We've seen factors of 2x-10x over large-batch GPUs.
On the other hand, RSX's big advantage is texturing. Any texture-limited shader should run 65% faster if bandwidth isn't an issue. This often used to be the case several years ago, but not so much now that math has become important. It may resurge if spherical harmonic lighting (good stuff!) takes off.
So even though we don't have any measurements or benchmarks, we can make reasonable assumptions about certain aspects of rendering.
Quote:
|
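The figures in this post can be reproduced with simple peak-rate arithmetic. The Python sketch below is a reconstruction, not something the poster spelled out: the clock speeds (500 MHz for Xenos, 550 MHz for RSX) and unit counts (48 ALUs and 16 texture units for Xenos, 8 vertex shaders and 24 texture units for RSX) are assumed from commonly cited specs, and each unit is assumed to issue one operation per clock.

```python
def verts_per_second(units, clock_hz, cycles_per_vertex):
    """Peak vertex rate, assuming each unit issues one operation per clock."""
    return units * clock_hz / cycles_per_vertex


# Assumed specs (not given in the post): clocks in Hz, unit counts per chip.
XENOS_CLOCK, XENOS_ALUS, XENOS_TEX_UNITS = 500e6, 48, 16
RSX_CLOCK, RSX_VERTEX_SHADERS, RSX_TEX_UNITS = 550e6, 8, 24

# 48-cycle vertex shader, ALU limited on both chips.
xenos_alu = verts_per_second(XENOS_ALUS, XENOS_CLOCK, 48)
rsx_alu = verts_per_second(RSX_VERTEX_SHADERS, RSX_CLOCK, 48)

# Vertex texturing: 16 fetches per vertex on Xenos (texture-unit limited),
# versus a single fetch at the empirical 200 cycles on G7x/RSX,
# ignoring every other instruction in the shader.
xenos_vtf = verts_per_second(XENOS_TEX_UNITS, XENOS_CLOCK, 16)
rsx_vtf = verts_per_second(RSX_VERTEX_SHADERS, RSX_CLOCK, 200)

# Peak texel rate ratio behind the "65% faster" texturing estimate.
texel_ratio = (RSX_TEX_UNITS * RSX_CLOCK) / (XENOS_TEX_UNITS * XENOS_CLOCK)

for label, rate in [("Xenos, 48-cycle VS", xenos_alu),
                    ("RSX, 48-cycle VS", rsx_alu),
                    ("Xenos, 16 vertex texture fetches", xenos_vtf),
                    ("RSX, 1 vertex texture fetch", rsx_vtf)]:
    print(f"{label}: {rate / 1e6:.0f} Mverts/s")
print(f"RSX/Xenos peak texel rate: {texel_ratio:.2f}x")
```

Under those assumptions the script gives 500M and 92M vertices per second for the ALU-limited case, 500M and 22M for the vertex texturing case, and a 1.65x texel rate ratio, which matches the post. These are theoretical peaks, not measurements.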
||
|
|
|