How does one 7900 vertex shader compare to a Xenos ALU in terms of sops?

Dave Baumann said:
Of interest, PowerVR's site indicates that the lowest performance version of SGX can fit into a 90nm die size of less than 2x2mm!
Actually, it says less than 2 mm². ;)
 
Dave Baumann said:
One thing that I do know is that there are deep divisions in ATI as to whether the R520 architecture should have gone unified or not.

Since the 5xx part numbers have long since been jumbled in my head, are you saying that some in the ATI ranks feel that their line of PC GPUs should have moved to a unified architecture sooner than they are now? (i.e. the x1800 should have been a unified design?)
 
Dave Baumann said:
Except that doesn't really mesh neatly with the notion of implementing them in very die size/power/performance critical implementations such as handheld devices, yet PowerVR SGX is unified and all indications are that ATI is taking a Xenos-like architecture to handhelds this or next year as well. The strongest proponent of this line of argumentation is the company that doesn't yet have a unified design...
I never said I think it's more expensive, I just said I've heard it. ;)

Dave Baumann said:
What's more important with an architecture like Xenos is actually the command control - i.e. batch handling/juggling/sizes. Xenos here bears many similarities with R520/580's architecture. It's impossible to tell if there is that much difference between a unified shader architecture for being unified, or just having that level of batch handling capabilities.
I'm sure you already know I totally agree with that. Handling the small batches is the biggest die size eater. That's why comparisons between Xenos and RSX to judge the cost of US are rather silly, because there are many as yet unseen rendering scenarios that Xenos will totally rule at, and it won't be specifically the US nature. For many traditional techniques it may not have the "performance density" of RSX, but that's just part of the story.

Dave Baumann said:
One thing that I do know is that there are deep divisions in ATI as to whether the R520 architecture should have gone unified or not.
Very interesting.

To me, the middle ground that R580 took doesn't seem to make a lot of sense. If your batches can be so small, and your shader units can switch instructions so quickly, then you've already covered the most difficult part of going unified. Either stick with the efficient large batch size approach from before, or take advantage of the work done with Xenos. Right now, it seems like R580 performs at best equal to a theoretical similarly sized Xenos-based design. I know there are a lot of factors to take into account, but that's my guess. Maybe they were just being cautious because Xenos may have some unknown performance quirks or bugs that would hurt them in the open PC market.
 
Edge said:
Smarter in terms of certain workloads, but not smarter in terms of die size, and thus cost. Unified shaders are only now coming into existence because the technology exists to create those huge chips.
As Dave just showed you, it's not the huge chips that are enabling it. When ATI brings this to the PC, you'll see a top to bottom range of cards too.

The reason it's "only now coming into existence" is that when this chip comes out, nothing else will be unified, so all of its advantages will be ignored. Then all the engineering effort to do this will be wasted.

It's not like feature transitions in the past, where you had nearly immediate benefits. R3xx sold well because it blew away previous gens in DX8 performance. Moreover, DX9 features can be implemented in a way that DX8 fallbacks are easy. Unified shaders give you fast vertex texturing and enormous vertex shading capability, but if much of your game's target market doesn't have a US, they need a radically different fallback.

Edge said:
What is the die size for the performance of those parts? You can hardly claim the benefits of something that does not exist yet, and cannot be used for comparison purposes. Saying they are going to use unified parts for that sector is not good enough, if performance is lacking due to a lower number of execution units, and corresponding data lines and associated registers.
One of the big things about these markets is that resolution is very low. 1600x1200 has 25 times the pixels of 320x240. But it's not as easy to get away with 25 times fewer vertices. So your vertex load goes up (relatively speaking) and US makes sense.
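To put rough numbers on that, here is a small sketch of the arithmetic. The pixel counts come straight from the resolutions above; the vertex count and the 5x geometry reduction are made-up values for illustration only, not measurements of any real game.

```python
# Pixel counts for the two resolutions in the post.
desktop_pixels = 1600 * 1200
handheld_pixels = 320 * 240
pixel_ratio = desktop_pixels / handheld_pixels

# Hypothetical numbers, purely for illustration: a desktop scene with
# 500k vertices, and a handheld version that only cuts geometry by 5x.
desktop_verts = 500_000
geometry_cut = 5
handheld_verts = desktop_verts / geometry_cut

# Vertex work relative to pixel work on each platform.
desktop_verts_per_pixel = desktop_verts / desktop_pixels
handheld_verts_per_pixel = handheld_verts / handheld_pixels
relative_vertex_load = handheld_verts_per_pixel / desktop_verts_per_pixel

print(f"pixel ratio: {pixel_ratio:.0f}x")
print(f"vertex work per pixel rises by about {relative_vertex_load:.1f}x")
```

With those assumed inputs, pixels drop 25x while vertices drop only 5x, so the share of work that is vertex shading grows by roughly 5x. A fixed vertex/pixel split tuned for desktop ratios would be badly unbalanced there.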

Anyway, the point Dave is trying to make is that both ATI and PVR have made non-unified designs before. If they're both choosing to move in this direction, then obviously they feel it will save cost and/or improve performance.
 
While we are at it, how can anyone claim the superiority of Xenos, if no benchmarking metrics, or even game to game comparisons can accurately be made in the console sector? Yes, I realize that's what's being discussed here, but the end result from what I see is similar performance, with each part having different strength attributes for different circumstances.
Nobody has made that claim. But the claims we have made are well backed up by evidence.

We know that for polygons with complex vertex shaders, having more vertex shading units helps linearly. Look at tests from 3DMark between different chips. Look also at how resolution makes little difference in these tests, so reduced pixel shading resources are not an issue. Unless ATI screwed up in Xenos, it can perform a 48-cycle vertex shader at 500Mverts per second. RSX will do 92M.
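A back-of-envelope sketch of how figures like these can be reproduced. The unit counts and clocks are assumptions, not numbers stated in this post: 48 unified ALUs at 500 MHz for Xenos, and 8 dedicated vertex units at 550 MHz for RSX. It also assumes the ideal case where every unit retires one shader instruction per clock and all of Xenos's ALUs are working on vertices.

```python
def peak_verts_per_second(units, clock_hz, cycles_per_vertex):
    """Ideal vertex rate: each unit issues one instruction per clock."""
    return units * clock_hz / cycles_per_vertex

SHADER_CYCLES = 48  # the 48-cycle vertex shader from the post

# Assumed configurations (not given in the post).
XENOS_ALUS, XENOS_CLOCK_HZ = 48, 500e6        # all ALUs doing vertex work
RSX_VERTEX_UNITS, RSX_CLOCK_HZ = 8, 550e6     # dedicated vertex shaders only

xenos_rate = peak_verts_per_second(XENOS_ALUS, XENOS_CLOCK_HZ, SHADER_CYCLES)
rsx_rate = peak_verts_per_second(RSX_VERTEX_UNITS, RSX_CLOCK_HZ, SHADER_CYCLES)

print(f"Xenos: {xenos_rate / 1e6:.0f} Mverts/s")
print(f"RSX:   {rsx_rate / 1e6:.0f} Mverts/s")
```

Under those assumptions the ratio is simply the ratio of unit-clocks available for vertex work, which is why more vertex units help linearly on vertex-bound scenes.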

We know that shader pipes in GPUs have achieved near 100% of their texel rate for 5+ years now. Again, unless ATI screwed up, Xenos will do the same while vertex texturing. So a vertex shader with 16 texture accesses will run at 500Mverts per second. Empirically, G7x has taken up to 200 cycles per VTF (though supposedly 20 cycles is the theoretical performance). A vertex shader with only one texture access could perform as poorly as 22M per second.
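The same kind of estimate for vertex texturing, as a rough sketch. Assumptions not stated in the post: Xenos has 16 texture fetch units at 500 MHz running near peak rate, and the G7x-style part has 8 vertex units at 550 MHz, each of which stalls for the full fetch latency with nothing hiding it. The 200-cycle and 20-cycle latencies are the ones quoted above.

```python
def fetch_bound_verts_per_second(fetch_units, clock_hz, fetches_per_vertex):
    """Vertex rate when limited only by peak texture fetch throughput."""
    return fetch_units * clock_hz / fetches_per_vertex

def latency_bound_verts_per_second(vertex_units, clock_hz,
                                   fetches_per_vertex, cycles_per_fetch):
    """Vertex rate when every fetch blocks its vertex unit for the full latency."""
    return vertex_units * clock_hz / (fetches_per_vertex * cycles_per_fetch)

# Assumed: 16 texture units at 500 MHz, shader with 16 texture accesses.
xenos_vtf = fetch_bound_verts_per_second(16, 500e6, 16)

# Assumed: 8 vertex units at 550 MHz, shader with a single texture access.
g7x_measured = latency_bound_verts_per_second(8, 550e6, 1, 200)
g7x_theoretical = latency_bound_verts_per_second(8, 550e6, 1, 20)

print(f"Xenos, 16 fetches:      {xenos_vtf / 1e6:.0f} Mverts/s")
print(f"G7x, 1 fetch @ 200 cyc: {g7x_measured / 1e6:.0f} Mverts/s")
print(f"G7x, 1 fetch @ 20 cyc:  {g7x_theoretical / 1e6:.0f} Mverts/s")
```

This is a worst-case model for the latency-bound side; any overlap between fetches and other vertex work would raise the real number.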

This covers the main advantages of unified shading. Then other differences between the chips include bandwidth, which I'm not going to rehash. There's also dynamic branching. Given Dave's hints that R5xx's scheduler and DB system came from Xenos, we can expect similar performance here. We've seen factors of 2x-10x over large batch based GPUs. On the other hand, RSX's big advantage is texturing. Any texture loaded shader should run 65% faster if bandwidth isn't an issue. This often used to be the case several years ago, but not so much now due to math being important. It may resurge if spherical harmonic lighting (good stuff!) takes off.
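For the texturing figure, a sketch of one way a number like 65% falls out. The texture unit counts and clocks here are assumptions (24 units at 550 MHz for RSX, 16 units at 500 MHz for Xenos), and the comparison only holds for shaders that are purely texture-rate bound with no bandwidth limit.

```python
def peak_texel_rate(texture_units, clock_hz):
    """Peak bilinear texel fetches per second, one per unit per clock."""
    return texture_units * clock_hz

# Assumed configurations (not given in the post).
rsx_texels = peak_texel_rate(24, 550e6)
xenos_texels = peak_texel_rate(16, 500e6)

speedup = rsx_texels / xenos_texels
print(f"RSX:   {rsx_texels / 1e9:.1f} Gtexels/s")
print(f"Xenos: {xenos_texels / 1e9:.1f} Gtexels/s")
print(f"texture-bound advantage: {(speedup - 1) * 100:.0f}%")
```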

So even though we don't have any measurements or benchmarks, we can make reasonable assumptions about certain aspects of rendering.

Xenos is hardly a huge win because of unified shaders over a discrete part like Nvidia's 7900 series, and we all know that the 7900 series is meeting excellent die size and power issue requirements for the console space.
As stated already, we don't know that lack of unified shaders is what makes the G71/RSX so small. Note the above paragraph about performance differences unrelated to US. If those weren't there, would G7x still be smaller? Who knows.
 
Mintmaster said:
It may resurge if spherical harmonic lighting (good stuff!) takes off.
Both machines will be capable of this, right? The RSX would just have an advantage at doing it.
 
Oh yeah, for sure. Both are capable of using it.

I think the most promising use is for secondary illumination only (e.g. indirect reflections, subsurface scattering, etc), with primary illumination coming from your ordinary shadow map + per pixel dot product lighting and variants. In this case, the math will still be there, so I don't think texturing will be an issue.

SH PRT in its original incarnation basically just involves texture fetches and MADDs (or DP4's) in a 1:1 ratio.
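A minimal sketch of what that 1:1 ratio looks like, written as plain Python rather than shader code. The layout is an assumption for illustration: 16 SH coefficients per point (4 bands), packed four to an RGBA texel, with the lighting coefficients held in constants and a single colour channel. Each fetched texel is consumed by exactly one DP4.

```python
def dp4(a, b):
    """4-component dot product, the DP4 shader instruction."""
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2] + a[3] * b[3]

def shade_point(transfer_texels, light_consts):
    """Return (radiance, fetch_count, dp4_count) for one shaded point.

    transfer_texels: per-point PRT transfer coefficients, 4 per texel.
    light_consts:    lighting SH coefficients, packed the same way.
    """
    radiance = 0.0
    fetches = 0
    dp4s = 0
    for texel, light in zip(transfer_texels, light_consts):
        fetches += 1                    # one texture fetch ...
        radiance += dp4(texel, light)   # ... paired with one DP4
        dp4s += 1
    return radiance, fetches, dp4s

# Toy data: 16 coefficients -> 4 texels and 4 constant registers.
transfer = [(1, 0, 0, 0), (0, 1, 0, 0), (0, 0, 1, 0), (0, 0, 0, 1)]
light = [(2, 9, 9, 9), (9, 3, 9, 9), (9, 9, 4, 9), (9, 9, 9, 5)]

radiance, fetches, dp4s = shade_point(transfer, light)
print(radiance, fetches, dp4s)
```

With so little math per fetch, a shader like this is limited by texture rate rather than ALU rate, which is the case where extra texture units pay off.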
 