How can we compare the Xenos to other unified shader PC GPUs?

Mobius1aic

Just a thought, especially since it seems it wasn't designed with what I call the "R600 philosophy": use of an insane amount of unified shaders.

Now surely, used purely as a DX9 GPU, it's on par with, if not more powerful than, the GeForce 8600s and Radeon 2600s?

Just wondering. I do find the idea of putting Crysis on the 360 intriguing, and I think it's more than capable of doing it except for the memory issue, which I predicted way back at the beginning of the "CRYSIS COULD BE PUT ON TEH 360!" wars all over the internet.
 
We know that Nvidia's shader processors aren't directly comparable to ATI's, because the 2900's '320' shaders are still slower than the 8800's '128' shaders, even after you adjust the 8800's count for its higher clock. The 8600 GTS has 32 shaders clocked at 1.45 GHz, whereas the Xenos has 48 shaders clocked at 500 MHz.

They should be pretty close in fillrate and texture filtering though.

Xenos pixel fillrate:

500 MHz x 8 pixels/clock = 4.0 gigapixels/second

Xenos texel fillrate:

500 MHz x 16 texels/clock = 8.0 gigatexels/second

8600 GTS pixel fillrate:

675 MHz x 8 pixels/clock = 5.4 gigapixels/second

8600 GTS texel fillrate:

675 MHz x 16 texels/clock = 10.8 gigatexels/second

It seems the 8600gts pretty much has the Xenos licked.
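
As a rough check on the arithmetic above, here is a minimal Python sketch. It assumes, as the post does, 8 pixels and 16 texels per clock for both chips, at core clocks of 500 MHz and 675 MHz. The function name `fillrate_g` and the dictionary layout are made up for illustration.

```python
# Theoretical fillrate = core clock x units per clock.
# Clocks and per-clock counts are the ones quoted in the post above.

def fillrate_g(clock_mhz, units_per_clock):
    """Return the peak rate in billions of pixels (or texels) per second."""
    return clock_mhz * units_per_clock / 1000

gpus = {
    "Xenos": {"clock_mhz": 500, "pixels_per_clock": 8, "texels_per_clock": 16},
    "8600 GTS": {"clock_mhz": 675, "pixels_per_clock": 8, "texels_per_clock": 16},
}

for name, g in gpus.items():
    pixels = fillrate_g(g["clock_mhz"], g["pixels_per_clock"])
    texels = fillrate_g(g["clock_mhz"], g["texels_per_clock"])
    print(f"{name}: {pixels:.1f} Gpixels/s, {texels:.1f} Gtexels/s")
```

These are paper numbers only. They say nothing about bandwidth, eDRAM, or how often either chip actually reaches its peak.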

PS: Someone please correct me if I got any of these numbers wrong.
 
Completely apples and oranges. The 8600GT wouldn't have Xenos licked. IIRC the 8600GT is only a bit faster than a 7600GT, and Xenos should be a good deal faster than a 7600GT.
Really man, just look at the games. It's not that hard to see that the 360 and PS3 are similar. And we know RSX = 7900GTX, basically, which is about 2x a 7600GT.

The 8600GT's shaders only process one component at a time. They're scalar. The 360's ALUs process five at a time. So in apples-to-apples terms you might say the 360 has 48 x 5 = 240 shaders. Even that won't tell you a whole lot. But I'm confident Xenos has significantly more raw shading power than the 8600GT.
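
To put the 48 x 5 = 240 reasoning into numbers, here is a small sketch that multiplies scalar lanes by shader clock. It uses the 8600 GTS figures quoted earlier in the thread (32 scalar units at 1.45 GHz), since those are the only ones given, and the name `lane_rate_g` is made up for illustration. It ignores utilisation entirely, so treat the result as a paper figure.

```python
def lane_rate_g(units, lanes_per_unit, clock_mhz):
    """Peak scalar lane-operations per second, in billions."""
    return units * lanes_per_unit * clock_mhz / 1000

# Xenos: 48 ALUs, each handling 5 components per clock, at 500 MHz.
xenos = lane_rate_g(48, 5, 500)
# 8600 GTS: 32 scalar units (1 component each) at 1450 MHz.
g84 = lane_rate_g(32, 1, 1450)

print(f"Xenos:    {xenos:.1f} G lane-ops/s")
print(f"8600 GTS: {g84:.1f} G lane-ops/s")
print(f"Ratio:    {xenos / g84:.2f}x")
```

On these assumptions Xenos comes out ahead on paper, but the gap narrows whenever its wider ALUs are only partly filled.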

You really can't compare easily with the G80 family though. Xenos is a lot more like X1900/7800GTX class.

Also, benchmarks were done and linked recently, and you find Xenos again scoring almost exactly in the area of a 7900GTX.
 
Something I missed, thanks for that. I also had no clue about the 8600GTS performance numbers. If the 8600GTS is only scalar, then how come the G80 is absolutely slaughtering the 2900? Or does ATI count each component it can process as a shader?
 
Before the B3D forum crash I saw a topic here with a link to a Japanese blog (if anyone has it, please send it to me) saying that Xenos performs very similarly to the GeForce 7800GTX (they compared a GeForce with 4 shader ALUs against Xenos).

(I also saw an old topic here on B3D where an ATI engineer compared R-500/C1/Xenos with the Radeon 1800XTX/R-520 and put them at about the same overall performance at 720p.)
 
Alright, cool. This was the kind of info I was expecting. The Xenos has shown itself to be quite the performer, as I thought it was, but I wanted to know where it lay in the evolution of unified shader GPUs. The way it had been detailed so far makes you want to look at it like an Nvidia 8 series, with a smaller number of more powerful shaders, rather than an ATi 2xxx series, which has a massive load of less powerful shader units instead.

Any reason as to why ATi didn't use it for a PC graphics board? Frankly it would have given ATi a nice product to offer that Nvidia didn't have until the 8 series. Even if Xenos really isn't fully DX10 capable, you still can't mess with the pure efficiency of a unified shader GPU, even with DX9.
 
I don't know about comparison to 7800, but IIRC some ATI reps mentioned it being around X1800-X1900 in terms of performance, which kinda matches the 7800 comparison too.
 
Just wondering, I do find the idea of putting Crysis on the 360 intriguing, and I think it's more than capable of doing it except for the memory issue

Any game can run on any system. The only question is how much you have to change it. Since Crysis can bring a GTX to its knees on high settings you certainly can't expect anything like that on the 360. Medium settings would probably be doable though with maybe a few enhancements.

Alright, cool. This was the kind of info I was expecting. The Xenos has shown itself to be quite the performer, as I thought it was, but I wanted to know where it lay in the evolution of unified shader GPUs. The way it had been detailed so far makes you want to look at it like an Nvidia 8 series, with a smaller number of more powerful shaders, rather than an ATi 2xxx series, which has a massive load of less powerful shader units instead.

Xenos is the first generation unified GPU from ATI as opposed to R600 being the second generation. So R600 is an evolved version of Xenos. G80 is a completely different family so doesn't really relate.

G80's and R600's shader units are actually pretty similar. The difference is as I understand it that G80's are a little more flexible in how they are utilised due to being fully scalar and they also run at a higher clock speed. Xenos, on the other hand, has much bigger shader units which lack that flexibility and thus will suffer lower utilisation. They obviously also run at a lower clock speed than either G80's or R600's. In terms of raw power, on paper it stands as something like this:

Xenos: 216 GFLOPS
G80: 345.6 GFLOPS
R600: 473.6 GFLOPS

And in terms of theoretical utilisation of that, it's something like this:

G80>R600>>Xenos
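
One way the paper figures above can be reproduced is units x FLOPs per unit per clock x clock. The breakdown below is an assumption, not something spelled out in the post: 9 FLOPs per Xenos ALU per clock (a vec4 MAD counted as 8, plus 1 for the scalar unit) at 500 MHz, and 2 FLOPs (one MAD) per scalar unit for G80 at a 1350 MHz shader clock and R600 at 740 MHz. The name `peak_gflops` is made up for this sketch.

```python
def peak_gflops(units, flops_per_unit_per_clock, clock_mhz):
    """Peak GFLOPS = units x FLOPs per unit per clock x clock in GHz."""
    return units * flops_per_unit_per_clock * clock_mhz / 1000

# (units, assumed FLOPs per unit per clock, assumed clock in MHz)
chips = {
    "Xenos": (48, 9, 500),
    "G80": (128, 2, 1350),
    "R600": (320, 2, 740),
}

for name, spec in chips.items():
    print(f"{name}: {peak_gflops(*spec):.1f} GFLOPS")
```

Counting the Xenos scalar unit as a full MAD (10 FLOPs per ALU per clock) would give 240 GFLOPS instead of 216, which is why both figures turn up in discussions like this one.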

Any reason as to why ATi didn't use it for a PC graphics board? Frankly it would have given ATi a nice product to offer that Nvidia didn't have until the 8 series. Even if Xenos really isn't fully DX10 capable, you still can't mess with the pure efficiency of a unified shader GPU, even with DX9.

eDRAM doesn't suit the PC model and ATI already had the X1900 series on the horizon which has Xenos beat already in most areas.
 
http://blogs.msdn.com/shawnhar/archive/2006/12/11/sixty-fractals-per-second.aspx

http://texhnologix.blogzine.jp/texhnologix/2006/12/xenos_shader_pe.html
http://texhnologix.blogzine.jp/texhnologix/2006/12/xenos_madd_perf.html
http://texhnologix.blogzine.jp/texhnologix/2007/06/xenos_fillrate__1.html
http://texhnologix.blogzine.jp/texhnologix/2007/06/xenos_fillrate_.html

Xenos's ALUs are a small evolution from the baseline set by R300, towards R600. R300's MAD+ADD was chopped down effectively for Xenos, into MAD+SF (or MAD + scalar ADD, if I remember right).

Simplistically, R300 and Xenos can issue two independent instructions each clock cycle, MAD + SF. In R300 this is vec3 MAD + SF. In Xenos it's vec4 MAD + SF. R300 can do vec4 MAD, with the SF joining in. The rationalisation for Xenos's design is that it's got to do both vertex and pixel shading, and vertex shading more commonly needs to operate on vec4 data (x,y,z,w) whereas in pixel shading vec3 (red, green, blue) is often all that's needed (hence the bias of R300's pixel shaders).

R300 uses the ADD ALU as a pre-processor for MAD instructions (mostly for DirectX 8 "fixed functions", like scaling by 2x). At best you can get 3 instructions out of R300 (which is the same all the way up to R580): MAD for RGB, SF (e.g. reciprocal) and ADD/DX8-FF. The latter must always deliver its result to the MAD+SF ALU, though; it cannot write to a register (took me ages to realise this restriction :cry: ). As far as I can tell Xenos integrates the DX8-FFs and there's no "auxiliary ALU" like R300's ADD on the side.

R600 is vec4 MAD+SF but the twist is that it's 5 entirely independent instructions. On a good day it is 2x faster than R300 per clock, per ALU, but it prolly averages 30-50% faster.

Jawed
 
@__@ Numbers.......

LOL Thanks though guys. I'm not too good with the exact details of GPU design and programming, but I can understand it pretty well at a basic level. Now if only we had a real idea of what the Wii's GPU really is........
 
Xenos is the first generation unified GPU from ATI as opposed to R600 being the second generation. So R600 is an evolved version of Xenos.
No, R600 is truly second generation. The organisation of the register file (all scalar) and the 5-way instruction issue are quite a departure.

G80's and R600's shader units are actually pretty similar. The difference is as I understand it that G80's are a little more flexible in how they are utilised due to being fully scalar and they also run at a higher clock speed.
No, G80's ALU pipeline is superscalar, it issues a scalar MAD + scalar SF per pixel* per clock (later G8x, G9x variants can do scalar MAD + scalar MUL, apparently). R600 issues 4x scalar MAD + scalar SF per pixel per clock.

Actually, G80 issues SFs at 1/4 or 1/8 rate (but this isn't a problem because they're rarely needed more frequently).

* actually for pixels it's pairs of pixels for instruction issue, but for vertices it's singly so - a peculiarity of batching in G80

Jawed
 
Thanks a lot for the links and information. Overall, though, what percentage of its theoretical maximum (480 FLOPs per cycle, or a peak of 240 GFLOPS with pixel and vertex shaders running at the same time) can Xenos sustain? (Generally, GPUs with NUMA shader ALUs sustain between 53%, in G70, and 60% of their maximum FLOPs per cycle.)

(I have heard something like 75% of max GFLOPS sustained, and nothing like the expected 90% efficiency.)
 
Whenever you issue an ADD instruction or a MUL instruction, you've "wasted" 50% of the available FLOPs in the MAD ALU! Not every instruction is a MAD.

It's better to think in terms of ALU component utilisation. e.g. a vec2 instruction for one clock cycle on Xenos leaves 2 scalar MAD units and the SF idle.
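
A toy illustration of the two kinds of waste described above, assuming a Xenos-style ALU with 4 scalar MAD lanes in the vector unit plus 1 SF lane per clock. The function names are made up for this sketch, and the model counts nothing except lanes and FLOPs.

```python
VEC_LANES = 4   # scalar MAD units in the vector ALU
SF_LANES = 1    # the special-function unit alongside it

def lane_utilisation(vec_width, sf_used):
    """Fraction of the 5 lanes doing work in one clock."""
    used = vec_width + (1 if sf_used else 0)
    return used / (VEC_LANES + SF_LANES)

def flop_utilisation(op):
    """A MAD does a multiply and an add (2 FLOPs); ADD or MUL alone does 1."""
    flops = {"MAD": 2, "ADD": 1, "MUL": 1}[op]
    return flops / 2

# A vec2 instruction with the SF idle keeps 2 of the 5 lanes busy.
print(lane_utilisation(2, sf_used=False))
# An ADD issued to a MAD unit uses half of that unit's peak FLOPs.
print(flop_utilisation("ADD"))
```

The two effects multiply: a vec2 ADD with the SF idle would use 2 of 5 lanes, each at half its peak FLOPs.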

Both G80 and R600 tackle the utilisation problem head-on. G80 goes further in two ways:
  1. all vector instructions are broken down to issue only the portions of the vector that are being used - there is no vec2 used + vec2 unused problem - this is sequential component issue
  2. it uses the ALU pipeline to do some texture-related calculations, which increases the utilisation of the SF unit (which otherwise could be idle)
Jawed
 
Something I missed, thanks for that. I also had no clue about the 8600GTS performance numbers. If the 8600GTS is only scalar, then how come the G80 is absolutely slaughtering the 2900? Or does ATI count each component it can process as a shader?
For R600, 320 shaders means 320 scalar processors, but for Xenos, 48 shaders means 48 vec4 plus 48 scalar.

Also your math is a bit off, because having only 128 shaders in G80 is not fully compensated for by its higher clock. R600 can and does beat G80 in many math-limited tests. However, G80's scalar processors are more flexible than R600's for scalar operations, and there's a LOT more to performance than just shader math ability.
 
My memory is hazy but I could've sworn it was deduced (maybe confirmed????) that it was on par with the Radeon X1800 in one of the threads here over a year ago...
 
I think that was about some ATI rep saying it's around that level of performance
 
The 8600 GTS gets close to the X1950 Pro though, with one card beating the other depending on the game (the X1950 Pro being faster overall). But I agree shaders are a weak point; 32 scalar units is a low number really.
 
Specifically he said the X1800 would be faster at high resolutions and Xenos would have an edge at lower resolutions due to higher shader power and frame buffer bandwidth.

It has also been stated elsewhere by ATI that Xenos is theoretically weaker than the X1900 but should give a similar end-user experience, presumably because its closed-box nature allows it to be better utilised.
 