Is the Xenos a shader monster yes or no?

Status
Not open for further replies.
Edge said:
Please provide evidence for that. I take all these efficiency claims with a grain of salt.

This is one of the problems we have with comparing the X360: we have benchmarks and stuff for CELL and can estimate for RSX (if we take it as a G71 with half the mem bus), but we're left with speculation from "experts" on matters concerning Xenon/Xenos, thereby allowing them to go to the limits of their imaginations :rolleyes:

How much of an advantage is (supposed) efficiency over slightly higher overall power if we consider a closed box? I personally think that the latter has the upper hand.
 
I see you're using the free FP normalise in the G7x calculations. I decided not to use it, as it's not really comparable to the rest of the ops, being more specialised. In the same way, I haven't included the free branching operations for the ATI hardware.

I clearly said all ratings were highly theoretical.
I think it's unfair of you to compare so-called realtime G7x ratings with theoretical R5x0 ratings (note your X1900 ratings were higher than possible); that just says something about your preference, I think.
No one can pretend the G7x doesn't have more raw power than the R5x0 series, because it certainly does; per clock, the G7x simply has more raw power.
Just like the P4 has more raw power than the AMD 64; it's just that Intel's engineers, like nVidia's engineers, can't manage to get it out.
Meaning both AMD and ATI are more efficient with their hardware.

How much of an advantage is (supposed) efficiency over slightly higher overall power if we consider a closed box? I personally think that the latter has the upper hand.

If we think of the RSX as a G71 @ 550MHz, then the power difference between RSX and Xenos is anything but slight.
Xenos:
216 GFLOPS (taking pjbliverpool's word)
8 GTexel/s and 4 GPixel/s fillrate
48 billion shader ops

RSX as G71 @ 550MHz:
400.4 GFLOPS
13.2 GTexel/s and 8.8 GPixel/s fillrate
74.8 billion shader ops

Judging by both theoretical ratings, the RSX isn't just slightly better; it's a landslide.
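The arithmetic behind these theoretical peaks can be sketched in a few lines (a sketch of the marketing-derived numbers quoted in this thread, not measured throughput; the 432 flops/clock figure for Xenos is simply inferred from 216 GFLOPS at an assumed 500 MHz):

```python
# Sketch of the theoretical-peak arithmetic in this post (marketing numbers,
# not measured throughput). Clocks in MHz, results in G-units.

def peaks(clock_mhz, flops_per_clk, tmus, rops):
    hz = clock_mhz * 1e6
    return {
        "GFLOPS": flops_per_clk * hz / 1e9,
        "GTexel/s": tmus * hz / 1e9,
        "GPixel/s": rops * hz / 1e9,
    }

# RSX modeled as a G71 at 550 MHz (728 flops/clock is NVIDIA's marketing figure)
rsx = peaks(550, 728, 24, 16)
# Xenos at 500 MHz: 48 unified ALUs, 16 TMUs, 8 ROPs; 432 flops/clock gives 216 GFLOPS
xenos = peaks(500, 432, 16, 8)

print(rsx)    # {'GFLOPS': 400.4, 'GTexel/s': 13.2, 'GPixel/s': 8.8}
print(xenos)  # {'GFLOPS': 216.0, 'GTexel/s': 8.0, 'GPixel/s': 4.0}
```

As the rest of the thread argues, whether those per-clock figures are comparable at all (marketing ops vs. programmable flops) is the real dispute.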
 
Guilty Bystander,
how do you get 728 flops/cycle for RSX/G70/G71?
Ignoring nrmh and modifiers, it's more like 24 * 16 + 8 * 9 = 456. Or did you add FP16 filtering and blending?
 
Guilty Bystander said:
I clearly said all ratings were highly theoretical.
I think it's unfair of you to compare so-called realtime G7x ratings with theoretical R5x0 ratings (note your X1900 ratings were higher than possible); that just says something about your preference, I think.
No one can pretend the G7x doesn't have more raw power than the R5x0 series, because it certainly does; per clock, the G7x simply has more raw power.

I wasn't comparing realtime figures for G7x vs theoretical for R5xx; both were theoretical peaks. I just wasn't including the normalise like you have, because it's not a programmable component op. Also, my X1900 figures were dead on and can also be found in Dave's R580 article.

Anyway, how do you define raw power? Because the X1900XTX certainly has higher peak pixel shader FLOPS, and it's dead even in fillrate, vertex shader FLOPS and, I'm assuming, setup rate (all theoretical peaks). The only area where G7x really comes out in front is texturing, but that's actually a tradeoff with pixel shading anyway.
 
Guilty Bystander said:
If we think of the RSX as a G71 @ 550MHz, then the power difference between RSX and Xenos is anything but slight.
Xenos:
216 GFLOPS (taking pjbliverpool's word)
8 GTexel/s and 4 GPixel/s fillrate
48 billion shader ops

RSX as G71 @ 550MHz:
400.4 GFLOPS
13.2 GTexel/s and 8.8 GPixel/s fillrate
74.8 billion shader ops

Judging by both theoretical ratings, the RSX isn't just slightly better; it's a landslide.
/shakes head in disbelief.
 
Guilty Bystander,
how do you get 728 flops/cycle for RSX/G70/G71?
Ignoring nrmh and modifiers, it's more like 24 * 16 + 8 * 9 = 456. Or did you add FP16 filtering and blending?

Actually, those are from nVidia themselves.
24 Pixel Shaders x 27 shader ops + 8 Vertex Shaders x 10 shader ops = 728 flops per clock.
The X1900 does:
48 Pixel Shaders x 12 shader ops + 8 Vertex Shaders x 10 shader ops = 656 flops per clock.

And by the way, if you guys are taking ATI at their word, then I think it's bad not to do the same with nVidia.
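Laid out side by side, the three per-clock tallies in this exchange come to (a sketch; the 16-flop figure counts only the two vec4 MADD units per G7x pixel pipe, i.e. 2 units x 4 components x 2 flops per multiply-add, which is where the 456 calculation above comes from):

```python
pixel_pipes, vertex_pipes = 24, 8

# NVIDIA's marketing tally: "27 ops" per pixel pipe folds in FP16
# filtering and blending, which are not programmable shader ops.
marketing = pixel_pipes * 27 + vertex_pipes * 10    # 728

# MADD-only tally: 16 flops per pixel pipe, 9 per vertex pipe.
programmable = pixel_pipes * 16 + vertex_pipes * 9  # 456

# The X1900 tally quoted above: 48 pixel ALUs x 12, 8 vertex ALUs x 10.
x1900 = 48 * 12 + 8 * 10                            # 656

print(marketing, programmable, x1900)  # 728 456 656
```

The gap between 728 and 456 is exactly the filtering/blending padding disputed in the following posts.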
 
Guilty Bystander said:
Why aren't developers pushing the insane amount of extra shader power the Xenos has compared to other GPUs like the R520/580 and G70/71?
The Xenos can execute 4096 different instructions as opposed to the 1024 of the R520/580 and G70/71, and do 500k shader instructions as opposed to the 131k of the R520/580 and G70/71.

Is the Xenos, or the Xbox 360 as a whole, just too difficult, or what else?
Also, why does Halo 2 stress the Xbox 360 hardware more and make it run hotter?

As a final question:
Why the hell are almost all Xbox 360 games only using bilinear filtering when the Gamecube and the Dreamcast were doing trilinear filtering? Shouldn't anisotropic filtering be no problem?

Umm, are you sure about those numbers? I don't doubt xenos is a shading monster (it better be, cause it certainly isn't a fillrate monster), but I doubt it destroys those pc chips. Especially not R520/R580, which has a much larger die used for shader power, even if it lacks unified shaders.

I'd imagine Halo 2 uses more cores of the cpu, while most games are probably only using one.
Are you sure about trilinear? I know test drive le mans claimed anisotropic on the dreamcast, and gamecube used triple buffering which many people confuse with trilinear, but I doubt every game used trilinear.

I believe Xenos has better TMUs in some sense (both have 16)

Thought Xenos had 8... or is it 8 pixel pipes and 16 tmus?

!eVo!-X Ant UK said:
There are textures in the first Halo game that match those, namely the tree bark

There are a few wall textures in ut2003 that I believe use displacement mapping and roughly match that.
 
Guilty Bystander said:
Actually, those are from nVidia themselves.
24 Pixel Shaders x 27 shader ops + 8 Vertex Shaders x 10 shader ops = 728 flops per clock.
The X1900 does:
48 Pixel Shaders x 12 shader ops + 8 Vertex Shaders x 10 shader ops = 656 flops per clock.

And by the way, if you guys are taking ATI at their word, then I think it's bad not to do the same with nVidia.
You are calculating shader ops; how did you translate that into flops/cycle?
 
Fox5 said:
(it better be, cause it certainly isn't a fillrate monster)

Hit the search button; ERP and Mintmaster have already expounded on this point in the last 2 weeks. If you mean pixel fillrate, Xenos has "only" 4 Gigapixel/s fillrate. But this is a "full" real-world fillrate. Most GPUs don't have enough memory bandwidth to meet their fillrate needs; further, AA cuts fillrate even more, whereas Xenos is designed to take no fillrate penalty for MSAA up to 4x (16 Gigasamples/s). Just comparing numbers is deceptive because the architectures have different bottlenecks.

R520/R580 also have 16 TMUs, so clock-for-clock Xenos has a similar texel fillrate, although G71 has a higher peak due to its 24 TMUs. But hit the search button, because it is more complex than that: Xenos and G71 texture differently (different dependencies), which impacts the shader pipeline differently.

Thought Xenos had 8... or is it 8 pixel pipes and 16 tmus?

8 ROPs.
16 TMUs.
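As a quick sketch, the fillrate figures being compared here fall straight out of TMU/ROP counts times clock (assuming Xenos at 500 MHz, and reusing the 550 MHz G71 clock assumed for RSX earlier in the thread):

```python
# Fillrate = unit count x clock. Results in G-units (MHz / 1000).

def gtexel(tmus, clock_mhz):
    return tmus * clock_mhz / 1000

def gpixel(rops, clock_mhz):
    return rops * clock_mhz / 1000

xenos_pixel = gpixel(8, 500)        # 4.0 GPixel/s from 8 ROPs
xenos_4x_samples = xenos_pixel * 4  # 16.0 GSamples/s with "free" 4x MSAA

print(gtexel(16, 500))   # Xenos: 8.0 GTexel/s
print(gtexel(24, 550))   # G71 @ 550 MHz: 13.2 GTexel/s
print(xenos_4x_samples)  # 16.0
```

This is why "4 GPixel/s" understates Xenos when MSAA is on: the ROPs resolve 4 samples per pixel without losing throughput.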
 
Umm, are you sure about those numbers? I don't doubt xenos is a shading monster (it better be, cause it certainly isn't a fillrate monster), but I doubt it destroys those pc chips. Especially not R520/R580, which has a much larger die used for shader power, even if it lacks unified shaders.

I'm sure they came from nVidia, and I take their word on it.
I also think Xenos doesn't stand a chance against current top PC GPUs, simply because I don't believe a chip with 160 million fewer transistors can stand toe to toe with the X1900XT/XTX.
Something must have been cut down somewhere in order to save those 160 million transistors.

I'd imagine Halo 2 uses more cores of the cpu, while most games are probably only using one.
Are you sure about trilinear? I know test drive le mans claimed anisotropic on the dreamcast, and gamecube used triple buffering which many people confuse with trilinear, but I doubt every game used trilinear.

Not sure if every Dreamcast game used trilinear filtering, but quite a lot did, and some did anisotropic filtering as well.
With the Gamecube, though, every game uses trilinear filtering, as trilinear is free on Flipper due to its T&L engine, which was very advanced for its time.

Thought Xenos had 8... or is it 8 pixel pipes and 16 tmus

It has 8 ROPs, 16 TMUs, and absolutely no pixel pipelines.
Everything in the Xenos is completely decoupled.

The ROPs sit in the eDRAM die, by the way.
 
Hit the search button; ERP and Mintmaster have already expounded on this point in the last 2 weeks. If you mean pixel fillrate, Xenos has "only" 4 Gigapixel/s fillrate. But this is a "full" real-world fillrate. Most GPUs don't have enough memory bandwidth to meet their fillrate needs; further, AA cuts fillrate even more, whereas Xenos is designed to take no fillrate penalty for MSAA up to 4x (16 Gigasamples/s). Just comparing numbers is deceptive because the architectures have different bottlenecks.

FSAA first and foremost needs a lot of bandwidth, which is why the Xenos has the eDRAM: so FSAA doesn't eat away at the limited memory bandwidth (22.4 GB/s) the Xbox 360 already has to share between the Xenos, Xenon and system resources, but rather draws on the 256 GB/s of the eDRAM.
The same goes for HDR rendering, Z-testing and alpha blending, by the way.
That's the reason why high-end PC GPUs always have a 256-bit memory interface: to meet the demands of bandwidth eaters like FSAA, HDR and all those other bandwidth-hungry features.
As you probably understand from my words, shared memory isn't my cup of tea, especially when it's such a limited amount of bandwidth you have to work with.

This is something the PS3 doesn't have as much of a problem with as the Xbox 360, because in the Xbox 360 the Xenos functions as the memory controller, whereas in the PS3 the Cell functions as the memory controller for the XDR, and the RSX functions as a memory controller for both the XDR as well as the GDDR3.
When utilised properly, in the future the PS3 won't have as much of a bandwidth bottleneck as the Xbox 360, and this will give developers some room to make programming a little easier, as they don't have to watch out for bandwidth limitations as much as they need to with the Xbox 360.
This could all still turn into a problem if FSAA, HDR etc. all have to come out of that bandwidth, though.
Man, I hope PS3 will get some eDRAM similar to the Xbox 360. :idea:

Time will tell I guess.
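A back-of-envelope sketch of why MSAA is so bandwidth-hungry, using assumed illustrative numbers (720p, 8 bytes of colour+Z per sample, 60 fps, a guessed 2x average overdraw; real traffic depends on framebuffer compression, caching and blending, so treat this as a rough upper-bound illustration, not a measurement):

```python
# Rough framebuffer traffic estimate. All of these parameters are assumed
# for illustration; actual hardware traffic differs.

width, height, fps = 1280, 720, 60
bytes_per_sample = 4 + 4  # 32-bit colour + 32-bit Z per sample
overdraw = 2.0            # assumed average overdraw factor

def fb_traffic_gb_s(msaa_samples):
    # Each covered sample is read and written once (factor 2).
    per_frame = width * height * msaa_samples * bytes_per_sample * overdraw * 2
    return per_frame * fps / 1e9

print(round(fb_traffic_gb_s(1), 1))  # no AA
print(round(fb_traffic_gb_s(4), 1))  # 4x MSAA: 4x the raw sample traffic
```

Even under these rough assumptions, 4x MSAA alone would claim around 7 GB/s, a large slice of a 22.4 GB/s shared bus, which is the traffic the eDRAM absorbs instead.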
 
Guilty Bystander said:
There must be something cut down somewhere in order to get those 160 million transistors less.

AVIVO, DX7/DX8 fixed function logic and legacy support, etc. I also believe the X1000 series has a mini-ALU (foggy on this point).

That said, shader ALUs are not huge. The X1900 added 32 shader ALUs at a cost of 63M transistors. Xenos has ~255M transistors for rendering, which is not far off the ~280M in G71, for example (which has Purevideo and legacy logic). Of course G71, Xenos, and X1000 are all fairly different architectures, so transistor counts cannot be compared directly, and ATI and NV may count differently, but the point stands.

What you are asking is akin to trying to figure out why CELL has 50% more transistors but nearly 130% higher peak floating point performance. Different architectures, different design goals.

And by the way if you guys are taking Ati their words as the truth then I think it's bad not to do the same with nVidia.
...
I'm sure they came from nVidia and I take their word on it.

You have to know what they are talking about before you can start making conclusions. A number of people corrected you on the G71 float numbers.
 
Guilty Bystander said:
Actually, those are from nVidia themselves.
24 Pixel Shaders x 27 shader ops + 8 Vertex Shaders x 10 shader ops = 728 flops per clock.
This is clearly wrong. It's certainly not 27 "shader ops" per pixel shader pipeline. They add FP16 filtering and blending to the equation, but these are not shader ops.

And by the way if you guys are taking Ati their words as the truth then I think it's bad not to do the same with nVidia.
Ignore both their marketing crap. You can get the real numbers if you're willing to dig a bit deeper.

Guilty Bystander said:
With the Gamecube, though, every game uses trilinear filtering, as trilinear is free on Flipper due to its T&L engine, which was very advanced for its time.
T&L has no relation whatsoever to trilinear filtering.
 
AVIVO, DX7/DX8 fixed function logic and legacy support, etc. I also believe the X1000 series has a mini-ALU (foggy on this point).

That said, shader ALUs are not huge. The X1900 added 32 shader ALUs at a cost of 63M transistors. Xenos has ~255M transistors for rendering, which is not far off the ~280M in G71, for example (which has Purevideo and legacy logic). Of course G71, Xenos, and X1000 are all fairly different architectures, so transistor counts cannot be compared directly, and ATI and NV may count differently, but the point stands.

Are you sure the Xenos doesn't have DX8 fixed functions to help it emulate Xbox 1 games, or is that done entirely through software?
I don't want to correct you, as you make some good points, but wasn't the Xenos rated at 235 million transistors?
 
Guilty Bystander said:
Are you sure the Xenos doesn't have DX8 fixed functions to help it emulate Xbox 1 games, or is that done entirely through software?
I don't want to correct you, as you make some good points, but wasn't the Xenos rated at 235 million transistors?

The parent die is 232M transistors; the daughter die is 105M (or 115M, although I believe 105M was the final number provided), with 80-90M for eDRAM. Most of the backwards compatibility for the Xbox1 is software. MS did license some patents, but that was pretty late considering the time frame of Xenos first taping out in late 2004 with a final tape-out in July 2005. Needless to say, if there is any hardware legacy support for Xbox1, it is minor, and Xenos does not have full-fledged legacy support for DX7/8 fixed-function features. And there is no reason why it would. Xenos is a clean-slate design for the console space.
 
Guilty Bystander said:
Are you sure the Xenos doesn't have DX8 fixed functions to help make it emulate Xbox 1 games or is that entirely done through software?

DX8 fixed functions are emulated through shaders.
 