Please Clear This Up - What PC GPUs Do the Xbox 360 & PS3 Use?

chachi said:
The OOE in the new IBM core (Xenon/PPE) is probably fairly similar to that in the Cube which is really rudimentary.
There's no OOOE in the new PPC cores (rudimentary or otherwise). The idea is basically that multithreading the core should make up for the lack of a hardware scheduler.
If we just look at single-threaded execution/performance, I'd actually call the Xenon/PPE cores EIOE (extremely in-order execution :p), courtesy of how IBM handled the load/store/caching implementation.
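To make that "extremely in order" point a bit more concrete, here's a minimal C sketch (purely illustrative, not PPE-specific code) of the kind of pointer-chasing loop that hurts most on a core that can't reorder around a cache miss; the multithreading argument above is that a second hardware thread can run while the first one sits in a stall like this.

```c
#include <stdio.h>
#include <stddef.h>

struct node {
    struct node *next;   /* each load depends on the previous one */
    int          payload;
};

/* Walks a linked list and sums the payloads. On a strictly in-order core,
 * every 'n = n->next' that misses the cache stalls the pipeline for the
 * full memory latency, because nothing behind it can issue early. An
 * out-of-order core (or a second hardware thread, as noted above) can
 * overlap independent work with that latency.                            */
static int sum_list(const struct node *n)
{
    int sum = 0;
    while (n != NULL) {
        sum += n->payload;   /* depends on the load of *n      */
        n = n->next;         /* next iteration depends on this */
    }
    return sum;
}

int main(void)
{
    struct node c = { NULL, 3 };
    struct node b = { &c, 2 };
    struct node a = { &b, 1 };
    printf("%d\n", sum_list(&a));   /* prints 6 */
    return 0;
}
```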
 
chachi said:
The OOE in the new IBM core (Xenon/PPE) is probably fairly similar to that in the Cube which is really rudimentary. The Cube CPU had a really small pipeline though (if it's similar to the 750 which I think is the case) so it's not as big a deal.

What is on the general purpose IBM core though is much better than what is on the SPU which doesn't have any branch prediction at all and relies on programmers to hint branches. If you're hand tuning everything then it isn't a problem but the point of stuff like this being implemented in the first place was to make code run fast regardless of compilers or talent and time spent optimizing things. You can be sure that for the next generation of Cell they'll have that stuff added back in (along with better DP FP, etc.) because it's good to have, they just didn't have the transistor budget to do it this time around.

Xenon has OOE? Anyhow, Gekko may not have had much in the way of it, but Gekko had something like a 2-stage pipeline rather than a 10-stage one, so being in-order wasn't as costly there; if Xenon is really in-order, that could hurt performance a bit. (I remember hearing the possibility of 1/10th the performance of an equivalent out-of-order CPU, though I'd expect 1/3 to 1/2 to be the typical case, but it has 3 cores so it could still be a net gain.)
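On the branch-hinting point in the quote above, here's a rough sketch of what "programmers hint branches" looks like in practice. __builtin_expect is a real GCC builtin; whether a particular SPU or PPU toolchain turns it into an actual hardware branch-hint instruction is toolchain-dependent, so treat that mapping as an assumption.

```c
#include <stdio.h>

/* GCC's __builtin_expect lets the programmer (or a profile) tell the
 * compiler which way a branch usually goes. On hardware without branch
 * prediction (the SPU case discussed above), the compiler can use this to
 * lay out the code for the common path and, on some toolchains, to emit
 * explicit branch-hint instructions well ahead of the branch.            */
#define likely(x)   __builtin_expect(!!(x), 1)
#define unlikely(x) __builtin_expect(!!(x), 0)

static long sum_positive(const int *v, long n)
{
    long sum = 0;
    for (long i = 0; i < n; i++) {
        if (likely(v[i] >= 0))   /* hinted as the common case */
            sum += v[i];
        else
            sum -= v[i];         /* rare path                 */
    }
    return sum;
}

int main(void)
{
    int data[] = { 4, 7, -1, 9 };
    printf("%ld\n", sum_positive(data, 4));   /* prints 21 */
    return 0;
}
```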

Anyone know what the NV2A pixel shader capabilities were? It should be fairly easy to make a rough comparison once we have that information.

RSX and Xenos shaders are so much more capable, though, that they should be more powerful than the raw numbers would indicate.
 
FSAA + HDR can come in many formats. So, yes, absolutely.

No FP16 + MSAA, but obviously FP16 is hardly the only choice for storing the framebuffer.
 
marco_simao said:
Does the RSX do FSAA + HDR ??

This should address the majority of questions you may have on the topic...

Article

But if you do have any additional questions, that thread is probably a better place to discuss RSX's capabilities specifically than this one, which is more about the heritage of the GPUs.
 
From my shaky sources, the Xenos GPU in the 360 is akin to ATi's next gen chip, but better because of the EDRAM, and RSX in the PS3 is like a 7600GT but worse because it's just a PC GPU strapped in a console.

BUT.... the CPUs balance them out, because Cell is capable of anti-aliasing and doing HDR for the RSX, and also it will never be fully tapped, just like the Emotion Engine, so it will just get better and better until the graphics look about on par with the 360's. The CPU in the 360 has almost NO floating point power compared to Cell, plus it only has three cores; apparently it can run word processing faster though, because it's good at integer ops, so if you want integer-op games, go for the 360.

Anyway, thems what I heard

And by the way, my Kotaku... I mean, sources say that this is all one hundred percent true.

But take it with a grain of salt.
 
But take it with a grain of salt.

You know what? Your post would be funny if there were not some truth to it :!: Ironically, the reason these things persist is that they contain some truth and some opinions that border on facts... but it is the lies that bite you in the arse! But since Google is sure to pick up your rumors ;) ...

MBDF said:
From my shaky sources, the Xenos GPU in the 360 is akin to ATi's next gen chip, but better because of the EDRAM

The R600 will have a number of features (Geometry Shaders, integer and bitwise support, etc) that Xenos does not have, and will also have more raw performance (more shader ALUs, higher frequency, etc). Without knowing the memory configuration (GDDR4? frequency?) it is hard to say much on memory, but eDRAM does alleviate some significant memory bottlenecks and shifts a lot of the load back to the shader pipeline. But then again R600 will have more dedicated memory footprint and bandwidth for texturing-limited scenarios. It will be interesting to see if R600 does two Z samples per clock, even with MSAA enabled.
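To put some rough numbers on the "eDRAM alleviates memory bottlenecks" point, here's a back-of-the-envelope sketch assuming a 1280x720 target with 4x MSAA and 4 bytes of colour plus 4 bytes of Z/stencil per sample (assumed formats, not figures from this thread). It also shows why that target doesn't fit in 10 MiB in one pass, hence Xenos's tiling.

```c
#include <stdio.h>

int main(void)
{
    /* Assumed render-target parameters (illustrative only). */
    const double width  = 1280.0, height = 720.0;
    const double samples_per_pixel = 4.0;        /* 4x MSAA            */
    const double bytes_per_sample  = 4.0 + 4.0;  /* colour + Z/stencil */

    double samples    = width * height * samples_per_pixel;
    double buffer_mib = samples * bytes_per_sample / (1024.0 * 1024.0);

    /* On Xenos, all blend/Z read-modify-write traffic for these samples
     * stays on the eDRAM die; on a conventional GPU it competes with
     * texturing and the CPU for main-memory bandwidth.                 */
    printf("720p, 4x MSAA, colour+Z: %.1f MiB (vs 10 MiB eDRAM -> tiling)\n",
           buffer_mib);   /* prints 28.1 MiB */
    return 0;
}
```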

and RSX in the PS3 is like a 7600GT but worse because it's just a PC GPU strapped in a console.

More like a 7900GTX @ 550MHz but with only 8 ROPs (doesn't have the bandwidth to sustain 16 ROPs worth of fillrate anyhow) and some adjustments to the caches and whatnot. The 7600GT is NOT a good comparison because RSX has more shader units, more texture units, etc...
 
I know this is old, but since we are almost a year further on now, can you guys judge which GPU is better, the 360's Xenos or the X1900XTX 512?

Thanks
 
That question cannot be answered without a definition of which area, or by what measure, you want them compared. Better at what, exactly? As Farid mentioned elsewhere, in a fantastic post on the subject, 'better' (or 'more powerful') are meaningless terms that have no place in intelligent discussion unless you go to the effort of deciding what exactly you're measuring. There are lots of ways one of those GPUs is better than the other, and vice versa. So to just outright declare one 'better' is impossible, if you're being intelligent about it.
 
Maybe it would be better to ask whether it's realistic that Xenos could be a better gaming accelerator than the X1950 in a PC environment?

In that case, hell yeah X1950XTX is vastly superior. On probably every level. More fillrate, more shader power, basically equal shader capabilities, a lot more RAM bandwidth and just a lot more, well, RAM.

Could X1950 function as well as Xenos in a console? I'd say that's a definite yup too. X1950 in a closed environment would scream. It would be even better at HD resolutions than Xenos. Tons of VRAM on a fast bus, tons of fillrate with a full ROP loadout. Better shaders than RSX. However, X1950XTX is way too expensive and complex to drop into a console, methinks.

Xenos is certainly more efficient for what 360 needs. X1950XTX is designed for a much more open environment with more variables that have to be dealt with. Xenos is a fine console GPU, but it would suck as a PC GPU with its limitations. Does it do 2D as well as R580? As optimized as possible for 640x480 up to 2048x1536? A range of AA across all of those resolutions? What about a role as a FireGL and the functions needed for that? Nope.

It's an excellent custom design tailored towards the machine it was put into. And tailored towards the price range it had to fit into.
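For the fillrate/bandwidth claims above, here's a quick sketch using commonly cited retail X1950XTX figures (assumed here: 650 MHz core, 16 ROPs, 16 texture units, 256-bit GDDR4 at 2 GHz effective) against commonly cited Xenos figures (500 MHz, 8 ROPs, 16 texture units, 128-bit GDDR3 at 1.4 GHz effective, eDRAM not counted). It backs the raw-throughput point, but says nothing about shader capability or efficiency.

```c
#include <stdio.h>

/* Rough peak-rate comparison backing the post above. All inputs are
 * commonly cited figures supplied here as assumptions, not thread data. */
struct gpu {
    const char *name;
    double core_mhz, rops, tmus, bus_bits, mem_mts; /* mem_mts = effective MT/s */
};

int main(void)
{
    struct gpu chips[] = {
        { "X1950XTX", 650.0, 16.0, 16.0, 256.0, 2000.0 },
        { "Xenos",    500.0,  8.0, 16.0, 128.0, 1400.0 }, /* eDRAM excluded */
    };

    for (int i = 0; i < 2; i++) {
        struct gpu g = chips[i];
        double gpix = g.core_mhz * g.rops / 1000.0;           /* Gpixels/s */
        double gtex = g.core_mhz * g.tmus / 1000.0;           /* Gtexels/s */
        double gbps = g.bus_bits / 8.0 * g.mem_mts / 1000.0;  /* GB/s      */
        printf("%-9s %.1f Gpix/s  %.1f Gtex/s  %.1f GB/s\n",
               g.name, gpix, gtex, gbps);
    }
    return 0;
}
```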
 
Maybe it would be better to ask whether it's realistic that Xenos could be a better gaming accelerator than the X1950 in a PC environment?

In that case, hell yeah X1950XTX is vastly superior. On probably every level. More fillrate, more shader power, basically equal shader capabilities, a lot more RAM bandwidth and just a lot more, well, RAM.

One area where the 1950 might be clearly inferior, which you're missing, is vertex power, since the 360 is unified.

Devs have complained about RSX's vertex abilities versus Xenos, and I don't see any reason the 1950 would be any different, since it surely has similar vertex capabilities to G71. Both have 8 discrete vertex shaders.
However, X1950XTX is way too expensive and complex to drop into a console, methinks.

I'm not sure that's true either. RSX is basically the X1950XTX's direct PC competitor, slightly modified, and it's included in a console. X1950 was a very large die for its performance level, though. In fact, RSX has 28 pixel shader pipes, versus 24 on the PC side (4 are for redundancy in RSX).
 
One area where the 1950 might be clearly inferior, which you're missing, is vertex power, since the 360 is unified.

Devs have complained about RSX's vertex abilities versus Xenos, and I don't see any reason the 1950 would be any different, since it surely has similar vertex capabilities to G71. Both have 8 discrete vertex shaders.


I'm not sure that's true either. RSX is basically the X1950XTX's direct PC competitor, slightly modified, and it's included in a console. X1950 was a very large die for its performance level, though. In fact, RSX has 28 pixel shader pipes, versus 24 on the PC side (4 are for redundancy in RSX).

X1950 runs 2x the bus width and with a higher RAM clock. That's a hell of a lot more board/trace complexity than a little 128-bit bus at a lower clock. Is the board for X1950XT not about the same size as all of 360? That's why I say it's a lot cheaper to implement a Xenos than an R580. I'm also fairly sure that R580 is a much bigger GPU in die-size, especially if you ignore or only partially count that EDRAM (cuz some of the EDRAM die is more functionality for the GPU).

RSX has fewer ROPs than G7x, mainly because more would be a waste on a 128-bit bus anyway. RSX is based on G71 which is actually a pretty svelte GPU compared to R580, too.

Might be true about the vertex power, but I rather doubt it. It probably allows for more efficient use of the whole chip at once. But, using more of it for vertex processing reduces what's available for pixel shader processing. R580 has those 48 pixel shaders just for pixel shading and 8 vertex shaders just for vertex processing.

I think R580 is a lot less efficient than Xenos, especially in the console environment Xenos was custom tailored for. But R580 is a lot bigger, with more resources overall, especially in the RAM and fillrate departments. If you stuck a big ol' R580 with the same resources it has on an X1950XT board into the 360 (somehow, lol), I have absolutely zero doubt it would be superior, but ridiculously less cost effective or efficient.
 
X1950 runs 2x the bus width and with a higher RAM clock. That's a hell of a lot more board/trace complexity than a little 128-bit bus at a lower clock. Is the board for X1950XT not about the same size as all of 360? That's why I say it's a lot cheaper to implement a Xenos than an R580. I'm also fairly sure that R580 is a much bigger GPU in die-size, especially if you ignore or only partially count that EDRAM (cuz some of the EDRAM die is more functionality for the GPU).

RSX has fewer ROPs than G7x, mainly because more would be a waste on a 128-bit bus anyway. RSX is based on G71 which is actually a pretty svelte GPU compared to R580, too.

Might be true about the vertex power, but I rather doubt it. It probably allows for more efficient use of the whole chip at once. But, using more of it for vertex processing reduces what's available for pixel shader processing. R580 has those 48 pixel shaders just for pixel shading and 8 vertex shaders just for vertex processing.

I think R580 is a lot less efficient than Xenos, especially in the console environment Xenos was custom tailored for. But R580 is a lot bigger, with more resources overall, especially in the RAM and fillrate departments. If you stuck a big ol' R580 with the same resources it has on an X1950XT board into the 360 (somehow, lol), I have absolutely zero doubt it would be superior, but ridiculously less cost effective or efficient.

G71 had a 256-bit bus in less die area than RSX...

R580 is 352 mm^2 according to B3D. Xenos is probably ~250 mm^2 including the eDRAM.
 
G71 had a 256-bit bus in less die area than RSX...

R580 is 352 mm^2 according to B3D. Xenos is probably ~250 mm^2 including the eDRAM.

Die size isn't the consideration for a wider bus; PCB complexity, packaging, and pin count are. So comparing RSX to G71 on die size is the wrong way to view it; the reasons both manufacturers didn't go the 256-bit route were added expense on the motherboard and potential future shrink constraints.
 
Going off on a tangent, it seems to me that rather than stressing over silicon fabrication, whoever can get a breakthrough in board and sundry manufacturing to enable high-end performance on the cheap will have the upper hand. If, for example, Sony could get RSX on a 256-bit bus for little more cash than RSX costs now, they'd have a huge advantage. Going forwards, if one console is price-constrained to a 128-bit bus and another can double or triple that, the advantage would be huge. We keep hearing about node reductions and all that jazz, but the basic production techniques never get a word. Are they pretty static? Same track-laying tech as yesteryear, with no room for improvement?
 
Going off on a tangent, it seems to me that rather than stressing over silicon fabrication, whoever can get a breakthrough in board and sundry manufacturing to enable high-end performance on the cheap will have the upper hand. If, for example, Sony could get RSX on a 256-bit bus for little more cash than RSX costs now, they'd have a huge advantage. Going forwards, if one console is price-constrained to a 128-bit bus and another can double or triple that, the advantage would be huge. We keep hearing about node reductions and all that jazz, but the basic production techniques never get a word. Are they pretty static? Same track-laying tech as yesteryear, with no room for improvement?

Pretty much. You still etch the PCB the same way. The only complexity is that with a larger bus you need more pins on the package. That burns up silicon internally (you have to have all kinds of amplification/step-down from bus voltages to CMOS voltages), and with more pins you need slightly more complex trace pathways on the board. The 360 already has amazingly complex trace routing, though, to prevent data-bus snooping, which would allow the encryption keys to be extracted and mod-chips to be created.

Edit: Internally, you also have more complexity because with a larger bus you probably have a larger word size -> bigger ALUs, etc.
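Sticking with the bus-width tangent, here's a minimal sketch of the arithmetic being traded off: peak bandwidth scales linearly with bus width at a given transfer rate, while the DRAM data pins, traces and devices scale roughly with width too. The 128-bit figures match the RSX spec list later in this thread; the 256-bit case is hypothetical.

```c
#include <stdio.h>

/* Peak DRAM bandwidth = (bus width in bytes) x (effective transfer rate). */
static double peak_gb_per_s(double bus_bits, double effective_mts)
{
    return bus_bits / 8.0 * effective_mts / 1000.0;
}

int main(void)
{
    /* RSX as shipped: 128-bit GDDR3 at 650 MHz x 2 (see spec list below). */
    printf("128-bit @ 1300 MT/s: %.1f GB/s\n", peak_gb_per_s(128.0, 1300.0));

    /* Hypothetical 256-bit version at the same clock: double the bandwidth,
     * but roughly double the DRAM data pins, traces and memory devices,
     * which is where the board/package cost discussed above comes from.   */
    printf("256-bit @ 1300 MT/s: %.1f GB/s\n", peak_gb_per_s(256.0, 1300.0));
    return 0;
}
```

Which is essentially the trade-off being described above: the silicon cost of a wider bus is modest, the board and package cost is not.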
 
Jarhead,

Were you not able to find this type of info? It pretty much says it all:

RSX
500 MHz G70 based GPU on 90 nm process[1]
300 million transistors total
Multi-way programmable parallel floating-point shader pipelines
Independent pixel/vertex shader architecture
24 parallel pixel pipelines
5 shader ALU operations per pipeline per cycle (2 vector4 and 2 scalar (dual/co-issue) and fog ALU)
27 FLOPS per pipeline per cycle
8 parallel vertex pipelines
2 shader ALU operations per pipeline per cycle (1 vector4 and 1 scalar, dual issued)
10 FLOPS per pipeline per cycle
Announced: 74.8 billion shader operations per second theoretical maximum ( ((5 ALU x 24 pixel pipelines) + (2 ALU x 8 vertex pipelines)) x 550 MHz )
Calculated: 68 billion shader operations per second theoretical maximum ( ((5 ALU x 24 pixel pipelines) + (2 ALU x 8 vertex pipelines)) x 500 MHz )
Announced: 1.8 TFLOPS (trillion floating point operations per second)
Calculated: 364 GFLOPS ( ((27 FLOPS x 24 pixel pipelines) + (10 FLOPS x 8 vertex pipelines)) x 500 MHz )
8 Render Output units
24 filtered and 32 unfiltered texture samples per clock
Maximum vertex count: 1 billion vertices per second (8 vertex x 500 MHz / 4)
Maximum polygon count: 333.3 million polygons per second (1 billion vertices per second / 3 vertices per triangle)
Maximum texel fillrate: 12 gigatexel per second fillrate (24 textures x 500 MHz)
Maximum pixel fillrate: 16 gigasamples per second fillrate using 4X multisample anti aliasing (MSAA), or 32 gigasamples using Z-only operation; 4 gigapixels per second without MSAA (8 ROPs x 500 MHz)
Maximum Dot product operations: 33 billion per second
128-bit pixel precision offers rendering of scenes with high dynamic range imaging
128-bit memory bus width to 256-MiB GDDR3 VRAM
Memory clock: 1.3 GHz (650 MHz × 2)[2]
Maximum bandwidth bitrate: 20.8 GB per second
Support for a superset of the DirectX 9.0c API and Shader Model 3.0

Xenos
337 million transistors in total
500 MHz 10 MiB daughter embedded DRAM (eDRAM) framebuffer on 90 nm process
NEC designed eDRAM die includes additional logic for color, alpha blending, Z/stencil buffering, and anti-aliasing
105 million transistors [2]
8 Render Output units
500 MHz parent GPU on 90 nm TSMC process, 232 million transistors in total
48-way parallel floating-point dynamically-scheduled shader pipelines[3]
Unified shader architecture (each pipeline is capable of running either pixel or vertex shaders)
2 shader ALU operations per pipeline per cycle (1 vector4 and 1 scalar, co-issued)
10 FLOPS per pipeline per cycle
48 billion shader operations per second theoretical maximum (2 ALU x 48 shader pipelines x 500 MHz)[3]
240 GFLOPS (10 FLOPS x 48 shader pipelines x 500 MHz)[4]
MEMEXPORT shader function
Support for a superset of the DirectX 9.0c API (Xbox 360 DirectX), and Shader Model 3.0/3.5
16 filtered and 16 unfiltered texture samples per clock
Maximum vertex count: 1.6 billion vertices per second
Maximum polygon count: 500 million triangles per second[3]
Maximum texel fillrate: 8 gigatexel per second fillrate (16 textures x 500 MHz)
Maximum pixel fillrate: 16 gigasamples per second fillrate using 4X multisample anti aliasing (MSAA), or 32 gigasamples using Z-only operation; 4 gigapixels per second without MSAA (8 ROPs x 500 MHz)[1]
Maximum Dot product operations: 24 billion per second
Cooling: Both the GPU and CPU of the console have heatsinks. The CPU's heatsink uses heatpipe technology, to efficiently conduct heat from the CPU to the fins of the heatsink. The heatsinks are actively cooled by a pair of 60 mm exhaust fans that push the air out of the case by negative case pressures.
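As a sanity check on the "Calculated" lines in both lists, here's a small sketch that rederives the headline RSX and Xenos figures from the per-pipeline numbers quoted above (frequencies and ALU/FLOP counts are taken straight from the lists).

```c
#include <stdio.h>

int main(void)
{
    /* Core clock used by the "Calculated" lines in both lists above. */
    const double mhz = 500.0;

    /* RSX: 24 pixel pipes (5 ALU ops, 27 FLOPs each) + 8 vertex pipes
     * (2 ALU ops, 10 FLOPs each).                                      */
    double rsx_ops    = (5.0 * 24.0 + 2.0 * 8.0)   * mhz / 1000.0; /* G ops/s */
    double rsx_gflops = (27.0 * 24.0 + 10.0 * 8.0) * mhz / 1000.0;

    /* Xenos: 48 unified pipes, 2 ALU ops and 10 FLOPs per pipe per clock. */
    double xen_ops    = 2.0 * 48.0  * mhz / 1000.0;
    double xen_gflops = 10.0 * 48.0 * mhz / 1000.0;

    printf("RSX:   %.0f G shader ops/s, %.0f GFLOPS\n", rsx_ops, rsx_gflops);
    printf("Xenos: %.0f G shader ops/s, %.0f GFLOPS\n", xen_ops, xen_gflops);
    return 0;   /* RSX: 68 / 364, Xenos: 48 / 240, matching the lists */
}
```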
 