A primer on the X360 shader ALU's as I understand it

Platon · Mar 21, 2006

Guilty Bystander said:
I mean the Xbox 360 is a brand new piece of hardware and yet still almost every game runs on 30fps.

I think you answer your own question. Not only does the Xbox 360 have multiple cores, which will take the devs some time to get used to, but the GPU is also quite a different beast from anything else out there. It will take time. Launch and near-launch titles are hardly anything to go by when you want to see what the hardware is capable of, and even less so this gen, given the complexity of the hardware...

Asher · Mar 21, 2006

ROG27 said:
Factor SPEs in (sharing in Vertex work) and Xenos may actually be at a disadvantage in this area

Factor in a couple VMX128 units with their new 3D-oriented instructions and RSX/Cell may be at a disadvantage in this area.

No one can say either way, but a lot of people are putting way too much emphasis on SPEs and what they can do for graphics workloads vs VMX128 units. There are instructions on VMX128 units that complete in one or two cycles that take upwards of 10 different instructions on SPEs, for instance.

Asher · Mar 21, 2006

scooby_dooby said:
Funny cause every single casual gamer I've talked to in the last year tells me that PS3 is more powerful. They get that through the media, that gets that through E3.

Yep. Even in generic newspaper articles now talking about PS3, Reuters & Co tend to mention that the PS3 is "twice as powerful as the Xbox 360", probably an allusion to the 1TFLOP vs 2 TFLOP numbers.

The casual gamers seem to be under the impression that the PS3 will be significantly more powerful than the Xbox 360, while the "hardcore" gamers are actually more likely to know the distance isn't huge by any means.

LunchBox · Mar 21, 2006

people seem to forget that Xenos is 3 arrays of 16 pipes... so minimum PER CLOCK would be 16 VS...

LunchBox · Mar 21, 2006

Asher said:
Factor in a couple VMX128 units with their new 3D-oriented instructions and RSX/Cell may be at a disadvantage in this area.

No one can say either way, but a lot of people are putting way too much emphasis on SPEs and what they can do for graphics workloads vs VMX128 units. There are instructions on VMX128 units that complete in one or two cycles that take upwards of 10 different instructions on SPEs, for instance.

AFAIK... Weren't the VMX units put there to enhance the PPE's FP output... If you use them on graphics... wouldn't it take away from the physics and A.I. of the game...

ROG27 · Mar 22, 2006

LunchBox said:
AFAIK... Weren't the VMX units put there to enhance the PPE's FP output... If you use them on graphics... wouldn't it take away from the physics and A.I. of the game...

Yes. That's why a massively parallel architecture like CELL is more flexible (has more independently operating threads) and that is why SPEs can realistically help with graphics.

ROG27 · Mar 22, 2006

Asher said:
Yep. Even in generic newspaper articles now talking about PS3, Reuters & Co tend to mention that the PS3 is "twice as powerful as the Xbox 360", probably an allusion to the 1TFLOP vs 2 TFLOP numbers.

The casual gamers seem to be under the impression that the PS3 will be significantly more powerful than the Xbox 360, while the "hardcore" gamers are actually more likely to know the distance isn't huge by any means.

Those same newspapers said the same things about the XBOX 1 when it first was introduced, as well. The casuals' perceived notion of more powerful is, again, akin to PS2/GC (X360 this round) vs. XBOX (PS3 this round)...which it will likely mirror in reality. They either won't pay attention to or will forget how much more powerful one thing is than another. The only thing that matters to a casual, as far as power is concerned, is this is technically more powerful than that (by what margin does not matter). Only misguided techies on internet forums get hell-bent on meaningless numbers, ironically.

Asher · Mar 22, 2006

ROG27 said:
Yes. That's why a massively parallel architecture like CELL is more flexible (has more independently operating threads) and that is why SPEs can realistically help with graphics.

You're opening up a whole new can o' worms.

Yes, SPEs have more independently operating threads.

There are things SPEs are good at, and things SPEs are, er, not-so-good at.

Realistically, in my opinion, SPEs are not as useful for extra geometry processing compared to Xenon cores for a number of reasons:
1. Xenon cores are individually faster than SPEs are for such processing.
2. Xenon cores can use the L2 cache as a FIFO buffer for Xenos to pull vertex data from, without writing it to RAM. The SPEs need to communicate with RSX through the PPE.
3. Xenon cores natively understand D3D formats

It is not as cut and dried as many people are trying to make it out to be. There are advantages to the Cell approach, and there are advantages to the Xenon approach. In my opinion, Xenon can work more effectively with Xenos for additional geometry processing than Cell can with RSX.

one · Mar 22, 2006

Asher said:
1. Xenon cores are individually faster than SPEs are for such processing.
2. Xenon cores can use the L2 cache as a FIFO buffer for Xenos to pull vertex data from, without writing it to RAM. The SPEs need to communicate with RSX through the PPE.

Those 2 points are false AFAIK.

Gholbine · Mar 22, 2006

one said:
Those 2 points are false AFAIK.

I was about to question them also.

FlexIO is part of the EIB just as all the processing elements are, and the SPEs have DMA. Why would they need the PPE to communicate with the RSX?

Asher · Mar 22, 2006

one said:
Those 2 points are false AFAIK.

They weren't when I was profiling on Cell and Xenon 6 months ago...

On a high-level view (programmer's view), they can dispatch DMA commands for main memory but those are routed through the PPE.

The SPE's individual DMA units are for moving memory between SPEs and the PPE.

Asher · Mar 22, 2006

Gholbine said:
FlexIO is part of the EIB just as all the processing elements are, and the SPEs have DMA. Why would they need the PPE to communicate with the RSX?

Depends how extensive RSX's integration is. I'm assuming they may have to fetch this data from main memory after Cell writes it, but that may not be the case. RSX details are hard to come by.

Even so, I can see it being written out to main memory anyway, for the sake of having a FIFO buffer. The chances of Cell sending out the data and having RSX ready to retrieve it at that moment are low, and the same goes for the reverse, where RSX asks for the data and waits for the SPE to send it...

Gholbine · Mar 22, 2006

Assuming the RSX can access the SPEs' local stores (which I don't think is an unreasonable assumption), what's stopping developers from using the local store as a FIFO buffer?

Asher · Mar 22, 2006

Nothing, provided the RSX can access the SPE's LS and the SPE knows enough to lock off the section of LS for the buffer.
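For readers unfamiliar with the idea being debated here: a FIFO in this sense is just a fixed-size ring buffer that lets a producer (a CPU core or an SPE) and a consumer (the GPU) run at different paces, so that neither has to be ready at the exact moment the other is. The following is a minimal software sketch only. `RingBuffer`, its capacity, and the "batch" strings are hypothetical names for illustration; it does not model the actual Xenon L2 or SPE local-store mechanisms, whose details the posts above describe as hard to come by.

```python
class RingBuffer:
    """Fixed-capacity FIFO: the producer writes at `head`, the consumer reads at `tail`."""

    def __init__(self, capacity):
        self.slots = [None] * capacity
        self.capacity = capacity
        self.head = 0   # next slot the producer writes
        self.tail = 0   # next slot the consumer reads
        self.count = 0  # number of occupied slots

    def push(self, item):
        """Producer side. Returns False when full, meaning the producer must stall or retry."""
        if self.count == self.capacity:
            return False
        self.slots[self.head] = item
        self.head = (self.head + 1) % self.capacity
        self.count += 1
        return True

    def pop(self):
        """Consumer side. Returns None when empty, meaning there is nothing to consume yet."""
        if self.count == 0:
            return None
        item = self.slots[self.tail]
        self.slots[self.tail] = None
        self.tail = (self.tail + 1) % self.capacity
        self.count -= 1
        return item


def demo():
    """Producer runs ahead, consumer catches up, and a full buffer pushes back."""
    fifo = RingBuffer(capacity=4)
    consumed = []
    # The producer gets three batches ahead before the consumer starts reading.
    for batch in ("batch0", "batch1", "batch2"):
        fifo.push(batch)
    consumed.append(fifo.pop())
    # The producer keeps going; the last push is rejected because the buffer is full.
    accepted = [fifo.push(b) for b in ("batch3", "batch4", "batch5")]
    # The consumer drains whatever is left, in arrival order.
    while fifo.count:
        consumed.append(fifo.pop())
    return accepted, consumed
```

The point of the sketch is the decoupling: the producer only has to stop when the buffer is full, and the consumer only has to wait when it is empty. Where such a buffer would live (L2, local store, or main memory) is exactly what the posts above are disagreeing about.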

superguy · Mar 22, 2006

Scooby, the floating-point junk from Sony is hyperbole, but it was taken and run with by the message-board fanbois more than anything. It doesn't bother me. Now, I don't recall MS slides pointing out the transistor and instruction counts of their GPU like Sony's did, so yes, Sony has tried to play up the brute-strength angle of PS3 a lot more than MS has officially.

And we also had the Major Nelson article from MS. Although I think that was a REACTION to all the claims being put out there by Sony, along with the game media, that it was 100X Xbox360 or something. But again, a lot of that did NOT come from Sony. It came from their fans.

So basically I'm saying that, all things considered, Sony is off the hook for all that. And playing up tech specs isn't wrong anyway.

My problem with Sony is the wholesale passing off of FAKE videos at E3 as real. That to me was a huge deal, completely unprecedented in video game history for pure dishonesty. So I say be mad at Sony for THAT.

Also, from the MS side, it's true that ATI has been very aggressive in hyping Xenos and downplaying PS3. However, ATI is not exactly MS either. They are ATI, acting somewhat independently.

nAo · Mar 22, 2006

Guilty Bystander said:
No not really as the Xenos has to order either Vertex or Pixel Shading in groups of 16 ALU's.
So for example 16 Vertex and 32 Pixel, 32 Vertex and 16 Pixel or 0 Vertex and 48 Pixel.

It really does not matter if it can change its configuration on the fly every 4 cycles, you should have all the granularity you need.

Well I'm quite sure partial precision is always gonna be used on the Xenos as it can't even manage 16bit Floating Point HDR.

What?! Last time I checked, the only thing it can't handle is blending in that space.

Just think about it: the Xenos needs the eDRAM, as doing HDR and FSAA on the GPU itself would slow it to a crawl, which it already does anyway.

WHAT?!

Even with not rendering HDR and FSAA the Xenos still isn't capable of doing Trilinear Filtering much less Anisotropic Filtering.

ERR!?!?!?

500 Million Vertices per second (1 Vertex instruction x 500MHz) 48 Billion Shader operations/s (96 Shader ops x 500MHz), 8GTexel/s (16 Textures x 500MHz), 4GPixel/s (8 ROPs x 500MHz) and 240GFlop/s (48 ALU's x 10 instructions per ALU x 500MHz) is really quite pathetic.

Are you kidding!?

Compare it to a 7900GTX @550MHz (thinking the RSX is a G70/71 @550MHz) which does 1100 Million Vertices (2 Vertex instructions x 550MHz),

meaningless number

74,8 Billion Shader operations/s (136 Shader ops x 550MHz),

meaningless number

13,2GTexel/s (24 Textures x 550MHz),

this is the only correct thing you quoted..but you know, it's statistical..it is bound to happen

8,8GPixel/s (16 ROPs x 550MHz)

LOL, maybe you want to reconsider your statement about edram here..

and 400,4GFlop/s (24 ALU's x 27 Pixel instructions + 8 ALU's x 10 Vertex instructions = 728 x 550MHz).

other meaningless number

Comparing to the X1900XTX gives pretty much the same results as it does 1300 Million Vertices (2 Vertex instructions x 650MHz), n/a Billion Shader operations/s (as it is unknown), 10,4GTexel/s (16 Textures x 650MHz), 10,4GPixel/s (16 ROPs x 650MHz) and 426,6GFlop/s (48 ALU's x 12 Pixel instructions + 8 ALU's x 10 Vertex instructions = 656 x 650MHz)

same old story..
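The headline figures traded in this post are all the same kind of product: units per clock multiplied by clock speed. A small sketch that reproduces the quoted arithmetic makes that explicit. The per-clock counts below are copied from the quoted post, not from any vendor documentation, and the function name `peak_rate` is made up for illustration. As the replies above suggest, reproducing the arithmetic says nothing about whether the per-clock counts are meaningful or comparable across architectures.

```python
def peak_rate(units_per_clock, clock_mhz):
    """Theoretical peak in billions per second: units per clock x clock (MHz) / 1000."""
    return units_per_clock * clock_mhz / 1000.0


# Per-clock figures exactly as claimed in the quoted post (not verified specs).
xenos_gflops = peak_rate(48 * 10, 500)             # 48 ALUs x 10 ops, 500 MHz
g71_gflops = peak_rate(24 * 27 + 8 * 10, 550)      # 24 pixel + 8 vertex ALUs, 550 MHz
x1900_gflops = peak_rate(48 * 12 + 8 * 10, 650)    # 48 pixel + 8 vertex ALUs, 650 MHz

xenos_gtexels = peak_rate(16, 500)                 # 16 textures per clock
xenos_gpixels = peak_rate(8, 500)                  # 8 ROPs
g71_gtexels = peak_rate(24, 550)                   # 24 textures per clock
g71_gpixels = peak_rate(16, 550)                   # 16 ROPs
```

Run as written, the products match the quoted 240, 400.4, 8, 4, 13.2 and 8.8 figures. The one exception is the X1900XTX line: 656 x 650 MHz works out to 426.4 GFlop/s rather than the 426,6 quoted.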

nAo · Mar 22, 2006

Asher said:
Nothing, provided the RSX can access the SPE's LS and the SPE knows enough to lock off the section of LS for the buffer.

Why would you need to lock off some LS section? It's not a cache; nothing will mess with that section if you don't want it to.

Asher · Mar 22, 2006

I'm using the term loosely -- you don't want to trash it is all I'm saying.

Mintmaster · Mar 22, 2006

LunchBox said:
people seem to forget that Xenos is 3 arrays of 16 pipes... so minimum PER CLOCK would be 16 VS...

Throw in time division, and it doesn't matter.
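One way to see why time division takes the sting out of the 16-ALU granularity: if each of the three arrays can be reassigned between vertex and pixel work every few cycles, the split that matters is the average over many scheduling slots, not the split in any single slot. The sketch below is a toy model. It assumes reassignment is free and that both kinds of work are always available, and the function name and slot patterns are invented for illustration. It is not a description of how the Xenos scheduler actually arbitrates.

```python
ARRAYS = 3           # "3 arrays of 16 pipes", as stated earlier in the thread
ALUS_PER_ARRAY = 16


def average_vertex_alus(vertex_arrays_per_slot):
    """Time-averaged number of ALUs on vertex work.

    `vertex_arrays_per_slot` lists, for each time slot, how many of the
    three arrays are running vertex shaders during that slot.
    """
    for n in vertex_arrays_per_slot:
        if not 0 <= n <= ARRAYS:
            raise ValueError("each slot can assign between 0 and 3 arrays to vertex work")
    return ALUS_PER_ARRAY * sum(vertex_arrays_per_slot) / len(vertex_arrays_per_slot)


# In any single slot the only choices are 0, 16, 32 or 48 vertex ALUs...
static_choices = [average_vertex_alus([n]) for n in range(ARRAYS + 1)]

# ...but one array on vertex work for 1 slot in 4 averages out to 4 ALUs' worth,
quarter_array = average_vertex_alus([1, 0, 0, 0])

# and one array on vertex work for 5 slots in 8 averages out to 10 ALUs' worth.
mixed = average_vertex_alus([1, 0, 1, 1, 0, 1, 0, 1])
```

Under those assumptions the achievable average is any multiple of 16 divided by the number of slots in the window, so the coarse per-slot choice stops being the limiting factor once the window is more than a handful of slots long.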

Mintmaster · Mar 22, 2006

It seems we're getting a bit OT here, but I'll continue nonetheless

When I made my comment I was talking about c0_re's comment about POWER. Just because you have more "power" doesn't mean your games will be better. I think PS3 developers will likely be more competent, on average, than XB360 devs, so it still may have better graphics overall, even though IMO RSX has less power.

Regarding my comments about Xenos:

Titanio said:
Not to nitpick, but framebuffer bandwidth is an advantage for Xenos. PS3 has the advantage with every other kind of bandwidth.

In graphics, framebuffer/Z bandwidth is far and away the biggest consumer of bandwidth, especially when you move to HDR and optimize textures for consoles. If RSX is churning out a puny 2GPix/s without alpha blending, it'll use over half its bandwidth once you include Z traffic. Throw in AA, HDR, and/or alpha blending and the situation's even worse. So this affects texture bandwidth as well.

BTW, if anyone doubts the bandwidth issue, check out the B3D 7600GT review. 22.4GB/s all consumed in an ideal simple fillrate test with colour and Z @ 2.9GPix/s. It has a core clocked 31% faster than the 6800GS, 2.6 times the MADD rate, and numerous other improvements. Unfortunately, it only checks in around 15% faster in most games because it has 30% less bandwidth. RSX will pretty much be a 7600GT times two, but with exactly the same bandwidth.

Let me reiterate: If RSX was halved, it would still be significantly hampered by lack of bandwidth!
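The "over half its bandwidth" estimate can be roughly reconstructed from the numbers in this post. The sketch below backs out an effective bytes-per-pixel cost from the 7600GT fillrate result (22.4GB/s consumed at 2.9GPix/s with colour and Z) and applies it to a 2GPix/s workload on the same 22.4GB/s. It assumes the cost scales linearly with pixel rate and that RSX has the same bandwidth as the 7600GT, as the post states. The variable and function names are illustrative only.

```python
BANDWIDTH_GBPS = 22.4   # 7600GT memory bandwidth, per the post
FILL_TEST_GPIX = 2.9    # colour + Z fillrate at which that bandwidth was fully consumed

# Effective framebuffer traffic per pixel implied by the fillrate test (about 7.7 bytes).
bytes_per_pixel = BANDWIDTH_GBPS / FILL_TEST_GPIX


def bandwidth_fraction(gpix_per_s, bandwidth_gbps=BANDWIDTH_GBPS):
    """Fraction of total bandwidth eaten by colour + Z traffic at a given pixel rate."""
    return gpix_per_s * bytes_per_pixel / bandwidth_gbps


# At 2 GPix/s with no blending, AA or HDR, colour + Z alone takes about 69% of the bus.
fraction_at_2gpix = bandwidth_fraction(2.0)
```

The implied cost of just under 8 bytes per pixel would be consistent with a 32-bit colour write plus a 32-bit Z write, though that breakdown is an assumption and not something the post spells out. Either way, the result lands on the "over half" side, before any texture traffic is counted.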
