"4xAA": Microsofts big mistake.

Jaws said:
explanation

Yes, the idea that RSX can be treated as just another SPE is simple enough, but what's not so obvious is exactly how much advantage this gives. I mean, RSX isn't exactly just another SPE now, is it? That's one hefty SPE, weighing in at over 300 million transistors. So okay, some of the SPEs can exchange data with RSX and it all falls within the "dynamic data doesn't stay in one place very long" paradigm, but come on, how much VRAM bandwidth are you expecting to save here?

Jaws said:
I suggest you read my replies above, because I don't think you understand that AA is a function of system bandwidth, or its relevance to this topic.

:oops:

I don't think you or I are in a position to know exactly how much "system bandwidth" can be optimised in an in-game scenario on PS3, or what exact effect this might have on AA. It's all just loose speculation.
 
Jaws said:
Acert93 said:
Jaws said:
That RSX 57 GB/sec read|write is as real as Xenos 54 GB/sec read|write (22.4 system + 32 to EDRAM module). However, in X360's case, the XeCPU under those conditions cannot access system RAM, but CELL can still access system RAM (XDR).

That is totally silly

SEE SIG. YOU KNOW WHY. DON'T MAKE THIS ARRANGEMENT MORE DIFFICULT. :rolleyes:

What arrangement?

There is no arrangement. You blocked me; I did not block you. The only logical arrangement is that you continue ignoring me and I continue to exercise my right to post as I like within the rules of the B3D forum. The only "arrangement" I see being broken is your desire to ignore me. Threats and fits in PM are not going to stop me from either enjoying your posts (which I frequently do) or challenging them when I think there are incorrect presumptions. Referencing RSX bandwidth as being relevant to total system bandwidth, within the confines of our discussion, is very much a point that I believe should be challenged and explored. So don't tell me how to post.

The irony of all of this is your stated reason for putting me on ignore: basically, that my posts are irrelevant to your point and off topic. I have challenged that, and you were rude in reply. So whatever, it is best you ignore me. What is ironic is that your comparing total system bandwidth, rather than main memory bandwidth, which is at the core of the issue IN THIS THREAD, is blurring the actual discussion at hand. Yet I put in a thoughtful reply... not some threatening PM about how I am going to block you.

Since you blocked me (and not vice versa), I would suggest you continue ignoring my posts and I will continue posting as I please. You are not a mod and you have no right to tell me how to post. So shove off.

Now, if JVD or Sonic would be kind enough to forward this, I would appreciate it. If Jaws wants to talk civilly in the forums, that is fine, but threats and insinuations that I have some type of agreement with him are childish. He would be best off just ignoring me and letting me enjoy the forums as I have the right to do.

Jaws said:
People keep mistaking the RSX peak read|write B/W of 57 GB/sec for B/W to XDR. It's NOT. It's B/W to FlexIO + GDDR, AND CELL can still access XDR. I've mentioned this several times in this thread, and it becomes frustrating when people keep ignoring/not understanding what's already been posted. EIB below...

What is annoying is that you are ignoring the point that the PS3 does not have 57GB/s of memory bandwidth.

XDR = ~25GB/s
GDDR3 = ~22GB/s
PS3 total memory bandwidth = ~47GB/s

The 57GB/s figure is beside the point in the specific discussion at hand, i.e. system bandwidth and AA.

If you want to count the EIB, then we need to start counting the Xenos<>Xenon 22GB/s bidirectional bandwidth, the eDRAM's 256GB/s bandwidth, etc... everything I already explained in the post you cannot read :rolleyes:

This is the same stuff people got on Major Nelson for: counting bandwidth where it is irrelevant (i.e. the eDRAM in the 360 is NOT main system memory and should not be counted as total system bandwidth, because it is specialized to serve the backbuffer).
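To make the arithmetic explicit, here's a minimal sketch using the approximate figures quoted in this thread (the split of the 57 GB/s into GDDR + FlexIO portions is as Jaws describes it; treat all numbers as illustrative, not official specs):

Code:
# Rough bandwidth bookkeeping for the figures quoted in this thread (Python).
# All numbers are approximate GB/s as posted above, not official specs.

XDR    = 25   # Cell <-> XDR main memory
GDDR3  = 22   # RSX <-> GDDR3 VRAM
FLEXIO = 35   # Cell <-> RSX interconnect

main_memory_bw = XDR + GDDR3     # what the PS3 can actually read/write to RAM
rsx_peak_io    = GDDR3 + FLEXIO  # the oft-quoted "57 GB/s" RSX figure

print(main_memory_bw)  # ~47 GB/s of real memory bandwidth
print(rsx_peak_io)     # ~57 GB/s, but 35 of it is chip-to-chip, not RAM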
 
I *think* Jaws is counting SPE LS as RAM available for backbuffer work. That is, RSX can access SPE LS at 35 GB/s for the same sort of work that Xenos can access the daughter die at 32 GB/s for. I think that results in RSX using 22 GB/s to DDR and 35 GB/s to Cell, while Cell has 25 GB/s to XDR. The equivalent XB360 numbers would be 22 GB/s of shared DDR between XeCPU and Xenos, and 32 GB/s to the daughter die.

I still can't see Cell's role in creating the backbuffer, though, or in having its LS accessed by RSX for backbuffer use. Perhaps the idea is that Cell creates material and feeds it directly to RSX (accessed from SPE LS), bypassing the transfer of data through main memory. That means 25 GB/s from XDR to Cell, 22 GB/s for use as backbuffer and textures by RSX, and geometry and procedural textures moved over the 35 GB/s EIB. In the argument about AA, this EIB bandwidth counts if used this way, as it frees up RAM bandwidth.

e.g. where each arrow shows consumption of main RAM BW:

Without EIB = 5 arrows

Code:
Cell --- create geometry -> RAM (XDR or DDR) -> RSX -> Backbuffer -> Front buffer
                                                 ^
                                                 |
                                       Textures -+


With EIB = 3 arrows

Code:
Cell --- create geometry --- RSX -> Backbuffer -> Front buffer
                              ^
                              |
                    Textures -+

This setup avoids a portion of main RAM bandwidth consumption, as the data needed to feed RSX is transferred directly, thus freeing up more system RAM bandwidth for use by RSX in adding AA. A comparison with XB360 would be (3 arrows)...

Code:
XeCPU --- create geometry -> Xenos --- Backbuffer -> Frontbuffer
                              ^
                              |
                    Textures -+

This makes sense: the BW from Cell's EIB constitutes a valid part of the rendering process and contributes to the overall BW available for rendering, so I guess this is what Jaws is talking about.

What isn't shown above are the reads/writes to the backbuffer for processing/blending, which would be another arrow on the PS3 BW diagrams but which are internalised in Xenos' daughter die, which is where its saving comes into play (plus the 16 GB/s read from eDRAM to Xenos). Also, the arrows don't show what weighting each part has in BW consumption. If geometry creation from Cell makes up only 30 MB/s, having 35 GB/s from the EIB is no great bonus. Likewise, if backbuffer blending only consumes 50 MB/s, having 256 GB/s on the eDRAM module is no big thing. The frontbuffer is shown, but we know that's only a small consumer of BW. Without these weightings we only have a rough summary of where the 'leaks' are, not how big they are.
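To illustrate that weighting point, a toy model (every per-phase figure below is an invented placeholder chosen only to show the calculation; none of them are measurements):

Code:
# Toy model of per-phase main RAM traffic in GB/s (Python). The phase
# figures are invented placeholders; only the shape of the sum matters.

phases = {
    "geometry to GPU": 0.03,   # if geometry traffic is tiny...
    "texture fetch":   8.0,
    "backbuffer r/w":  10.0,
    "frontbuffer":     0.5,
}

total        = sum(phases.values())
saved_by_eib = phases["geometry to GPU"]  # what rerouting over EIB frees up

print(f"total RAM traffic: {total:.2f} GB/s")
print(f"EIB saving: {saved_by_eib:.2f} GB/s")
# ...then the saving is tiny, no matter that the EIB link itself is 35 GB/s.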

Still, in the context of the argument, it seems correct to me to include Jaws' assertion that the EIB contributes to render bandwidth, as I've explained (to myself!) above.
 
@JAWS

Okay, I'd like to summarize.

The comment about the "frame buffer in local storage" was just a little joke, because I don't see the savings to RSX's GDDR3 from communication between it and Cell as being particularly significant without some kind of direct access to the XDR RAM, and if you do access XDR, that has its own consequences. You, however, contend that VRAM access can be significantly reduced because RSX will be sending data (to be operated on by SPEs) to Cell instead of putting it into RSX's own 256MB pool, and that this saving in VRAM access will allow AA to be achieved (even in 1080p?). That's fair enough, but unfortunately we can't quantify how significant that saving would be without actually working on an advanced engine for PS3. As such, neither of us is right or wrong; we are both just speculating based on limited information. You are not the fool my comment might imply, and I am not as ignorant as you might imply. We are just looking at the same picture from different angles (through a window that is rather dirty).

PS, please don't put me on ignore or block or whatever :cry:

:edit:

Or we could just refer to the diagrams that Shifty Geezer whipped up while I was typing this post. Nice explanation, Shifty!
 
1. If you are going to count the LS memory on each SPE, then you must count the Xenos<>Xenon ~22GB/s bidirectional L2 cache bandwidth and the 256GB/s bandwidth between the eDRAM logic and the eDRAM. You cannot begin counting subsystem resources on one system and not the other.

Is that not the same reasoning that got Major Nelson slammed on these forums? I am certain it is.

I believe most of us agreed it was fair to count the eDRAM as *bandwidth saved* (i.e. backbuffer Z, alpha, AA, etc.) and not just randomly "let's play with numbers to make my favorite system seem the most 1337".

2. Further, even *if* the SPEs' RAM could be used this way, it won't be used for this. For the extra bandwidth (if it is even usable... this is all paper conjecture as regards the SPEs being used this way) you totally neuter your CPU.

Can anyone honestly say AA is so important that they are willing to sacrifice the SPEs and be left with a single PPC core with 512K of cache? You would be totally CPU limited, and there would be no need for the extra bandwidth for AA. Basically, we are talking about crippling a 218GFLOPs monster down to a ~40GFLOPs in-order PPC with 512K cache--for a measly 10GB/s for AA.

If developers are going to burn SPEs so they can use the LS for system memory bandwidth, the PS3 is a total crap design (which of course it is NOT!!).

@ Shifty: We get along pretty well (even though we see things differently). Maybe you can explain how counting the SPE LS (256K * 7) as main memory bandwidth is any different from Major Nelson counting the eDRAM as system memory? With the 360 we were talking about 10MB with 256GB/s of bandwidth; with PS3 we are talking about 1.75MB with 10GB/s. These are very specialized memory pools with specific tasks in mind. eDRAM is not for general processing, and SPEs were not designed as tiled framebuffers.
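For scale, a quick back-of-envelope on what a 720p 4xAA backbuffer actually weighs next to the two pools being compared (assuming 32-bit colour and 32-bit Z per sample, the usual rough estimate):

Code:
# Size of a 720p 4xAA backbuffer vs. the memory pools being argued over.
# Assumes 32-bit colour + 32-bit Z/stencil per sample (rough estimate).

W, H, SAMPLES    = 1280, 720, 4
BYTES_PER_SAMPLE = 4 + 4                 # colour + Z/stencil

backbuffer_mb = W * H * SAMPLES * BYTES_PER_SAMPLE / 2**20
spe_ls_mb     = 7 * 256 / 1024           # 7 usable SPEs x 256KB LS
edram_mb      = 10

print(f"720p 4xAA backbuffer: {backbuffer_mb:.1f} MB")   # ~28.1 MB
print(f"combined SPE LS: {spe_ls_mb:.2f} MB, eDRAM: {edram_mb} MB")
# Neither pool holds the whole buffer: Xenos tiles it across its 10MB of
# eDRAM, and 1.75MB of LS is nowhere near framebuffer-sized.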
 
Acert93 said:
Maybe you can explain how counting the SPE LS (256K * 7) as main memory bandwidth is any different from Major Nelson counting the eDRAM as system memory?
This isn't about counting the BW on SPE LS (if it were, the bandwidth between SPUs and LS is astronomical ;) ). If I'm right in understanding Jaws, he's not talking about using the combined Cell SPEs' LS as a backbuffer cache in the same way the eDRAM is. He's talking about bypassing the use of main memory BW by outputting 3D data straight from the CPU to the GPU through a separate set of pipes.

The chief distinction between this and the infamous Major Nelson paper on Xenos is the broad representation of the internal BW figure as additional to the entire render process, instead of its true nature, which is only part of the render process. I'll try knocking up a table, though don't count on it to be accurate - only representative...

For PS3...
Code:
Render Phase            Where Occurs         Bandwidth Consumed

Create Geom                Cell              Internal (xxx GB/s)
Pass geom to GPU           EIB               EIB (35 GB/s)
V and P shading            RSX               Internal (xxx GB/s)
Write to backbuffer        RAM               DDR/+XDR (22/47 GB/s)
Process Backbuffer         RSX/+Cell?        DDR/+XDR (22/47 GB/s)
Write FrontBuffer          RSX               DDR/+XDR (22/47 GB/s)

For XB360...
Code:
Render Phase            Where Occurs         Bandwidth Consumed

Create Geom                XeCPU             Internal (xxx GB/s)
Pass geom to GPU           RAM               DDR (22 GB/s)
V and P shading            XeGPU             Internal (xxx GB/s)
Write to backbuffer        Xenos             Smart-eDRAM interconnect (32 GB/s)
Process Backbuffer         Smart-eDRAM       Internal (xxx GB/s)
Write FrontBuffer          RAM               DDR (22 GB/s)

In the case of PS3, main RAM BW is thrashed in 3 of the steps (actually, I missed texture fetches, so V & P shading also accesses RAM on both systems), including backbuffer work, which from my limited understanding is frequent and costly.

In the case of XB360, main RAM BW is thrashed in 2 steps: getting data to the GPU, and output (ignoring textures :oops: ).

If the EIB didn't exist, PS3 would also have to intrude on RAM BW to pass data to the GPU, so there is a saving of up to 35 GB/s.
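Tallying my tables above programmatically (the bus classification for each phase is my own reading of the two designs, nothing official):

Code:
# Tally which render phases hit main RAM, per the tables above (Python).
# The phase classifications are one reading of the two designs, no more.

ps3_phases = {
    "create geometry":    "internal",
    "pass geom to GPU":   "EIB",        # chip-to-chip, not main RAM
    "V and P shading":    "internal",
    "write backbuffer":   "main RAM",
    "process backbuffer": "main RAM",
    "write frontbuffer":  "main RAM",
}
x360_phases = {
    "create geometry":    "internal",
    "pass geom to GPU":   "main RAM",   # shared DDR pool
    "V and P shading":    "internal",
    "write backbuffer":   "eDRAM link", # Xenos -> daughter die
    "process backbuffer": "internal",   # inside the daughter die
    "write frontbuffer":  "main RAM",
}

for name, table in (("PS3", ps3_phases), ("XB360", x360_phases)):
    hits = [p for p, bus in table.items() if bus == "main RAM"]
    print(f"{name}: {len(hits)} phases on main RAM -> {hits}")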

The available 256 GB/s of BW on the eDRAM is internal storage for local processing, which works on a subset of the rendering process. Figures for other phases of the rendering process acting on local storage, like on the GPU or CPU, were not included in the MN article. i.e. geometry creation occurs between CPU logic circuits and local storage, and shading occurs between GPU shader logic and local storage, yet the figures for these phases where logic acts on local storage weren't present in the MN article. The bandwidth saved by moving the backbuffer processing onto a chip with a fast local store is important, but no more so than the bandwidth saved by the GPU working on internal registers and local stores instead of working directly on system RAM!

It was the highlighting of an internal processor bandwidth, between logic and local storage, that is off in the MN article, as it's something not generally counted (I'd never heard of it before now!) and wasn't applied uniformly across both entire systems. If they want to do this, they should include ALL bandwidths between logic and local storage.

Whereas Jaws is talking about the transfer of data from one processor to another, which is a non-local-storage BW figure that can be counted along with all the others.

That's the chief distinction, at least the one I make: don't count BW between logic and local storage on the same chip; only count BW where data is passed from one processor to another.
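As a minimal sketch of that rule, here it is applied as a filter to the buses mentioned in this thread (the figures are the approximate ones quoted above, and the on-chip vs chip-to-chip labels are my own reading):

Code:
# Shifty's counting rule as a filter: only chip-to-chip buses count.
# Figures are the approximate GB/s quoted in this thread; the labels
# (on-chip vs chip-to-chip) are one reading of the two designs.

buses = [
    ("SPU <-> LS",             "on-chip",      None),  # huge, but excluded
    ("eDRAM logic <-> eDRAM",  "on-chip",      256),   # MN's headline figure
    ("Cell <-> XDR",           "chip-to-chip", 25),
    ("RSX <-> GDDR3",          "chip-to-chip", 22),
    ("Cell <-> RSX (FlexIO)",  "chip-to-chip", 35),
    ("Xenos <-> daughter die", "chip-to-chip", 32),
    ("CPU/GPU <-> shared DDR", "chip-to-chip", 22),
]

for name, kind, bw in buses:
    if kind == "chip-to-chip":
        print(f"{name}: {bw} GB/s")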
 
Shifty Geezer said:
The bandwidth saved by moving the backbuffer processing onto a chip with a fast local store is important, but no more so than the bandwidth saved by the GPU working on internal registers and local stores instead of working directly on system RAM!

It was the highlighting of an internal processor bandwidth, between logic and local storage, that is off in the MN article, as it's something not generally counted (I'd never heard of it before now!) and wasn't applied uniformly across both entire systems. If they want to do this, they should include ALL bandwidths between logic and local storage.

Whereas Jaws is talking about the transfer of data from one processor to another, which is a non-local-storage BW figure that can be counted along with all the others.

That's the chief distinction, at least the one I make: don't count BW between logic and local storage on the same chip; only count BW where data is passed from one processor to another.

That doesn't make any sense to me, frankly.

The 256 GB/s eDRAM bus is between the Xenos ROPs and the tiled backbuffer.

It's pretty much where a normal GPU ends (at the ROPs) and its allocated rendertarget memory begins, i.e. the external memory bus on a conventional GPU.

In fact, on a conventional GPU, the connection between the shaders and the ROPs is an internal bus. The only reason you guys are considering it an external bus on Xenos is simply that Xenos is a multi-chip module.

Do you count the Pentium Pro's bus between its L2 cache and the processor die as an external or an internal bus?
 
Marketing in terms of advertisement/PR is one thing, but eventually a product must stand on its own. I think MS/ATI made a very good decision with the eDRAM to allow "FSAA for free", and I don't really see how this decision can come back to haunt them. On the marketing side they may have made an error in how they presented it, but that is just rolling up the first snowballs for the fight. Ultimately we will all see the differences and choose based on that or on other factors (like which games we like, which I think is much more important).

I realize the OT is about the marketing of the feature, but I would find it very strange if anyone found fault with the actual implementation. It actually worries me a bit that Sony is not using eDRAM, but I trust their decisions: they seem to have done alright for themselves before.
 
I think Sony/PS3 had fewer options when it came to GPU implementation and design. Their homegrown CELL hybrid wasn't going to be competitive or popular with developers. ATI was already busy with both competitors. NVidia was their only real viable option. I think NVidia knew they were in a solid position and had no real desire to invest a lot of time in a custom design. The PC space is their core market and competency, and last time they fell behind in that arena after significant engineering investments for the console market. It doesn't seem likely that they would be willing to divert significant resources away from their bread and butter again if they didn't have to. This meant Sony didn't have realistic options to implement UMA or eDRAM-type designs.

IMO, they just "ended up" with a system design, rather than meticulously planning one targeting specific resolutions, AA modes, etc. FlexI/O was designed around multi-chip Cell configurations, not as a high-speed GPU interface. So this whole CELL + RSX synergy, 1080p, 128-bit HDR, and the rest is the work of great marketing and industry respect. Now developers are going to have to bail Sony out by working harder, recoding for SPEs, and trying to manually balance the storage and distribution of audio, textures, geometry, and backbuffers against available bandwidths, memory sizes, and latencies.

The whole idea of summing selective bandwidths for nondescript tasks to ascertain system performance is not something that I think will ever achieve consensus. It does, however, make for interesting conversation.
 
I understand what Jaws is saying, but the question is: how much bandwidth does geometry fetching take up? What's the main contributor to bandwidth usage in a traditional design? Will Cell be solely responsible for generating geometry, and if so, what will be the source of this geometry?
 
Rockster said:
NVidia was their only real viable option. I think NVidia knew they were in a solid position and had no real desire to invest a lot of time in a custom design. The PC space is their core market and competency, and last time they fell behind in that arena after significant engineering investments for the console market. It doesn't seem likely that they would be willing to divert significant resources away from their bread and butter again if they didn't have to. This meant Sony didn't have realistic options to implement UMA or eDRAM-type designs.
Exactly. It seems that Sony was where Microsoft was last gen, in the sense that they needed something quick and simple to implement to get the product out there. That isn't necessarily a bad thing, since IMO NVidia's chip in the Xbox was the best performing of all the chips, although the bandwidth left something to be desired. Time will tell who made the right choices this cycle.
 
I get the impression that Sony has been so singularly focused on CELL, and on filling the world with them, that they lost sight of the entire platform. Not that they were in a hurry. I mean, look at the time, money, and energy invested in that CPU; it's a shame that the rest of the system and supporting software haven't received the same attention. They don't even seem to know what they have built. Is it a supercomputer for internet and e-mail, using Linux, that also plays games? Luckily they have a huge fan base that doesn't seem to care. For Sony, the adage holds true: if you build it, they will come. And with an install base like that, the publishers crack their whips and developers ride off to the rescue.
 
Rockster said:
I get the impression that Sony has been so singularly focused on CELL, and on filling the world with them, that they lost sight of the entire platform. Not that they were in a hurry. I mean, look at the time, money, and energy invested in that CPU; it's a shame that the rest of the system and supporting software haven't received the same attention. They don't even seem to know what they have built. Is it a supercomputer for internet and e-mail, using Linux, that also plays games? Luckily they have a huge fan base that doesn't seem to care. For Sony, the adage holds true: if you build it, they will come. And with an install base like that, the publishers crack their whips and developers ride off to the rescue.

You are not alone in that impression, I'm sure.
 
Rockster said:
I get the impression that Sony has been so singularly focused on CELL, and on filling the world with them, that they lost sight of the entire platform. Not that they were in a hurry. I mean, look at the time, money, and energy invested in that CPU; it's a shame that the rest of the system and supporting software haven't received the same attention. They don't even seem to know what they have built. Is it a supercomputer for internet and e-mail, using Linux, that also plays games? Luckily they have a huge fan base that doesn't seem to care. For Sony, the adage holds true: if you build it, they will come. And with an install base like that, the publishers crack their whips and developers ride off to the rescue.

The biggest marketing point of the PS3 is Cell. Considering the amount of time and money they invested in Cell, and with Cell being the basis of their future platforms, it is only logical that Sony is heavily focused on promoting Cell as much as the PS3 itself. Sony pretty much did what they planned with the PS3, namely the inclusion of Cell tech and a Blu-ray drive. I really don't understand the idea of Sony losing sight of the entire platform. We are looking at Cell thanks to Microsoft, lol; if there were no MS, then we would be seeing a crappy EE3 today :LOL:
 
Rockster said:
I get the impression that Sony has been so singularly focused on CELL, and on filling the world with them, that they lost sight of the entire platform. Not that they were in a hurry. I mean, look at the time, money, and energy invested in that CPU; it's a shame that the rest of the system and supporting software haven't received the same attention. They don't even seem to know what they have built. Is it a supercomputer for internet and e-mail, using Linux, that also plays games? Luckily they have a huge fan base that doesn't seem to care. For Sony, the adage holds true: if you build it, they will come. And with an install base like that, the publishers crack their whips and developers ride off to the rescue.

A brilliant summary, in my opinion.

Why do I need Cell to check my email or surf the net? I did that on a 133MHz Pentium.

Also, I've no doubt Sony will have some of the best-looking and best-playing games this gen, though I have a feeling that very little of that will be because of the hardware. Having said that, I will most likely get a PS3 as my only console next gen (unless I splash out on more than one), because the devs will go where the money is.
 
I don't think so, as 720p seems to be a well-balanced target. It's a progressive-scan format and has less than half the number of pixels of 1080. So with GPUs of approximately equivalent shading power, that means you can do more than twice as much work per pixel. And the addition of 4xAA gives an effective edge resolution of 2560x1440, which is better than 1080. Plus, it's the native format of the majority of digital displays that actually match up to a standard resolution. It hurt them in the battle of PowerPoint presentations, but it will likely result in better overall image quality.
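The arithmetic behind those claims, for anyone who wants to check it (the 2x edge factor assumes 4xAA resolves edges roughly like a 2x2 grid of samples per pixel, which is a rule of thumb rather than a spec):

Code:
# Checking the 720p vs 1080p pixel math (Python). The "effective edge
# resolution" assumes 4xAA samples edges like a 2x2 grid per pixel.

px_720  = 1280 * 720      # 921,600 pixels
px_1080 = 1920 * 1080     # 2,073,600 pixels

print(px_720 / px_1080)   # ~0.44 -> less than half the pixels of 1080
print(px_1080 / px_720)   # ~2.25x shader work available per pixel
print(1280 * 2, 720 * 2)  # 2560 1440 -> effective edge sampling with 4xAA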
 
PARANOiA said:
Also, I've no doubt Sony will have some of the best-looking and best-playing games this gen, though I have a feeling that very little of that will be because of the hardware.

So the actual hardware as a whole is the PS3's bottleneck now? :)
 