720p 64bit framebuffer size? Educated guesses here please

ERP, I didn't say anything about blending. Z depths for all the subsamples will be calculated on the parent die during the setup process and tied to the corresponding color value. Color info and 4 z per pixel must cross that bus, and if Z isn't compressed at 4:1, you can overrun bandwidth. Also if the same bus is used during the resolve, then the available bandwidth would be less than 32GB/sec for that time, and again you could become bandwidth limited.
 
Rockster said:
ERP, I didn't say anything about blending. Z depths for all the subsamples will be calculated on the parent die during the setup process and tied to the corresponding color value. Color info and 4 z per pixel must cross that bus, and if Z isn't compressed at 4:1, you can overrun bandwidth. Also if the same bus is used during the resolve, then the available bandwidth would be less than 32GB/sec for that time, and again you could become bandwidth limited.

Your assumptions are incorrect.

It is my understanding that there is exactly enough bandwidth to send 8 pixels per clock, including all the Z compression overhead needed for the MSAA expansion on the daughter die.

As an aside, I believe the 32GB/s figure is actually a simplification, since sideband information has to be passed for the compression to actually work. As far as I can see, they have to send at least 2 DDA values and a 4-bit coverage mask, although they could send the DDA values per triangle. I don't know the specifics of the bus protocol, though.

I also believe that the resolve bus is separate, although I don't have any hard info on this.
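To make the "exactly enough bandwidth" argument concrete, here is a rough sketch of the per-pixel budget on that bus. Only the 32GB/s and 8 pixels per clock figures come from this thread. The 500 MHz clock, the 4 bytes of color and the 4 bytes of Z/stencil per sample are assumptions added for illustration, and the sideband data mentioned above is ignored.

```python
# Figures quoted in the thread.
BUS_BYTES_PER_SEC = 32_000_000_000  # "32GB/sec" bus to the daughter die
PIXELS_PER_CLOCK = 8

# Assumptions for illustration only (not stated in the thread).
CLOCK_HZ = 500_000_000   # assumed GPU clock
COLOR_BYTES = 4          # assumed 32-bit color per pixel
Z_BYTES = 4              # assumed 24-bit Z + 8-bit stencil per sample
SAMPLES = 4              # 4x MSAA


def budget_bytes_per_pixel(bus_bytes_per_sec, clock_hz, pixels_per_clock):
    """Bytes the bus can carry for each pixel sent per clock."""
    return bus_bytes_per_sec / (clock_hz * pixels_per_clock)


def bytes_per_pixel_on_bus(color_bytes, z_bytes, samples, z_compression):
    """One color value plus `samples` Z values, with Z compressed by a ratio."""
    return color_bytes + (z_bytes * samples) / z_compression


budget = budget_bytes_per_pixel(BUS_BYTES_PER_SEC, CLOCK_HZ, PIXELS_PER_CLOCK)
uncompressed = bytes_per_pixel_on_bus(COLOR_BYTES, Z_BYTES, SAMPLES, 1.0)
compressed = bytes_per_pixel_on_bus(COLOR_BYTES, Z_BYTES, SAMPLES, 4.0)

print(f"budget per pixel:       {budget:.0f} bytes")
print(f"color + 4 Z, no compr.: {uncompressed:.0f} bytes")
print(f"color + 4 Z, 4:1 Z:     {compressed:.0f} bytes")
```

Under these assumptions the budget is 8 bytes per pixel. Four uncompressed Z values plus color would overrun it, while 4:1 Z compression fits exactly, which is consistent with both posts above.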
 
Okay, FSAA as SSAA...

Backbuffer size = 1280x720x12bytes*x4SSAA ~ 42 MBytes

*12 bytes/pixel = 96 bits/pixel = 64bit (FP16, RGBA) + 24bit (Z) + 8bit (Stencil)


Backbuffer bandwidth = 42 MB x 60 FPS x (2 read|write x 5 overdraw)* ~ 24.6 GBytes/sec

* Squeak, does your definition of overdraw take into account read|writes? If not, then this would be halved to 12.3 GBytes/sec...
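For anyone who wants to play with the numbers, the arithmetic above can be sketched as a small script. The function names are made up for illustration, and MB/GB are taken as binary units (1024-based), which is what the ~42 MB figure implies.

```python
MIB = 1024 ** 2
GIB = 1024 ** 3


def backbuffer_bytes(width, height, bytes_per_pixel, samples):
    """Size of a supersampled backbuffer in bytes."""
    return width * height * bytes_per_pixel * samples


def backbuffer_bandwidth(buffer_bytes, fps, accesses_per_pixel):
    """Bytes per second if every sample is touched `accesses_per_pixel` times a frame."""
    return buffer_bytes * fps * accesses_per_pixel


# 1280x720, 12 bytes/pixel (FP16 RGBA + 24-bit Z + 8-bit stencil), 4x SSAA
size = backbuffer_bytes(1280, 720, 12, 4)

# Worst case: read + write (2) for each of 5 layers of overdraw
worst = backbuffer_bandwidth(size, 60, 2 * 5)

# If the overdraw figure already counts reads and writes
halved = backbuffer_bandwidth(size, 60, 5)

print(f"backbuffer size: {size / MIB:.1f} MB")
print(f"worst case:      {worst / GIB:.1f} GB/s")
print(f"halved:          {halved / GIB:.1f} GB/s")
```

Without rounding the buffer down to 42 MB first, the worst case comes out at about 24.7 GB/s rather than 24.6, and the halved figure at about 12.4 GB/s.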
 
Shifty Geezer said:
These figures still suggest RSX won't be bandwidth limited...

Won't it?

It all depends on "what's PS3 limited first".

It might be CPU limited first (unlikely), it might be Fillrate limited first (could be), or could very well be bandwidth limited first.

We'll have to see what reaches its limit first; then we'll know what the real bottleneck in PS3 is. At the moment no one can know.
 
Following the talk (Major Nelson) regarding bandwidth, it was suggested PS3 didn't have enough bandwidth for HDR+AA+High Def. That doesn't seem to be the case here, even with 4x SS instead of MS.

So of course sooner or later someone might find they can't do something because they run out of BW, but overall it doesn't appear as though PS3 is starved of BW. I'm still not entirely clear how much BW gets gobbled up along the rendering pipeline, or whether things like Z tests eat up more (?)
 
Shifty Geezer said:
These figures still suggest RSX won't be bandwidth limited...

These figures don't include FP16 HDR or FP32 HDR either.

They also don't include texture fetch bandwidth.

Will it be bandwidth limited? I dunno. I'm willing to bet it depends on the frame in question.
 
jvd said:
Shifty Geezer said:
These figures still suggest RSX won't be bandwidth limited...

These figures don't include FP16 HDR or FP32 HDR either.

They also don't include texture fetch bandwidth.

Will it be bandwidth limited? I dunno. I'm willing to bet it depends on the frame in question.

And how well they play with compression schemes and other bandwidth saving techniques...

I guess we have to wait and see -- it's far too difficult to infer any sort of valid performance number on either Xbox360 or PS3 at the moment -- for all intents and purposes we have about as much useful information on those consoles as we do on the Revolution. ;)
 
bobber, it's going to be limited by something at some point in each frame: fillrate, texture, pixel or vertex limited. I'm sure it will change greatly from game to game and scene to scene.
 
jvd said:
Shifty Geezer said:
These figures still suggest RSX won't be bandwidth limited...

These figures don't include FP16 HDR or FP32 HDR either.

They also don't include texture fetch bandwidth.

Will it be bandwidth limited? I dunno. I'm willing to bet it depends on the frame in question.

Yes it includes FP16 HDR. I've included it as 64bit colour above, FP16 RGBA (red, green, blue, alpha).

Adding FP32 (128bit HDR) will result in 20 Bytes/pixel (16 Bytes colour/alpha + 4 Bytes Z/Stencil) instead of 12 bytes/pixel (8 Bytes colour/alpha + 4 Bytes Z/Stencil) for FP16 (64bit HDR). I've outlined the breakdown above.

Bear in mind that neither compression nor procedurally generated data has been accounted for. Texture/geometry bandwidth obviously wasn't included either, but it needs to be considered for the NUMA system as a whole, including physics/AI/game data etc. If you're looking for system bottlenecks, it'll be very game dependent...
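The bytes/pixel breakdown for FP16 versus FP32 can be checked with a quick sketch. The helper name and its defaults are mine, chosen to match the breakdown above (4 channels, 24-bit Z, 8-bit stencil), and the buffer sizes assume the same 1280x720 with 4x SSAA and no compression.

```python
MIB = 1024 ** 2


def bytes_per_pixel(bits_per_channel, channels=4, z_bits=24, stencil_bits=8):
    """Color (RGBA by default) plus Z and stencil, in bytes."""
    return (bits_per_channel * channels + z_bits + stencil_bits) // 8


fp16_bpp = bytes_per_pixel(16)  # 64-bit HDR color
fp32_bpp = bytes_per_pixel(32)  # 128-bit HDR color

# Uncompressed backbuffer size at 1280x720 with 4x SSAA
fp16_size = 1280 * 720 * fp16_bpp * 4
fp32_size = 1280 * 720 * fp32_bpp * 4

print(f"FP16: {fp16_bpp} bytes/pixel, {fp16_size / MIB:.1f} MB")
print(f"FP32: {fp32_bpp} bytes/pixel, {fp32_size / MIB:.1f} MB")
```

Going from FP16 to FP32 scales both the buffer size and the framebuffer bandwidth by 20/12, so the earlier worst case would grow by about two thirds.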
 
Jaws' numbers do include FP16 HDR and show an absolute worst-case scenario, because he is reading and writing both color and Z for every pixel with absolutely no compression. In practice the number will be significantly lower. Of course texture and vertex bandwidth were omitted because we are looking at framebuffer bandwidth requirements.

Oops. Beat me to it. Damn slow fingers.
 
Rockster said:
ERP, I didn't say anything about blending. Z depths for all the subsamples will be calculated on the parent die during the setup process and tied to the corresponding color value. Color info and 4 z per pixel must cross that bus, and if Z isn't compressed at 4:1, you can overrun bandwidth. Also if the same bus is used during the resolve, then the available bandwidth would be less than 32GB/sec for that time, and again you could become bandwidth limited.

As far as I understand, you'll always have the same Z value for all 4 subsamples of a single fragment (unless some of them have no coverage). This might not be fully accurate, as the polygon isn't necessarily parallel to the screen plane, but the difference could be ignored IMHO.

You'll only get different Z values in the framebuffer, where various samples get combined: if a fragment does not cover every subsample, then you'll get some variation.

I might be wrong though...
 
Jaws said:
Okay, FSAA as SSAA...

Backbuffer bandwidth = 42 MB x 60 FPS x (2 read|write x 5 overdraw)* ~ 24.6 GBytes/sec

* Squeak, does your definition of overdraw take into account read|writes? If not, then this would be halved to 12.3 GBytes/sec...

I understand this is a worst-case scenario, but doesn't that make the PS3 CPU bandwidth somewhat limited?

All this backbuffer stuff needs to travel on the main CPU bus, right? Not the bus to the RAM on the GPU? And the main CPU bus only has about 25 GB/s of bandwidth.

So, if it's ballpark 12.3 GB/s for the backbuffer bandwidth, doesn't that leave the main bus with only around 13 GB/s available?

Or does all this backbuffer data travel to the GPU memory leaving the CPU bus free for everything else?
 
Rockster said:
No, setup generates Z for every subsample position.

But they should be the same, right? So there's no reason to create and transfer the samples until the ROPs... so ATI should be able to only transfer 1 Z value per fragment...
 