Please explain GPU memory bandwidths on XB360 and PS3

Squeak said:
But will that be enough to get consistently over 2x the performance out of the DDR RAM?
When GS has alpha blending off it's already using half the framebuffer bandwidth ;) The eDRAM bus widths are sized for the worst-case scenario. Moreover, GS is completely unaware of memory pages and breaks eDRAM pages all the time when drawing 'big' triangles in screen/texture space, so there is a lot of headroom there to improve performance.

I was thinking maybe they include 4Mb of some sort of cache that would also be very useful in PS3 mode?
Maybe, but I don't think NVIDIA had the time to customize RSX that deeply.
 
Uttar said:
colinisation said:
By the way Uttar, what do you mean by 64GB/link? What's a link? Do you mean a link to the eDRAM pool, and it has 4 links (4x64=256GB/sec)?
256GB/s is the effective bandwidth assuming compression (by that measure, current GPUs would also be rated at more than 100GB/s)
You also state that Cell will most likely use more bandwidth. Why is this? Because it has more threads running in parallel?
Because of the way "cache" works for the SPEs.
I thought the line of thought was that the 10MB of eDRAM was used for the front/back buffers. Has this been debunked?
It's used for the Z-buffer, I believe, but there's another thread analyzing this part more precisely; I haven't read it entirely yet.

Uttar

In the case of the 256GB/s it's not quite the same as taking compression into account. It's a guaranteed number with 4xAA turned on.
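One commonly cited way to arrive at the 256GB/s figure, and why it doesn't rely on compression, is to count the traffic the eDRAM-side logic generates when every pixel is a full 4xAA read-modify-write. The clock, pixel rate and per-sample sizes below are assumptions (they are not stated in this thread), so treat this as a back-of-the-envelope sketch, not an official breakdown.

```python
# Back-of-the-envelope sketch of the 256 GB/s eDRAM figure.
# All constants are assumptions, not figures quoted in this thread.
CLOCK_HZ = 500_000_000    # assumed clock of the eDRAM-side logic (500 MHz)
PIXELS_PER_CLOCK = 8      # assumed pixels processed per clock
SAMPLES_PER_PIXEL = 4     # 4xAA
BYTES_PER_SAMPLE = 4 + 4  # 32-bit color + 32-bit Z/stencil
READ_AND_WRITE = 2        # each sample is read (blend/Z test) and written back


def edram_internal_bandwidth_gbs():
    """Peak internal traffic in GB/s (1 GB = 10**9 bytes)."""
    bytes_per_second = (CLOCK_HZ * PIXELS_PER_CLOCK * SAMPLES_PER_PIXEL
                        * BYTES_PER_SAMPLE * READ_AND_WRITE)
    return bytes_per_second // 10**9


print(edram_internal_bandwidth_gbs())  # 256
```

Under these assumptions every term is a fixed per-pixel cost at 4xAA, so the number holds whether or not the data would compress, which fits "guaranteed" better than "effective".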
 
ERP said:
In the case of the 256GB/s it's not quite the same as taking compression into account. It's a guaranteed number with 4xAA turned on.

That's pretty impressive if true. Would compression be additive on top of this, then?

Nite_Hawk
 
But NVidia and ATI desktop GPUs have similar compression for FB/Z, so can we say that NVidia's GDDR RAM is 88GB/s w/4xFSAA?
 
The GDDR3 in the PS3... 256MB at 700MHz (1400 effective).

What bit/configuration?

I ask because the Xbox 360 has the same effective bandwidth of GDDR3 (like 23GB/s), but it is 128bit. Any ideas?
 
DemoCoder said:
But NVidia and ATI desktop GPUs have similar compression for FB/Z, so can we say that NVidia's GDDR RAM is 88GB/s w/4xFSAA?

In this case it isn't exactly the same as existing compression, although I'll agree it has similarities.
 
nAo said:
What worries me most is how RSX is going to emulate my stencil shadows code ;)
GS doesn't have a stencil buffer; stencil is emulated via alpha-blending ops or by reusing the frame buffer as a texture.
It is something like this...
frame buffer and texture buffer pointing to the SAME location:

KickOneTriangle()
TextureCacheFlush()
KickOneTriangle()
TextureCacheFlush()
etc...

This is pretty fast on GS, but it could be very slow on a modern GPU, since it flushes the texture cache after drawing every single primitive in the shadow volume.
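To see why the flush is needed when the frame buffer and texture alias the same memory, here is a toy Python model. `ToyTextureCache` and `draw_shadow_volume` are hypothetical names, and the cache is deliberately simplistic (one entry per pixel, kept until flushed); a real GS or GPU texture cache works on lines/pages, not single pixels.

```python
class ToyTextureCache:
    """Hypothetical cache that keeps whatever it read until flushed."""

    def __init__(self, memory):
        self.memory = memory  # same list the "frame buffer" writes to
        self.lines = {}

    def read(self, addr):
        if addr not in self.lines:       # miss: fetch current memory value
            self.lines[addr] = self.memory[addr]
        return self.lines[addr]          # hit: may be stale

    def flush(self):
        self.lines.clear()


def draw_shadow_volume(triangles, flush_between, size=4):
    framebuffer = [0] * size
    cache = ToyTextureCache(framebuffer)  # texture reads alias the frame buffer
    for pixels, delta in triangles:
        for p in pixels:
            # read through the texture path, write through the frame buffer path
            framebuffer[p] = cache.read(p) + delta
        if flush_between:
            cache.flush()
    return framebuffer


volume = [([0, 1], +1), ([0, 1], +1), ([1], -1)]
print(draw_shadow_volume(volume, flush_between=True))   # [2, 1, 0, 0]
print(draw_shadow_volume(volume, flush_between=False))  # [1, -1, 0, 0]
```

Without the flush, later triangles read values cached before earlier triangles wrote, so overlapping shadow-volume faces produce wrong counts.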
Interesting. I didn't know that's how shadow volumes were done on the PS2. Thanks for the info.
 
Acert93 said:
The GDDR3 in the PS3... 256MB at 700MHz (1400 effective).

What bit/configuration?

I ask because the Xbox 360 has the same effective bandwidth of GDDR3 (like 23GB/s), but it is 128bit. Any ideas?

GDDR3 for both are 128bit...
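For reference, peak GDDR3 bandwidth is just bus width times effective transfer rate. A minimal sketch, assuming the 128-bit bus and the 1400 MT/s effective rate (700MHz DDR) quoted above; `peak_bandwidth_gbs` is a hypothetical helper name.

```python
def peak_bandwidth_gbs(bus_width_bits: int, effective_mt_per_s: int) -> float:
    """Peak bandwidth in GB/s (1 GB = 10**9 bytes).

    bus_width_bits: total data-bus width, e.g. 128 for the GDDR3 here.
    effective_mt_per_s: millions of transfers per second (700 MHz DDR -> 1400).
    """
    bytes_per_transfer = bus_width_bits // 8
    return bytes_per_transfer * effective_mt_per_s / 1000


print(peak_bandwidth_gbs(128, 1400))  # 22.4
```

That 22.4 GB/s is the "like 23GB/s" figure mentioned above.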
 
Jaws said:
GDDR3 for both are 128bit...

Any guess why the X360's 512MB has the same bandwidth as the PS3's 256MB? I believe they both attain 23GB/s. I could be wrong (I mix things up at times), but I thought the number of memory modules you had would impact bandwidth. Does this imply Sony is using more memory modules to attain the same bandwidth, or am I all goofy on this point?

Thanks for your time Jaws.
 
3dcgi said:
Interesting. I didn't know that's how shadow volumes were done on the PS2. Thanks for the info.
To be fair there are better ways to emulate a stencil buffer.
If you can afford some extra space in VRAM (I couldn't when I coded that stuff...), you can set up a 16-bit color buffer as a stencil buffer and increment or decrement it via alpha-blending ops once you turn color clamping off (add 8 to increment and add 248 to decrement ;) )
This way you don't need to flush the texture cache, since you're not using texturing at all... 2.4 gigapixel/s fill rate with untextured primitives is cool ;)
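The add-8 / add-248 trick is modular arithmetic. A toy model, assuming the counter lives in one 5-bit channel of a 16-bit (RGBA5551-style) color buffer, that the 5-bit value is widened to 8 bits by a left shift of 3 before blending, and that with clamping off the 8-bit sum simply wraps modulo 256. `blend_add` and `shadow_count` are hypothetical names; this models the arithmetic only, not the GS registers.

```python
INCREMENT = 8    # 8 >> 3 == 1 step in the 5-bit field
DECREMENT = 248  # 248 == 256 - 8, i.e. -8 mod 256 == -1 step


def blend_add(counter5, src8):
    """Add src8 to a 5-bit counter through 8-bit blending with clamping off."""
    dst8 = counter5 << 3             # assumed 5-bit -> 8-bit widening
    result8 = (dst8 + src8) & 0xFF   # clamping off: wrap modulo 256
    return result8 >> 3              # keep the top 5 bits on write-back


def shadow_count(faces):
    """Depth-pass style count: front faces increment, back faces decrement."""
    counter = 0
    for face in faces:
        counter = blend_add(counter, INCREMENT if face == "front" else DECREMENT)
    return counter


print(shadow_count(["front", "back"]))           # 0 -> lit
print(shadow_count(["front", "front", "back"]))  # 1 -> in shadow
```

Adding 8 moves the 5-bit field up by exactly one step and adding 248 moves it down by one, so a pixel whose counter is non-zero after all shadow-volume faces are drawn is in shadow (the counter wraps at 32).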
 
jvd said:
Jaws, where does the FSB come into play on the X360?

X360

-FSB

Whether the northbridge is on-die with the GPU or a separate chip is not clear from the 'leak'.

21.6 GB/s for X360 CPU


PS3

-FSB

Northbridge on-die CELL

25.6 GB/s for CELL (not sure if CELL can read/write to GDDR3 also)


@ Acert93,

It's the number of channels and their widths that can vary to form the above aggregate 128-bit bus for GDDR3.

E.g. 2x64-bit or 4x32-bit 'channels' form a 128-bit aggregate bus, and compatible DRAM chips on those channels make up the total 'amount' of RAM.
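To illustrate (and to answer the 512MB vs 256MB question): bandwidth depends only on the summed channel width and the data rate, while capacity depends on how many chips of what density sit on those channels. A toy sketch; the chip counts and the 512Mbit density are hypothetical, and the 1400 MT/s rate is the figure from this thread.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Channel:
    width_bits: int          # data width of this channel
    chips: int               # DRAM chips on this channel (hypothetical)
    chip_density_mbit: int   # capacity per chip in megabits (hypothetical)


def aggregate(channels, effective_mt_per_s=1400):
    """Return (total width in bits, capacity in MB, peak bandwidth in GB/s)."""
    width_bits = sum(c.width_bits for c in channels)
    capacity_mb = sum(c.chips * c.chip_density_mbit for c in channels) // 8
    bandwidth_gbs = (width_bits // 8) * effective_mt_per_s / 1000
    return width_bits, capacity_mb, bandwidth_gbs


configs = {
    "2x64-bit": [Channel(64, 2, 512) for _ in range(2)],
    "4x32-bit": [Channel(32, 1, 512) for _ in range(4)],
    "4x32-bit, 2 chips/channel": [Channel(32, 2, 512) for _ in range(4)],
}
for name, channels in configs.items():
    print(name, aggregate(channels))
```

All three report the same 128-bit width and 22.4 GB/s; only the last one doubles capacity to 512MB, so more chips on the same bus width buy capacity, not bandwidth.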
 
Okay, so Jaws:

The X360 CPU has to go through the GPU to access that RAM. But the CPU would require less bandwidth and data than the GPU, correct?

Whereas to access more than 256 megs of RAM, the GPU in the PS3 would have to go through the CPU. So in terms of graphics, wouldn't that second pool of RAM be slower? Much slower, actually?
 
jvd said:
Whereas to access more than 256 megs of RAM, the GPU in the PS3 would have to go through the CPU. So in terms of graphics, wouldn't that second pool of RAM be slower? Much slower, actually?


Generally CPUs require SIGNIFICANTLY less bandwidth than a GPU.

In general-purpose code, assuming you have a half-decent cache architecture, if you're continually banging on external memory with your CPU, you're doing something really wrong.

Having said that, if you were to use CELL/XCPU for vertex work, this changes the picture somewhat, since you're basically working with streamed data. But even if you were to totally saturate the tri-setup engine with CPU-generated verts, it would still be a fraction of the total memory bandwidth.
 
Generally CPUs require SIGNIFICANTLY less bandwidth than a GPU
Right, but you still have to go through the Cell chip to access the RAM, which will add latency, would it not? That would slow down how fast you can access stuff.

I'm just trying to figure out what we can expect from PS3 games that use more than 256 megs of RAM for textures.
 
jvd said:
Whereas to access more than 256 megs of RAM, the GPU in the PS3 would have to go through the CPU. So in terms of graphics, wouldn't that second pool of RAM be slower? Much slower, actually?

In addition to the 25.6 GB/s I used above for the CELL FSB, I need to add the FlexIO to that figure too:

CELL FSB ~ 25.6 + 35 ~ 60.6 GB/s

X360 CPU FSB ~ 21.6 GB/s

So in isolation, the CELL clearly has room to breathe.

For the GPUs in isolation, from the 'leak':

RSX ~ 22.4 + 35 ~ 57.4 GB/s

R500 ~ 33.2 + 22.4 ~ 55.6 GB/s

Here, both have similar I/O bandwidth to breathe. However, the R500 also has the 48 GB/s eDRAM (256 GB/s effective) to breathe more. This should go some way toward alleviating any bottleneck on the GPU from its UMA design. On the other hand, the PS3's NUMA design has its inherent advantages, with fewer bus contention issues.

Ultimately, it's UMA vs NUMA, each with its own quirks...
 
But you can't just add two buses together.

RSX ~ 22.4 + 35 ~ 57.4 GB/s

How do you get this? It's not a true 57.4GB/s. It can access the RAM at 22.4 and the Cell at 35, but then the Cell still needs to access the RAM.
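jvd's objection can be made concrete with a crude upper-bound model: RSX's own 22.4 GB/s, plus whatever XDR bandwidth Cell leaves over, capped by the 35 GB/s FlexIO figure. This is a deliberate simplification and an assumption: it treats all FlexIO traffic as RSX reading XDR, ignores that link bandwidth is split by direction, and ignores data Cell sends straight to RSX (e.g. generated vertices) without touching XDR. The function names are hypothetical.

```python
# Crude upper-bound model; figures are the 'leak' numbers quoted in this thread.
GDDR3_GBS = 22.4          # RSX's local GDDR3 pool
XDR_GBS = 25.6            # Cell's XDR pool
FLEXIO_CELL_RSX_GBS = 35  # Cell <-> RSX link


def naive_rsx_total():
    """The 'just add the buses' figure."""
    return GDDR3_GBS + FLEXIO_CELL_RSX_GBS


def bounded_rsx_total(cell_xdr_use_gbs):
    """RSX bandwidth if its FlexIO traffic must be served out of XDR.

    Assumption: every byte RSX pulls over FlexIO comes from XDR, so it
    competes with Cell's own XDR traffic.
    """
    xdr_left = max(0.0, XDR_GBS - cell_xdr_use_gbs)
    return GDDR3_GBS + min(FLEXIO_CELL_RSX_GBS, xdr_left)


print(round(naive_rsx_total(), 1))        # 57.4
print(round(bounded_rsx_total(0), 1))     # 48.0
print(round(bounded_rsx_total(10.0), 1))  # 38.0
```

Even with Cell idle, the bound is 48 GB/s rather than 57.4, because XDR itself tops out at 25.6 GB/s in these figures; the latency jvd mentions isn't captured at all.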
 
Here, both have similar I/O bandwidth to breathe. However, the R500 also has the 48 GB/s eDRAM (256 GB/s effective) to breathe more. This should go some way toward alleviating any bottleneck on the GPU from its UMA design. On the other hand, the PS3's NUMA design has its inherent advantages, with fewer bus contention issues.

It's all swings and roundabouts.

I've spent a lot of time over the years trying to guess the performance of a piece of hardware from published specs and I've guessed the wrong bottleneck more times than the right one.

Until you actually benchmark the hardware and see how it behaves you simply can't predict performance.

Here's an example (purely hypothetical):

Let's assume that for 1 thread on 1 processor, X360 is faster than Cell on either the PPU or an SPU.

That doesn't tell us anything about how the system will perform when that task is spread across, say, 6 threads: the shared L2 on X360 might get thrashed, or Cell's SPUs might end up with serious DMA contention. There is absolutely no way to predict.

Just because PS3 performs task A better than X360 doesn't mean that X360 won't perform task B better than PS3, and neither tells us anything about performance in an application that requires A, B, C and D.

My current guess from the published specs would be that they are a lot closer in performance terms and feature set than, say, Xbox and PS2, but that doesn't preclude one or both having serious non-obvious bottlenecks.
 