Please explain GPU memory bandwidths on XB360 and PS3

Shifty Geezer

Can someone please clear up this uncertainty. Does the XGPU share its UMA RAM bandwidth (25GB/s?) with the CPU? Does it have any bandwidth advantage over the PS3, which has lower bandwidth to its GDDR3 but can access XDR too? Is the XGPU dependent on limiting certain factors to the eDRAM, and if so, will this result in fewer available textures/models/blah blah when rendering?

I'm not educated enough to appreciate the role eDRAM plays (other than that everyone wants it!)
 
X360: 512MB of 22.4GB/s GDDR3 (700MHz) shared between all system components. All data passes through the North Bridge, which has access to the RAM, the R500 and the South Bridge. The R500 has a 64GB/s link to a separate chip consisting of 10MB of eDRAM and the necessary access logic.

PS3: 256MB of 25.6GB/s XDR (800MHz) shared between all system components. All data passes through the CELL bus, which has access to the RAM, the RSX and the South Bridge. The RSX has access to an additional, dedicated 256MB of 22.4GB/s GDDR3 (700MHz). Also, the shared memory has a theoretical maximum of 20GB/s from the CELL bus to the RSX and 15GB/s the other way around. This should, however, not be a real limitation.

That means that the dedicated bandwidth to the GPU is higher on the X360, but its shared memory will have to be used significantly more. On the other hand, CELL also uses more memory bandwidth than the X360's CPU. The X360 most likely has the bandwidth advantage, but it's hard to say by how much without a clear idea of exactly how the eDRAM is used (the "4x efficiency" makes me assume it's exclusively related to the Z-buffer, but I'd prefer not to assume too much at this point).
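A quick back-of-the-envelope comparison of the pools described above (nominal peak figures only; Python is used here just for the arithmetic):

```python
# Nominal peak bandwidths quoted in this thread, in GB/s.
X360_UNIFIED = 22.4          # GDDR3, shared by CPU and GPU
PS3_XDR      = 25.6          # XDR, shared via the CELL bus
PS3_GDDR3    = 22.4          # GDDR3, dedicated to the RSX

x360_total = X360_UNIFIED            # everything contends for one pool
ps3_total  = PS3_XDR + PS3_GDDR3     # two pools, less contention

print(x360_total, ps3_total)  # 22.4 48.0 (excluding the X360's eDRAM link)
```

The X360's eDRAM link is deliberately left out, since how much traffic it actually absorbs is exactly the open question in this thread.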

Uttar
 
Well, it doesn't make much sense to put eDRAM into a separate chip if it doesn't include any logic.
 
FWIW, as far as I know (and I don't have any concrete info on the physical configuration), the eDRAM is on-die.

If it were external, I would expect a much larger chunk than 10MB.

The bandwidth is picked so that it is NEVER the bottleneck. Fill rate is entirely predictable.
 
DaveBaumann is claiming it is not in the same core as the R500. The bandwidth may not be the bottleneck, but then again, it only has 8 pipes and it's only 32GB/s each way. So instead of bandwidth-bound, it might be fillrate-bound. On the other hand, it looks like the bandwidth was picked not to be the bottleneck in non-HDR settings.
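A rough sketch of when 8 pipes would saturate a 32GB/s write path (the 500MHz clock and the 8 bytes per pixel, i.e. colour plus Z with no compression, are assumptions, not confirmed figures):

```python
# Can 8 pixel pipes saturate a 32 GB/s (each way) eDRAM link?
clock_hz        = 500e6   # assumed GPU core clock
pixels_per_clk  = 8       # one pixel per pipe per clock
bytes_per_pixel = 8       # assumed: 4 B colour + 4 B depth, uncompressed

write_gb_s = clock_hz * pixels_per_clk * bytes_per_pixel / 1e9
print(write_gb_s)  # 32.0 -> under these assumptions the write path is exactly saturated
```

Under those assumptions, fill and bandwidth balance exactly, which would be consistent with the link being sized so it is never the bottleneck.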
 
Doesn't the GPU in the PS3 have to access the CELL memory pool through the CELL? Won't that add latency and lower response time, as data has to go from the RAM to the CELL to the GPU, all the while the CELL is using that bandwidth for its own needs?
 
I agree the 8 pipes is a somewhat interesting decision, but if you figure that the XGPU can saturate its eDRAM bandwidth with 8 pipes, how will the PS3's fare with more pipes and less bandwidth?

I doubt fill will be the issue for the most part, although I have to say the bandwidth figures for the NVidia part worry me if Sony are really going to dictate 1080p.
 
I have a feeling that the situation with 1080p will be much like the situation with 720p on the Xbox 1: only a few games will support it.

I think the sweet spot for both consoles is 720p.

It will be interesting to see if RSX includes any support for HDR-AA and compression, otherwise, FP32 backbuffers (and even FP16) will still be a big performance bottleneck.
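To put the backbuffer-format worry in numbers, here is a rough per-frame cost sketch (assuming plain four-channel colour targets at 720p, no AA, colour only):

```python
# Per-pixel cost of the backbuffer formats mentioned above.
formats = {"RGBA8": 4, "FP16": 8, "FP32": 16}   # bytes per pixel, 4 channels
w, h = 1280, 720                                 # 720p

for name, bpp in formats.items():
    mib = w * h * bpp / 2**20
    print(name, mib, "MiB")
# RGBA8 3.515625 MiB, FP16 7.03125 MiB, FP32 14.0625 MiB
```

Every blend or overdrawn pixel pays that cost again in bandwidth, which is why FP16/FP32 backbuffers without compression hurt so much.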
 
jvd said:
Doesn't the GPU in the PS3 have to access the CELL memory pool through the CELL? Won't that add latency and lower response time, as data has to go from the RAM to the CELL to the GPU, all the while the CELL is using that bandwidth for its own needs?

This is correct, but I believe this is true of all UMA architectures.

By the way, Uttar, what do you mean by 64GB/link? What's a link? Do you mean a link to the eDRAM pool, and it has 4 links (4x64 = 256GB/s)?

You also state that CELL will most likely use more bandwidth. Why is this? Because it has more threads running in parallel?

I thought the line of thought was that the 10MB of eDRAM was used for the front/back buffers. Has this been debunked?
 
This is correct, but I believe this is true of all UMA architectures.

* 22.4 GB/s memory interface bus bandwidth
* 256 GB/s memory bandwidth to eDRAM
* 21.6 GB/s front-side bus

Sounds to me like the XGPU can access the RAM through its own 22.4GB/s memory interface?

While the XCPU can access the RAM through its FSB at 21.6GB/s?

Unless I'm wrong in my understanding.
 
What about if Sony says that there are two resolutions:

- 720p with up to 4xAA
- 1080p with no AA
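Sample-count arithmetic actually makes the first of those two modes the heavier one (ignoring any AA compression the hardware might do):

```python
# Raw samples touched per frame in each hypothetical mode.
samples_720p_4x = 1280 * 720 * 4     # 720p with 4x multisampling
samples_1080p   = 1920 * 1080 * 1    # 1080p, no AA

print(samples_720p_4x, samples_1080p, samples_720p_4x / samples_1080p)
# 3686400 2073600 1.7777777777777777
```

So 720p with 4xAA touches roughly 1.8x as many samples as 1080p without AA, before compression is taken into account.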

Jawed
 
[X360: CPU+GPU]----256 GB/s* ----[10 MB]
|
|
22.4 GB/s
|
|
[512 MB]





[PS3: CPU+GPU]----22.4 GB/s ----[256 MB]
|
|
25.6 GB/s
|
|
[256 MB]


I'm sure they'll both have their pros and cons, but IMHO, the PS3 has a better overall balance. The X360 is a hybrid UMA design and the PS3 is a hybrid NUMA.

*Note: 256 GB/s is effective bandwidth; the real figure isn't confirmed AFAIK, but the 'leak' suggests ~32 GB/s write and 16 GB/s read.
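The gap between the "effective" number and those leaked raw figures implies a sizeable efficiency/compression multiplier; a quick check:

```python
# 'Effective' vs leaked raw eDRAM bandwidth from the note above (GB/s).
effective = 256.0
raw       = 32.0 + 16.0      # leaked write + read

print(effective / raw)  # 5.333... -> implied ~5.3x multiplier
```

That ratio is far above the "4x efficiency" mentioned earlier in the thread, which suggests the 256 GB/s figure counts something more than simple compression.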
 
I was under the impression that the northbridge for XB360 is integrated into R500 - but I'm not sure of that...

The leak shows a separate link twixt CPU and Northbridge (GPU) with 10.8GB/s bandwidth. Plainly, vertex data will come straight out of the CPU into the GPU.

What's the rate for vertex data? The leak implies 1 vertex per clock. How many bytes per vertex? 10? 20? 20 bytes is 10GB/s.
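The arithmetic in that last line works out if one assumes a 500MHz GPU clock (an assumption, but the only one consistent with "20 bytes is 10GB/s"):

```python
# Vertex bandwidth at 1 vertex per clock, assumed 500 MHz GPU clock.
clock_hz = 500e6

for bytes_per_vertex in (10, 20):
    gb_s = clock_hz * bytes_per_vertex / 1e9
    print(bytes_per_vertex, "B/vertex ->", gb_s, "GB/s")
# 10 B/vertex -> 5.0 GB/s
# 20 B/vertex -> 10.0 GB/s
```

Either vertex size fits within the 10.8GB/s CPU-to-Northbridge link mentioned above, though 20 bytes per vertex would nearly saturate it.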

Jawed
 
I haven't seen system diagrams or been keeping up (exams and stuff), so this might be wrong. I think the northbridge on the X360 takes a connection each from the GPU and the CPU and balances the ~23GB/s between them as necessary. The FSB is simply the CPU's connection to the north bridge. (The reason its speed is below the real bandwidth is probably the target chip clock: multiplier x FSB = CPU clock speed.)

The Xbox 1 was the same, I believe, with the north bridge essentially being on the GPU and the CPU's FSB connecting CPU to GPU.



Only ~23GB/s of bandwidth to each GPU; man, them things are gonna starve.
 
Squeak said:
Still, one huge question remains.
How are they going to get the 48GB/s bandwidth needed to emulate the GS?
With more efficiency; RSX has a number of bandwidth-saving features the GS does not have.
What worries me most is how RSX is gonna emulate my stencil shadow code ;)
The GS doesn't have a stencil buffer; stencilling is emulated via alpha blending ops or by reusing the frame buffer as a texture.
It goes something like this, with the frame buffer and texture buffer pointing to the SAME location:

KickOneTriangle()     // draw one shadow-volume triangle into the frame buffer
TextureCacheFlush()   // flush so the next read sees the just-written pixels
KickOneTriangle()
TextureCacheFlush()
etc...

This is pretty fast on the GS, but it could be very slow on a modern GPU, since it flushes the texture cache after drawing every single primitive in the shadow volume.
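For reference, the 48GB/s figure quoted earlier in the thread matches the GS's combined eDRAM ports as usually described (1024-bit read, 1024-bit write and 512-bit texture, all at 150MHz):

```python
# GS eDRAM aggregate bandwidth from its three ports.
bus_bits = 1024 + 1024 + 512   # read + write + texture ports
clock_hz = 150e6

print(bus_bits / 8 * clock_hz / 1e9)  # 48.0 GB/s
```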
 
colinisation said:
By the way, Uttar, what do you mean by 64GB/link? What's a link? Do you mean a link to the eDRAM pool, and it has 4 links (4x64 = 256GB/s)?
256GB/s is the effective bandwidth assuming compression (by that measure, current GPUs would benefit from more than 100GB/s too).
You also state that CELL will most likely use more bandwidth. Why is this? Because it has more threads running in parallel?
Because of the way "cache" works for the SPEs.
I thought the line of thought was that the 10MB of eDRAM was used for the front/back buffers. Has this been debunked?
It's used for the Z-buffer, I believe, but there's another thread analyzing this part more precisely; I haven't read it entirely yet.

Uttar
 
nAo said:
Squeak said:
Still, one huge question remains.
How are they going to get the 48GB/s bandwidth needed to emulate the GS?
With more efficiency; RSX has a number of bandwidth-saving features the GS does not have.
But will that be enough to get consistently over 2x the performance out of the DDR RAM?
I was thinking maybe they'll include 4MB of some sort of cache that would also be very useful in PS3 mode?
 