New article on Xbox360 GPU

Jawed said:
You need to read that ArsTechnica article linked above.
I did so when it was published, and I'm well aware of the concept of procedural synthesis. In fact I wrote a fractal tree generator once ;)

Jawed said:
The bottleneck you allude to would be arrived at simply by entirely ignoring the procedural synthesis feature that links Xenon and Xenos - it's there to save vast amounts (10s of GB) of bandwidth. 10.8GB/s from CPU-GPU is the equivalent of 21.6GB/s CPU-RAM-GPU.
Sure, but while that does reduce the amount of main RAM bandwidth needed to load models (perhaps drastically), it does not change the fact that there's only around 22 GB/s of bandwidth to external RAM for all of XB360.

Jawed said:
I fully expect Cell/RSX to work the same way, for what it's worth. Though I'm not aware of confirmation, as yet, that vertex/texture data from Cell can be sent to RSX without going via RAM (either XDR or GDDR3). But the 20GB/s bandwidth from Cell to RSX sounds ideal.
I'm sure that it can (though I don't recall explicit confirmation), and it's 35 GB/s between Cell and RSX -[edit]- though you may be referring to just write bandwidth, which would actually make more sense in this context. Just forget the last sentence :oops:
 
PeterT said:
Sure, but while that does reduce the amount of main RAM bandwidth needed to load models (perhaps drastically), it does not change the fact that there's only around 22 GB/s of bandwidth to external RAM for all of XB360.
What about the 32GB/s to the EDRAM? Which is prolly equivalent to around 64GB/s (at least...) because the ENTIRE ROP workload is removed from external RAM.

ARGH.

Jawed
 
Jawed said:
What about the 32GB/s to the EDRAM? Which is prolly equivalent to around 64GB/s (at least...) because the ENTIRE ROP workload is removed from external RAM.
Sure -- though I'd contest your equivalencies -- that's why I said external RAM bandwidth.
(If you count EDRAM as external you could make a case to count local store as well, which would lead us firmly into the realm of "big numbers")
 
PeterT said:
Sure -- though I'd contest your equivalencies -- that's why I said external RAM bandwidth.
(If you count EDRAM as external you could make a case to count local store as well, which would lead us firmly into the realm of "big numbers")
Why would you dismiss ROP bandwidth? The single most demanding bandwidth function in gaming graphics? Obviating almost the entire framebuffer bandwidth from the 22.4GB/s capacity of the main RAM.

You might just as well say that I can't count the 10.8GB/s CPU-GPU connection, which saves 21.6GB/s of RAM bandwidth.

Incredible.

Count LS all you like...

Jawed
 
Please actually read what I write. I never "dismissed ROP bandwidth". I was originally talking about external memory bandwidth; then you brought up first procedural synthesis, then EDRAM. I agreed that both reduce the amount of external RAM bandwidth required.

I also asserted that (again, while reducing the need for it) neither changes the fact that there is only 22 GB/s of external bw available to X360. Where's the problem?
 
Peter said:

"Sure, but while that does reduce the amount of main RAM bandwidth needed to load models (perhaps drastically), it does not change the fact that only around there's only 22 GB/s of bandwidth to external RAM for all of XB360."

I think it's the use of 'only' in this statement that's driving the debate here. My perception of this statement is that you're saying the 22GB/s bandwidth is inadequate. I think Jawed's responses are addressing that connotation. If you didn't mean it that way, then maybe clarifying your stance would clear it all up.

That's just the way I took it; I could be wrong.

J
 
Earlier you said:

I believe that -- if the developers manage to use the "6 thread" CPU to its full potential -- the unified memory, or rather its bandwidth, will be one of the more bothersome bottlenecks on X360.

I've been pointing out why the 22.4GB/s of RAM bandwidth in XB360 isn't the constraint it initially appears to be, because CPU-GPU data (Xbox Procedural Synthesis) doesn't consume that bandwidth, and because ROP operations don't consume that bandwidth.

I'm simply removing two big constraints from the apparent "22.4GB/s bandwidth to memory".

At best the CPU can use 21.6 GB/s of that bandwidth - that's what the 6 threads are stuck with. But for the CPU to use that much means that XPS is not being used at all. That's one silly extreme (though perhaps with first gen games fairly likely).

If XPS is running full-tilt at 10.8GB/s, then the CPU can only be consuming 10.8GB/s of read bandwidth from RAM, leaving 11.6GB/s for texturing and MEMEXPORT (e.g. tessellation) type operations by Xenos. That's just another silly extreme.

Obviously it's going to be somewhere in the middle and there are overheads in concurrent reads/writes against RAM to eat into effective bandwidth.
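The two extremes above can be sketched numerically. This is just back-of-envelope arithmetic using the peak figures quoted in the thread (22.4 GB/s main RAM, 10.8 GB/s CPU-to-GPU FSB link); the 1:1 read-to-stream ratio for XPS source data is my simplifying assumption, not a spec figure:

```python
# Rough sketch of the XB360 bandwidth budget discussed above.
# All figures are the peak numbers quoted in the thread; real overheads
# (concurrent read/write turnaround etc.) would eat into these.

RAM_BW = 22.4   # GB/s, main GDDR3 RAM
FSB_BW = 10.8   # GB/s, CPU -> GPU direct link (the XPS path)

def remaining_for_gpu(xps_rate):
    """If the CPU streams `xps_rate` GB/s of procedural geometry to the
    GPU over the FSB, it must read roughly that much source data from
    RAM (1:1 assumption), leaving the rest for texturing/MEMEXPORT."""
    assert 0 <= xps_rate <= FSB_BW
    cpu_ram_reads = xps_rate
    return round(RAM_BW - cpu_ram_reads, 1)

print(remaining_for_gpu(0.0))    # XPS idle: full 22.4 GB/s contended by all
print(remaining_for_gpu(10.8))   # XPS full tilt: 11.6 GB/s left for Xenos
```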

I think it's fair to say that the main RAM bandwidth is the most obvious place for a bottleneck in XB360 - simply because across XB360 bottlenecks are so hard to find.

What I don't think is fair, is to say "22.4GB/s - that's not enough" without explaining the context of that bandwidth.

Jawed
 
expletive said:
I think its the use of 'ONLY' in this statement is whats driving the debate here.
I think you may be right. So let me reiterate: because of the EDRAM, the 22 GB/s external RAM bw of X360 is not comparable to the 48 GB/s external RAM bw of PS3, or the ~40-50 GB/s on a fast PC for that matter. Still, I do believe that the "only" qualifier is not totally unjustified, as 22.4 GB/s is less than what is available to Cell via XDR alone (25.6 GB/s).

Also, while the EDRAM argument is entirely justified, procedural synthesis can be done on PS3 at least as well as on X360 -- and due to the higher write bandwidth from Cell to RSX, one could even argue the bandwidth "alleviation" factor to be greater. I couldn't think of many tasks the SPEs would be more suitable for than procedural generation of geometry from small datasets. Of course, this assumes that Cell vertex data doesn't have to go through RSX RAM; if it did, that would be very disappointing. (And is quite unlikely IMHO)

[edit]
Jawed, you replied while I was writing the above, so I'd just like to add that I agree with most of what you said in your last message.
 
I agree that Cell SPEs would be perfect for procedural synthesis (geometry and/or textures - e.g. simulating particle effects and rendering them out to a small set of textured polygons). And prolly for tessellation, too.

And the 20GB/s from Cell to RSX matches up with the XPS bandwidth in XB360 (which supposedly can do 2:1 on the fly compression - though I suspect that's only for vertex data).

It seems to me the fly in PS3's ointment is that ROP tasks will consume so much bandwidth in the GDDR3 VRAM, that either AA will be too costly or texturing from VRAM will be badly bottlenecked, perhaps causing significant amounts of static texturing to run from XDR.

That's the easy-to-spot bottleneck in PS3. A GPU that's faster than 7800GTX running with memory that's just over half as fast. If Cell takes up the slack to compensate, how much of the 25.6GB/s are you left with?

Jawed
 
What happens when either of these 2 consoles wants to use more than 256MB for texture data? What's the impact on bandwidth for each?

J
 
blakjedi said:
Why can the 3D core read more than any other device can transfer to it (33.2 GB/s)? 22.4 GB/s (max read from memory) + 10.8 GB/s (max read from CPU) = 33.2 GB/s.

So the GPU can read directly from CPU and RAM simultaneously at 33.2 GB/s... no bottleneck there, but then the BW to RAM is completely saturated... new bottleneck. If the RAM had a read BW of 44.8 GB/s then there would be no bottlenecks in the system the way it's currently designed...
I didn't notice a good answer to this question yet, so I'll try to explain. This apparent imbalance of memory bandwidth exists because bandwidth needs change over a frame and between games. Sometimes you need mostly texture bandwidth, sometimes mostly vertex bandwidth. By allowing these blocks to accept more data than they would receive in balanced usage, you are able to maximize memory bandwidth.

Consider a frame that needs 16 GB/s of texture bandwidth, 4 GB/s of vertex fetch bandwidth, and 2 GB/s of other stuff. Assuming 22 GB/s of total bandwidth, this maxes out the memory bus. If everything was balanced to add up to 22 GB/s, the texture cache wouldn't be able to accept all the data it requires, and it would be the bottleneck despite having main memory bandwidth to spare.

Hopefully I made some sense. It's always hard to tell. :D
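That point can be sketched with a toy model: clients whose ports are sized for the worst-case mix can saturate the shared bus in any frame, while "evenly" sized ports can't. The port-capacity numbers below are purely illustrative, not actual Xenos figures:

```python
# Toy model: per-client port capacities vs. a shared memory bus.
# Oversized client ports let any workload mix saturate the bus.

BUS_BW = 22.0  # GB/s shared main-memory bandwidth

def bottleneck(demands, port_caps):
    """Achieved total bandwidth: each client is clipped to its own port
    capacity, then the sum is clipped to the shared bus."""
    served = [min(d, c) for d, c in zip(demands, port_caps)]
    return min(sum(served), BUS_BW)

demand = [16.0, 4.0, 2.0]  # texture, vertex, other (GB/s)

# Ports sized to an "even" split can't serve a texture-heavy frame:
print(bottleneck(demand, [8.0, 8.0, 6.0]))     # 14.0 -- bus left underused

# Oversized ports let the same frame use the whole bus:
print(bottleneck(demand, [22.0, 22.0, 22.0]))  # 22.0
```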
 
Jawed said:
Count LS all you like...

Frankly if he counts LS, then you should be able to count L1 and L2 caches on the xenon cores. :devilish: ;) I kid I kid.
 
Seeing is believing

Based on the real-time rendering on the PS3, I noticed some unimpressive modeling and texturing. The modeling is probably a result of the preliminary nature of the dev kit, but the texturing? X360 modeling looks even worse.

Low polygon counts are no surprise, since the X360 Quake 4 polycount is inferior to the PC version's, and most geometry in most games is probably done on the CPU; but the animation & particles, bitmap explosions among other things, look bad. OTOH the texturing looks better than the PS3 games I have seen. Is bandwidth the cause? Maybe I have not seen enough PS3 games.

PS3's 4:1 universal compression, Z and occlusion culling should theoretically give 10x effective bandwidth, but how much in practice? By my calc, 1080p with 4xAA requires 180 GB/s, real or effective bandwidth. Any thoughts?
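For what it's worth, the usual back-of-envelope for a figure like that looks like the sketch below. The per-sample byte counts, read-modify-write factor, and overdraw are all assumptions made explicit, not measured numbers; with these fairly mild inputs the raw figure comes out far below 180 GB/s, so any such claim hinges entirely on the assumptions chosen:

```python
# Back-of-envelope uncompressed framebuffer (ROP) traffic, in GB/s.
# All parameters are assumptions; compression/culling (e.g. NVIDIA's
# claimed savings) would divide the result.

def rop_bandwidth_gbps(width, height, aa_samples, fps,
                       bytes_color=4, bytes_z=4,
                       rw_factor=2.0, overdraw=4.0):
    """rw_factor=2 models read-modify-write of colour and Z;
    overdraw is the average number of fragments per sample."""
    samples = width * height * aa_samples
    per_frame = samples * (bytes_color + bytes_z) * rw_factor * overdraw
    return per_frame * fps / 1e9

# 1080p, 4xAA, 60 fps, overdraw of 4:
print(round(rop_bandwidth_gbps(1920, 1080, 4, 60), 1))  # prints 31.9
```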
 
1080p & 4XAA

london-boy said:
Who's ever going to use 1080p with 4xAA next generation?!

PS3 theoretically has 10x effective bandwidth due to compression and culling, so 480 GB/s theoretical bandwidth, no? Theoretically enough for 2x 1080p with 4xAA, but in practice will it be enough for even 1080p with 4xAA? Big question.
 
ihamoitc2005 said:
PS3 theoretically has 10x effective bandwidth due to compression and culling, so 480 GB/s theoretical bandwidth, no? Theoretically enough for 2x 1080p with 4xAA, but in practice will it be enough for even 1080p with 4xAA? Big question.

I'm not sure where you got those figures, but it's way over the top. The most we'll see this next gen is 720p 4xAA. Even that won't be exactly cheap on RSX bandwidth.
 
london-boy said:
I'm not sure where you got those figures, but it's way over the top. The most we'll see this next gen is 720p 4xAA. Even that won't be exactly cheap on RSX bandwidth.

RSX net BW = 48 GB/s

Based on NVIDIA's claims, compression & culling give 10x effective theoretical BW, so 480 GB/s, but even NVIDIA says the real-world boost may be half of theoretical. So maybe 240 GB/s. But could even that be optimistic?

As for whether these numbers are crazy, keep in mind the 7800GTX outputs to higher resolutions than 1080p with 4xAA. Pick a resolution, figure out the required bandwidth, divide by real bandwidth, and then you have the effective bandwidth boost factor.
 
ihamoitc2005 said:
RSX net BW = 48 GB/s

Based on NVIDIA's claims, compression & culling give 10x effective theoretical BW, so 480 GB/s, but even NVIDIA says the real-world boost may be half of theoretical. So maybe 240 GB/s. But could even that be optimistic?

As for whether these numbers are crazy, keep in mind the 7800GTX outputs to higher resolutions than 1080p with 4xAA. Pick a resolution, figure out the required bandwidth, divide by real bandwidth, and then you have the effective bandwidth boost factor.

And I'm telling you you're being overly optimistic. 240 GB/s effective bandwidth is overly optimistic. 480 GB/s is crazy.

To explain myself: you said that a 1080p+4xAA image takes up 180 GB/s. That's way too much. Compression kicks in before that stage, and the image doesn't take up that humongous amount of bandwidth. Still too much for next-gen consoles, though.

Developers will want to use bandwidth to send graphics data back and forth (textures mainly), so they will never use that kind of resolution with AA; they'd much rather send more textures but display at a lower resolution.
 
london-boy said:
And I'm telling you you're being overly optimistic. 240 GB/s effective bandwidth is overly optimistic. 480 GB/s is crazy.

Those numbers are effective bandwidth, not actual bandwidth, based on NVIDIA's claims regarding BW savings. 10:1 BW savings = 10x effective BW increase.

To explain myself: you said that a 1080p+4xAA image takes up 180 GB/s. That's way too much. Compression kicks in before that stage, and the image doesn't take up that humongous amount of bandwidth. Still too much for next-gen consoles, though.

180 GB/s is before compression. Compression and culling reduce this, as you said, therefore increasing effective bandwidth.

If it's hard to understand as a 10x effective bandwidth increase, think of it as a 10:1 reduction in bandwidth requirement. So if 1080p with 4xAA at 60fps = 180 GB/s, then after NVIDIA's claimed BW savings the theoretical effective BW requirement is 1/10th = 180/10 = 18 GB/s.

If optimal results are achieved through NVIDIA's culling, compression, etc., then 128-bit 700 MHz GDDR3 is enough BW. BUT, is 18 GB/s actually achievable? Unlikely, no? The question is what's the real-world decrease in BW requirement via LMA tricks like culling, compression, etc.
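That arithmetic, sketched out. Bus width and clock are the figures quoted in the thread; the 5:1 "half the claimed savings" case is my illustration of what happens if NVIDIA's 10:1 figure doesn't hold in practice:

```python
# "Effective bandwidth" arithmetic: a claimed N:1 reduction in
# framebuffer traffic is the same as dividing the raw requirement by N.

def gddr3_bandwidth_gbps(bus_bits, clock_mhz):
    """Peak bandwidth of a DDR bus: width in bytes * clock * 2."""
    return bus_bits / 8 * clock_mhz * 1e6 * 2 / 1e9

def effective_requirement(raw_gbps, savings_ratio):
    """Raw requirement divided by the claimed compression/culling ratio."""
    return raw_gbps / savings_ratio

print(gddr3_bandwidth_gbps(128, 700))       # 22.4 -- the quoted GDDR3 bus
print(effective_requirement(180.0, 10.0))   # 18.0 -- fits, if 10:1 holds
print(effective_requirement(180.0, 5.0))    # 36.0 -- doesn't, at half that
```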

Developers will want to use bandwidth to send graphics data back and forth (textures mainly), so they will never use that kind of resolution with AA; they'd much rather send more textures but display at a lower resolution.

Sony is advertising 2x 1080p, but looking at BW it seems no 4xAA is possible at those resolutions.
 