Titanio said:
Agreed, I don't think it matters much anyway for these purposes, but I do wonder if it's not so much about avoiding latency as saving bandwidth. The fact that the kinds of things this would be used for aren't latency-sensitive only further suggests that.
I think it's entirely about saving bandwidth against DDR RAM. GPUs are latency-tolerant by design. The newest GPU designs take latency-tolerance to new extremes - hence the whole concept of out-of-order threading.
Why put stuff into main memory if it doesn't need to go there? You've just saved the 10.8GB/s needed to put it into memory, and the 10.8GB/s to read it back out again.
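To put a rough number on that, here's a back-of-envelope sketch. The 10.8GB/s figure is the one above; the function name is made up, and it assumes the stream would otherwise be written to memory once and read back once at the full rate.

```python
def memory_traffic_avoided(stream_gb_per_s: float) -> float:
    """Main-memory traffic avoided when a stream bypasses memory entirely.

    Assumption: without the bypass, the stream is written once and read
    back once, both at the full stream rate.
    """
    write_cost = stream_gb_per_s  # producer writes the data into memory
    read_cost = stream_gb_per_s   # consumer reads the same data back out
    return write_cost + read_cost


saved = memory_traffic_avoided(10.8)
print(f"{saved:.1f} GB/s of memory traffic avoided")
```

So a stream that would cost 10.8GB/s in each direction keeps about 21.6GB/s of traffic off the memory bus, if it never has to be parked there.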
---
I'm sure we'll see something similar in PS3.
Additionally, within Cell, bandwidth is saved by having the SPEs' local stores (LSs) able to send/fetch data amongst themselves. None of those transfers touch memory, or even Cell's cache. This is entirely dependent on the algorithm being streaming.
If your algorithm is streaming in nature (e.g. "here are the polys that make up a monster as he swings his hammer down to crush you") then there's a great opportunity to keep that data away from memory. In that sense RSX should appear like "another Cell" connected to PS3's Cell. At the very least RSX should consume procedural poly and texture data without that data having to go into either XDR RAM or GDDR3 RAM.
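As a loose software analogy for what "streaming" buys you, here's a minimal sketch using Python generators. It is not how Cell or RSX actually move data; every name in it is hypothetical. The point is only that each stage hands one item to the next, so the full data set never has to be stored anywhere.

```python
from typing import Iterator, Tuple

Vertex = Tuple[float, float, float]


def generate_vertices(count: int) -> Iterator[Vertex]:
    """Hypothetical producer (stands in for the CPU side): makes vertices on demand."""
    for i in range(count):
        yield (float(i), float(i) * 0.5, 0.0)


def transform(vertices: Iterator[Vertex], scale: float) -> Iterator[Vertex]:
    """Hypothetical consumer stage (stands in for the GPU side): one vertex in, one out."""
    for x, y, z in vertices:
        yield (x * scale, y * scale, z * scale)


def consume(vertices: Iterator[Vertex]) -> int:
    """Final stage: counts vertices as they pass, without storing the set."""
    return sum(1 for _ in vertices)


# Each vertex flows producer -> transform -> consume one at a time;
# no list of all 1000 vertices is ever built.
processed = consume(transform(generate_vertices(1000), scale=2.0))
print(processed)
```

The catch is that a stream like this can only be walked once. As soon as a later stage needs a second pass over the same data, something has to hold on to it.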
---
None of this is to say that Xenos won't run some shaders on the data coming direct from Xenon L2 and stuff the results of those shaders into memory. For example, tessellation is a two-pass process, with the implication that main memory is used to hold the intermediate data, but this topic is very sketchy. Tessellation (including adaptive tessellation for level of detail control) leads to some polys being deleted as well as others being created.
Then there's the memory consumed by vertex/poly data in performing predicated tiling (e.g. because the back-buffer needs to be split into 3 tiles).
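For where a figure like 3 tiles can come from, here's a rough sketch of the arithmetic. None of the numbers are from this thread; they're my assumptions: 10MB of EDRAM, a 1280x720 back-buffer, 4x multisampling, and 8 bytes per sample (4 for colour, 4 for depth/stencil). The function name is made up too.

```python
import math


def tiles_needed(width: int, height: int, samples_per_pixel: int,
                 bytes_per_sample: int, edram_bytes: int) -> int:
    """Number of tiles so that each tile's share of the back-buffer fits in EDRAM."""
    back_buffer_bytes = width * height * samples_per_pixel * bytes_per_sample
    return math.ceil(back_buffer_bytes / edram_bytes)


# Assumed figures, not taken from the post: 10MB EDRAM, 720p, 4xAA,
# 4 bytes colour + 4 bytes depth/stencil per sample.
EDRAM_BYTES = 10 * 1024 * 1024
tiles = tiles_needed(1280, 720, samples_per_pixel=4, bytes_per_sample=8,
                     edram_bytes=EDRAM_BYTES)
print(tiles)
```

Under those assumptions the back-buffer is about 28MB, which comes out at 3 tiles; the same sum with 2x multisampling gives 2. If each tile means another pass over the geometry, that is where the memory for vertex/poly data gets consumed.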
So, even though the XBox Procedural Synthesis streaming pipeline starts with a portion of Xenon's L2 locked for the GPU, with the data going straight into Xenos's shader pipelines, the requirement to loop through that data to perform both tessellation and predicated tiling means that the vertex/poly data (or at least some of it) needs to be kept in memory.
Well, that's the way I interpret it. The detail is just not there, sadly.
Jawed