X360 vs PS3 GPU power/ALU's etc

3dcgi said:
Any idea where you read about this? I'm interested in seeing how King Kong is using tessellation.
It was in a quote in the Console Games section. A search may pull it up. I think it may have been IGN, but I'm not sure. Basically it was discussing all the extra stuff they were tossing into the Xbox 360 version.
 
XenonL2-Xenos com-link

3dcgi said:
I think Jawed is talking about the case where the CPU generates vertices and stores them in the L2 cache where Xenos can read them directly. This bypasses main memory saving the write bandwidth on the FSB.

What is the bandwidth from Xenos to the L2 cache?
 
Acert93 said:
It was in a quote in the Console Games section. A search may pull it up. I think it may have been IGN, but I'm not sure. Basically it was discussing all the extra stuff they were tossing into the Xbox 360 version.
I read another source, and the wording was different enough to make it sound like Ubi was running tessellation as an offline process and saving the results for the X360 version. So I'm not sure Kong is using the dynamic version. The weak link here is the game news reporters who don't understand the difference between a bump map and a specular map. So getting tech info from them is nearly useless.

I'm sure history will bear me out on this one.
 
ihamoitc2005 said:
What is the bandwidth from Xenos to the L2 cache?
Ack, I knew this! I've been educated but it's hazy. I think this BW is 10.8 GB/s, this being the peak saving Jawed mentions above. It's akin to the Cell > RSX Bandwidth and provides the possibility of sending data straight to the GPU without it going through RAM.
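
For a rough sense of scale, here is a minimal sketch of what 10.8 GB/s would mean per frame if that peak could be sustained. The frame rates and the 32-byte vertex size are assumptions chosen for illustration, not figures from this thread, and real sustained rates would be lower than the peak.

```python
# Rough per-frame budget for a CPU->GPU link.
# 10.8 GB/s is the peak figure quoted in the post above.
LINK_BANDWIDTH_BYTES_PER_S = 10.8e9


def bytes_per_frame(bandwidth_bytes_per_s, fps):
    """Bytes that can cross the link in one frame at the given frame rate."""
    return bandwidth_bytes_per_s / fps


def vertices_per_frame(bandwidth_bytes_per_s, fps, bytes_per_vertex):
    """Upper bound on vertices per frame if the link carried nothing else."""
    return bytes_per_frame(bandwidth_bytes_per_s, fps) / bytes_per_vertex


if __name__ == "__main__":
    ASSUMED_BYTES_PER_VERTEX = 32  # hypothetical vertex size
    for fps in (30, 60):
        budget = bytes_per_frame(LINK_BANDWIDTH_BYTES_PER_S, fps)
        verts = vertices_per_frame(
            LINK_BANDWIDTH_BYTES_PER_S, fps, ASSUMED_BYTES_PER_VERTEX
        )
        print(f"{fps} fps: {budget / 1e6:.0f} MB/frame, "
              f"{verts / 1e6:.2f} million vertices/frame at most")
```

This only bounds the link itself; it says nothing about whether the CPU can generate data that fast or whether the GPU can consume it.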

I'm not sure how well this would work though, on either system. If you have a large model, do all vertices need to be sent to the GPU at the same time? Do they not need to be stored somewhere for the GPU to access, as it needs to process the data at least twice? And likewise with textures: if the GPU is texture fetching, it needs to be synchronised with the CPU to receive data. I'd have thought that procedural textures would also need to be saved into RAM for access. So I don't know how these direct CPU<>GPU connections will actually be used in real situations.
 
Streaming

Shifty Geezer said:
Ack, I knew this! I've been educated but it's hazy. I think this BW is 10.8 GB/s, this being the peak saving Jawed mentions above. It's akin to the Cell > RSX Bandwidth and provides the possibility of sending data straight to the GPU without it going through RAM.

I'm not sure how well this would work though, on either system. If you have a large model, do all vertices need to be sent to the GPU at the same time? Do they not need to be stored somewhere for the GPU to access, as it needs to process the data at least twice? And likewise with textures: if the GPU is texture fetching, it needs to be synchronised with the CPU to receive data. I'd have thought that procedural textures would also need to be saved into RAM for access. So I don't know how these direct CPU<>GPU connections will actually be used in real situations.

This is explained well here:
http://arstechnica.com/articles/paedia/cpu/xbox360-1.ars/5

So it uses CPU-Memory write bandwidth and one decompression thread, but rather than routing CPU-generated geometry data through main memory as a buffer, DMA is used so the GPU gets "bite-size" chunks of data directly from a locked portion of the L2 cache. That locked portion acts as a mini-buffer in case GPU processing of old data is slow and retrieval of new data is delayed. A lot of DMA, no?
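
The mini-buffer idea can be sketched as a small fixed-size queue between a producer and a consumer. This is a toy model only: the class and function names are made up for illustration, nothing here touches real cache locking or DMA, and the step-based timing is an assumption.

```python
from collections import deque


class LockedCacheBuffer:
    """Toy stand-in for a locked L2 region: a fixed number of chunk slots."""

    def __init__(self, capacity_chunks):
        self.capacity = capacity_chunks
        self.chunks = deque()

    def try_write(self, chunk):
        """Producer (CPU thread). Fails when full, so the producer must wait."""
        if len(self.chunks) >= self.capacity:
            return False
        self.chunks.append(chunk)
        return True

    def try_read(self):
        """Consumer (GPU fetch). Returns None when no chunk is ready."""
        return self.chunks.popleft() if self.chunks else None


def stream(total_chunks, capacity, gpu_period):
    """The CPU tries to produce one chunk per step; the GPU consumes one
    chunk every gpu_period steps.
    Returns (steps taken, producer stalls, chunks in consumption order)."""
    buf = LockedCacheBuffer(capacity)
    produced, stalls, step = 0, 0, 0
    consumed = []
    while len(consumed) < total_chunks:
        step += 1
        if produced < total_chunks:
            if buf.try_write(produced):
                produced += 1
            else:
                stalls += 1  # buffer full: the CPU thread has to wait
        if step % gpu_period == 0:
            chunk = buf.try_read()
            if chunk is not None:
                consumed.append(chunk)
    return step, stalls, consumed


if __name__ == "__main__":
    print(stream(total_chunks=8, capacity=2, gpu_period=2))
```

When the consumer keeps pace the producer never stalls; when the consumer is slower, the small buffer fills and the producer waits, which is the situation the locked region is meant to absorb.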
 
Cool, isn't it?

Combine that with adaptive tessellation and it's looking pretty groovy. The CPU no longer does geometry, at all. Everything geometry related (in graphics rendering) is done on Xenos.

Jawed
 
CPU

Jawed said:
Cool, isn't it?

Combine that with adaptive tessellation and it's looking pretty groovy. The CPU no longer does geometry, at all. Everything geometry related (in graphics rendering) is done on Xenos.

Jawed

I think if CPU efficiency can be maintained while one thread and a certain amount of L2 cache are permanently locked for creating geometry data for the GPU, then this will result in improved geometry performance in Xbox 360 games. The need for this unique console-minded approach might explain why so many first-generation Xbox 360 games seem to have limited geometry and rely excessively on normal mapping. I am sure second-generation engines will take advantage of this to have really impressive models. It is a good solution to the problem, and efficient CPU thread management, along with developers having a console mentality rather than a PC mentality, will be key to performance.
 
ihamoitc2005 said:
So it uses CPU-Memory write bandwidth and one decompression thread, but rather than routing CPU-generated geometry data through main memory as a buffer, DMA is used so the GPU gets "bite-size" chunks of data directly from a locked portion of the L2 cache that acts as a mini-buffer
But, there's a z-pass followed by the rendering, right? If you feed the geometry into the z-pass phase, it'll still need to be stored for use in the rendering. I guess it cuts out half (or more) the BW needed compared with write to RAM and two reads from RAM, using one write to RAM and one read from RAM. Is this right?
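
As a back-of-envelope check, the two cases can be counted as trips of the vertex data across the RAM bus. This is a sketch that assumes every trip moves the same amount of data, which tessellation would not respect. Under that assumption the saving works out to about one third, so "half (or more)" would need the skipped trip to be larger than the others.

```python
def ram_passes(writes, reads):
    """Number of times the vertex data crosses the RAM bus."""
    return writes + reads


def saving_fraction(baseline, improved):
    """Fraction of RAM traffic removed, assuming equal-sized passes."""
    return 1 - improved / baseline


# Via RAM: CPU writes the data, z-pass reads it, render pass reads it.
VIA_RAM = ram_passes(writes=1, reads=2)
# Direct feed: z-pass gets data from the CPU, stores it, render pass reads it.
DIRECT = ram_passes(writes=1, reads=1)

if __name__ == "__main__":
    print(f"via RAM: {VIA_RAM} passes, direct: {DIRECT} passes")
    print(f"saving: {saving_fraction(VIA_RAM, DIRECT):.0%}")
```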
 
I think that's roughly correct.

My understanding:
  1. CPU->GPU - basic "vertex stream" in the Xbox Procedural Synthesis protocol
  2. GPU->memory - vertex stream with tessellation manipulations (vertices created and deleted)
  3. memory->GPU->memory - z-pre-pass - transforms the scene into screen space and fills the z-buffer (or one of the 2 or 3 tiles of the z-buffer) whilst annotating the vertex stream with tile number(s) for each triangle (i.e. which tiles does each triangle appear in) - outputting into memory as it goes an updated vertex stream
  4. memory->GPU - colour pass(es) - one or more passes (depending on tiles) to render the final frame, using the vertex stream as input
  5. memory->GPU - special effects - shadow rendering, depth of field, motion blur, etc...
Of course the point here is that tessellation and z-pre-pass rendering techniques both chew memory bandwidth. But at this stage of rendering there is no texturing (which can chew 16GB/s), or very limited texturing, which should mean that Xenon and Xenos avoid a bandwidth-crunch.
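
The five steps above can be tallied mechanically from their arrows to see which ones touch main memory. This is a minimal sketch: the stage labels are paraphrased from the list, each arrow is counted as one transfer regardless of size, and multiple colour passes are counted once.

```python
# (label, data path) pairs, paraphrasing the five steps listed above.
STAGES = [
    ("vertex stream from CPU", "CPU->GPU"),
    ("tessellation output", "GPU->memory"),
    ("z-pre-pass", "memory->GPU->memory"),
    ("colour pass(es)", "memory->GPU"),
    ("special effects", "memory->GPU"),
]


def memory_traffic(path):
    """Return (reads, writes) against main memory for one data path."""
    hops = path.split("->")
    reads = sum(1 for src in hops[:-1] if src == "memory")
    writes = sum(1 for dst in hops[1:] if dst == "memory")
    return reads, writes


def totals(stages):
    """Sum memory reads and writes over all stages."""
    reads = sum(memory_traffic(path)[0] for _, path in stages)
    writes = sum(memory_traffic(path)[1] for _, path in stages)
    return reads, writes


if __name__ == "__main__":
    for label, path in STAGES:
        r, w = memory_traffic(path)
        print(f"{label:24s} {path:22s} reads={r} writes={w}")
    print("total (reads, writes):", totals(STAGES))
```

Only the first step avoids main memory entirely, which matches the point that the direct link saves the initial CPU write rather than removing memory traffic from the later passes.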

Jawed
 
No RAM bandwidth used

Shifty Geezer said:
But, there's a z-pass followed by the rendering, right? If you feed the geometry into the z-pass phase, it'll still need to be stored for use in the rendering. I guess it cuts out half (or more) the BW needed compared with write to RAM and two reads from RAM, using one write to RAM and one read from RAM. Is this right?

Yes, this technique can mean big savings in RAM access bandwidth.
 