That slide is RSX specific. G7x has half these capabilities.
Jawed
SMT/Hyper-Threading: 2 threads, 2 sets of registers, 1 set of ALUs.
This solution: 2 threads, 1 set of registers divided in half, 1 set of ALUs.
The trick quite simply seems to be to double-buffer reads from memory to hide latency. It appears to hide latency very well, but at the cost of registers. I assume the consequence is that you should stick to simpler shaders when texturing from XDR.
This must also mean the RSX has its own MMU and can DMA memory directly from XDR to itself. Or maybe I am reading too much into this. But imagine if the RSX MMU could DMA data directly from the SPU Local Stores!
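To make the double-buffering idea concrete, here's a rough software analogy in plain C (my own sketch, nothing to do with how RSX is actually wired up; fetch_async/fetch_wait and all the names are made up): while the ALUs chew on one batch of fetched data, the read for the next batch is already in flight, so the memory latency overlaps with compute instead of stalling it, and the price is keeping two buffers live instead of one.

#include <stdio.h>
#include <string.h>

#define CHUNK 256                        /* elements fetched per batch */

/* Stand-ins for an asynchronous read: synchronous here (memcpy + no-op) just
   so the sketch runs; in hardware the read would genuinely overlap the math. */
static void fetch_async(float *dst, const float *src, size_t count)
{
    memcpy(dst, src, count * sizeof *dst);
}
static void fetch_wait(void) { }

/* Walk a stream in chunks, double-buffering the reads: while chunk i is being
   worked on, the fetch of chunk i+1 is already in flight.  The cost is two
   live buffers instead of one - the "registers divided in half" trade-off. */
static void process_stream(const float *src, float *out, size_t n)
{
    float buf[2][CHUNK];
    size_t cur = 0;

    fetch_async(buf[cur], src, CHUNK);              /* prime the first read   */
    for (size_t i = 0; i + CHUNK <= n; i += CHUNK) {
        fetch_wait();                               /* this chunk has arrived */
        if (i + 2 * CHUNK <= n)                     /* kick off the next one  */
            fetch_async(buf[cur ^ 1], src + i + CHUNK, CHUNK);

        for (size_t j = 0; j < CHUNK; ++j)          /* "ALU work", overlapped */
            out[i + j] = buf[cur][j] * 2.0f;

        cur ^= 1;                                   /* swap buffers           */
    }
}

int main(void)
{
    static float in[1024], res[1024];
    for (int i = 0; i < 1024; ++i) in[i] = (float)i;
    process_stream(in, res, 1024);
    printf("%.1f %.1f\n", res[0], res[1023]);       /* 0.0 2046.0             */
    return 0;
}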
As far as I know the latency can be hidden by the hardware threading, but the pixel shaders have to use half the registers (compared to shaders texturing from vram), otherwise there will be a performance degradation. It seems registers are shared between all hardware threads, and more threads are needed to hide longer latencies, hence fewer registers available per thread.
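As a purely illustrative back-of-the-envelope (round made-up numbers, not real RSX or G7x figures): say the register file has room for 200 fragments in flight at 4 FP32 registers each, and 200 in-flight fragments are enough to cover vram texture latency. If texturing from XDR over Flex I/O roughly doubles that latency, you need about 400 fragments in flight to keep the ALUs busy, and the same register file now only stretches to 2 registers per fragment - which is exactly the "half the registers or you lose performance" behaviour described above.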
So did they double up on register counts vs G7x, or are they just splitting the registers that were there? I'm not sure if we have G7x per-thread register counts or not..(?)

Logically, it's possible that the register file in RSX is the same size as in G7x - the difference being that RSX has a shorter ALU+TMU+ALU pipe (half); a lot of the pipe is a "do nothing" queue.
Though they did screw up when it comes to Cell reading GDDR3, because the bandwidth there is appalling, ~16MB/s.
Jawed
I thought it was a lot bigger than that? Wasn't it a few GB/s?
The question should be: "Are PS3 devs using Cell to read from vram for anything at all?"
And I wonder where all the Cell-based postprocessing effects are.
This was covered when that story first broke, and the devs were saying there's no point in reading from GDDR so the miserable BW is irrelevant. That said, it's a disappointment for Linux. There's a hack I think that enables this memory to be accessed, but at 16 MB/s it's no faster than HDD or flash ram!
For postprocessing effects on Cell, or any other image effects, you'd transfer the buffer to XDR, do your thing, and pass it back. On some occasions you might get away with transferring data directly to Cell, doing something with it, and sending it back to the GPU.

Now that looks like a case where you may want to have a nice bandwidth for Cell reading from vram.
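For what it's worth, the "transfer the buffer to XDR, do your thing, and pass it back" loop is the same double-buffering trick again, just done by hand with DMA. A rough SPU-side sketch (assuming the standard Cell SDK MFC intrinsics from spu_mfcio.h; the tile size, the effect, and the assumption that the frame copy already sits in XDR are all mine, not anything a dev here has posted):

#include <spu_mfcio.h>
#include <stdint.h>

#define TILE_BYTES 16384                           /* 16 KB, the max single MFC transfer */
#define TILE_PIXELS (TILE_BYTES / 4)               /* 32-bit pixels                      */

static uint32_t tile[2][TILE_PIXELS] __attribute__((aligned(128)));

/* Made-up effect: darken every pixel a bit.  Real code would do tone mapping,
   blur taps, whatever the pass actually is.                                   */
static void process_tile(uint32_t *px, int n)
{
    for (int i = 0; i < n; ++i)
        px[i] = (px[i] >> 1) & 0x7F7F7F7Fu;
}

/* src/dst: effective addresses of the frame copy in XDR (already blitted there
   by RSX, picked up from there afterwards).  nbytes: frame size, assumed to be
   a multiple of TILE_BYTES.                                                    */
void postprocess(uint64_t src, uint64_t dst, uint32_t nbytes)
{
    uint32_t cur = 0;

    mfc_get(tile[0], src, TILE_BYTES, 0, 0, 0);    /* prime the first tile in   */
    for (uint32_t off = 0; off < nbytes; off += TILE_BYTES) {
        uint32_t next = cur ^ 1;
        if (off + TILE_BYTES < nbytes) {
            mfc_write_tag_mask(1 << next);         /* other buffer's put done?  */
            mfc_read_tag_status_all();
            mfc_get(tile[next], src + off + TILE_BYTES, TILE_BYTES, next, 0, 0);
        }

        mfc_write_tag_mask(1 << cur);              /* wait for the current tile */
        mfc_read_tag_status_all();

        process_tile(tile[cur], TILE_PIXELS);      /* the actual "do your thing" */

        mfc_put(tile[cur], dst + off, TILE_BYTES, cur, 0, 0);   /* stream it out */
        cur = next;
    }
    mfc_write_tag_mask(3);                         /* drain the last puts        */
    mfc_read_tag_status_all();
}

The argument in the thread is basically about where src lives: with the ~16 MB/s Cell-from-GDDR3 path, pointing it at vram is a non-starter, which is why the RSX-blits-to-XDR step is there at all.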
No, 'cause the GPU has to read the data back anyway to use it, so why would you move it to vram (sucking up bandwidth that RSX is using for other rendering) when you can just read it where it is? It doesn't make any sense.
Are you talking about using GDDR as general-purpose main memory?

Yes. It would have been nice to have 400+ MB of RAM for Linux, rather than 190 MB, with 200-odd MB sitting there in the machine not being used!
Are you questioning the use of SPUs for postprocessing? Because it means reading data from somewhere else and writing it back.
By the way, regarding the bandwidth for 720p full-scene post processing at 30 fps, isn't it around half a gigabyte/sec?
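Quick back-of-the-envelope on that number (my own arithmetic, assuming a 32-bit 1280x720 colour buffer): one frame is about 3.7 MB, so an SPU reading it once and writing it once is ~7.4 MB per frame, or ~220 MB/s at 30 fps. Add the RSX copy out to XDR and the read back afterwards and you're around 440 MB/s for a single full-screen pass, so half a gigabyte a second is the right ballpark - and it scales with however many passes you end up doing.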
You don't understand.
You don't need to read data from video mem with the PPU or SPUs, as you can move it to main mem using RSX and read it from there. And since wherever you write your output (main mem or video mem) you will need to read it back with the GPU anyway to be used/displayed/whatever, who cares about moving data back to vram with Cell? Not only does it not matter, it just does not make sense.
How can you tell in advance, if you don't know how complex the post processing is or how many passes it requires? It's not going to be a lot of bw anyway.
We were talking about a case where you may not want to keep a duplicate copy in main memory, purely for memory efficiency. Do you think it doesn't make sense to minimize memory usage?

If you want to move post processing effects to the SPUs it means you're doing it for speed, not for memory efficiency; trading off speed for memory, or vice versa, is the most common trade-off we face every day.