Are PS3 devs using the two mem pools for textures?

So did they double up on register counts vs G7x, or are they just splitting the registers that were already there? I'm not sure if we have G7x per-thread register counts or not(?)
 
Texture cache and register memory are different things... and LOL at that slide ;)
 
SMT/Hyper-Threading: 2 threads, 2 sets of registers, 1 set of ALUs.

This solution: 2 threads, 1 set of registers divided in half, 1 set of ALUs.

The trick quite simply seems to be to double-buffer reads from memory to hide latency. It appears to hide latency very well, but at the cost of registers. But I assume the consequence is that you should stick to simpler shaders when texturing from XDR.
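To make the idea concrete, here's a minimal C sketch of that double-buffering pattern. The async-read helpers are hypothetical stand-ins, not any real RSX or PS3 API; only the overlap of transfer and work is the point.

```c
#include <string.h>

#define BLOCK 256

/* Stand-ins for an asynchronous read from memory; on real hardware the
   transfer would proceed in the background while the ALUs keep working.
   These names are hypothetical, purely for illustration. */
static void start_async_read(float *dst, const float *src)
{
    memcpy(dst, src, BLOCK * sizeof *dst); /* pretend this is asynchronous */
}

static void wait_for_read(const float *dst)
{
    (void)dst; /* pretend to wait for the transfer to complete */
}

static float process_block(const float *texels)
{
    float sum = 0.0f;
    for (int i = 0; i < BLOCK; ++i)
        sum += texels[i];
    return sum;
}

/* Double buffering: while block N is being processed, the read of block
   N+1 is already in flight, so memory latency overlaps with ALU work
   instead of stalling it. The price is a second buffer -- the analogue
   of a shader thread giving up half its registers. */
float filter_stream(const float *src, int blocks)
{
    float buf[2][BLOCK];
    float total = 0.0f;

    start_async_read(buf[0], src);
    for (int i = 0; i < blocks; ++i) {
        if (i + 1 < blocks)
            start_async_read(buf[(i + 1) & 1], src + (i + 1) * BLOCK);
        wait_for_read(buf[i & 1]);
        total += process_block(buf[i & 1]);
    }
    return total;
}
```

The second buffer is exactly the "cost of registers" trade-off: every block you want in flight needs its own storage.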

This must also mean the RSX has its own MMU and can DMA memory directly from XDR to itself. Or maybe I am reading too much into this. But imagine if the RSX MMU could DMA data directly from the SPU Local Stores!

Is it limited to just two threads? Or can you have more threads that depend on swapping of register content?

This post by Barbarian was kind of ambiguous:
As far as I know the latency can be hidden by the hardware threading, but the pixel shaders have to use half the registers (compared to shaders texturing from VRAM), otherwise there will be a performance degradation. It seems registers are shared between all hardware threads, and more threads are needed to hide longer latencies, hence fewer registers available per thread.
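To put rough numbers on that relation, here's a small C sketch. The register file size and per-thread work figures are made up for illustration (the real RSX numbers aren't public); only the inverse relation between latency, thread count, and registers per thread is the point.

```c
#include <stdio.h>

/* Illustrative numbers only -- the longer the latency to hide, the more
   threads you need in flight, and the fewer registers each thread gets
   from the shared file. */
int main(void)
{
    const int regfile_size  = 1024; /* hypothetical total registers      */
    const int cycles_per_op = 8;    /* hypothetical ALU work per thread
                                       between dependent texture fetches */
    const int latencies[]   = { 50, 100, 200, 400 };

    for (int i = 0; i < 4; ++i) {
        int lat     = latencies[i];
        int threads = (lat + cycles_per_op - 1) / cycles_per_op; /* ceil */
        printf("latency %3d cycles -> %2d threads -> %4d regs/thread\n",
               lat, threads, regfile_size / threads);
    }
    return 0;
}
```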
 
So did they double up on register counts vs G7x, or are they just splitting the registers that were already there? I'm not sure if we have G7x per-thread register counts or not(?)
Logically, it's possible that the register file in RSX is the same size as in G7x - the difference being that RSX has a shorter ALU+TMU+ALU pipe (half) - a lot of the pipe is a "do nothing" queue.

The shorter pipe would mean that the nominal latency hiding between instructions for a given pixel is halved. Because RSX/GDDR3 are "fixed" in PS3 it is possible that the worst-case latency of GDDR3, from the point of view of RSX, is short enough (~100 cycles) that bilinear texturing latency is guaranteed to be "1 clock".

That's just a theory, but it would make RSX smaller :LOL:

XDR latency is ~400 cycles for Cell. At 500 MHz (the RSX clock) that's ~63 cycles. GDDR3 is roughly twice as far (best case) from Cell as XDR, which is about 126 RSX cycles. Since Cell<->GDDR3 is quite contorted (FlexIO/PCI-Express/RSX-MC/GDDR3), it seems possible that GDDR3 is about 100 cycles or so of latency. But, well ...
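Those conversions, checked in a few lines of C (same assumed clocks as above; 62.5 rounds to the ~63 in the post, and doubling before rounding gives ~125 rather than 126, same ballpark):

```c
#include <stdio.h>

/* Reproducing the back-of-the-envelope numbers above. The 3.2 GHz Cell
   clock and 500 MHz RSX clock are the figures used in the post; the
   latencies are estimates, not measurements. */
int main(void)
{
    const double cell_hz = 3.2e9;
    const double rsx_hz  = 500e6;
    const double xdr_latency_cell_cycles = 400.0;

    /* Convert Cell cycles to RSX cycles: 400 * (500e6 / 3.2e9) = 62.5 */
    double xdr_rsx   = xdr_latency_cell_cycles * rsx_hz / cell_hz;
    double gddr3_rsx = 2.0 * xdr_rsx; /* "roughly twice as far" => ~125 */

    printf("XDR latency seen from Cell:   ~%.1f RSX cycles\n", xdr_rsx);
    printf("GDDR3 latency seen from Cell: ~%.1f RSX cycles\n", gddr3_rsx);
    return 0;
}
```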

It's all a bit of a dead end, this speculation. I think the safest observation is that NVidia made register usage easier for devs to deal with (relaxing the constraints seen by PC programmers) and made RSX integrate into PS3 better than merely "dumping" G7x in there. Though they did screw up when it comes to Cell reading GDDR3, because the bandwidth there is appalling, ~16MB/s.

Jawed
 
Déjà vu.

:???: I thought it was a lot bigger than that? Wasn't it a few GB/s? :???:

[Image: PS3_memory_bandwidths.jpg]
 
This was covered when that story first broke, and the devs were saying there's no point in reading from GDDR, so the miserable BW is irrelevant. That said, it's a disappointment for Linux. There's a hack, I think, that enables this memory to be accessed, but at 16 MB/s it's no faster than HDD or flash RAM!

For post-processing effects on Cell, or any other image effects, you'd transfer the buffer to XDR, do your thing, and pass it back. On some occasions you might get away with transferring data directly to Cell, doing something with it, and sending it back to the GPU. But I don't think the particulars of that bidirectional interface have been made clear, or how it'd be used that way.
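As a sketch of that flow, assuming hypothetical wrapper names (none of these are the real libgcm/SPU APIs, and the no-op bodies just mark where the real transfers and jobs would go):

```c
#include <stddef.h>
#include <stdint.h>

/* No-op stand-ins for the three steps described above. All names here
   are hypothetical -- illustration only. */
static void rsx_blit_to_xdr(void *xdr_dst, uint32_t vram_offset, size_t n)
{ (void)xdr_dst; (void)vram_offset; (void)n; /* RSX DMAs frame to XDR  */ }

static void spu_run_filter(void *xdr_buf, size_t n)
{ (void)xdr_buf; (void)n;                    /* SPU job filters in XDR */ }

static void rsx_consume_from_xdr(const void *xdr_src, size_t n)
{ (void)xdr_src; (void)n;                    /* RSX reads result back  */ }

/* The round trip described in the post: RSX copies the frame to XDR
   (fast in that direction), the SPUs do their thing on it there, and
   "passing it back" can simply mean letting RSX read the result out of
   XDR -- Cell never has to touch GDDR3 directly. */
void postprocess_frame(void *xdr_scratch, uint32_t frame_in_vram,
                       size_t frame_bytes)
{
    rsx_blit_to_xdr(xdr_scratch, frame_in_vram, frame_bytes);
    spu_run_filter(xdr_scratch, frame_bytes);
    rsx_consume_from_xdr(xdr_scratch, frame_bytes);
}
```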
 
The question should be: "Are PS3 devs using Cell to read from VRAM for anything at all?"

And I wonder where all the Cell-based post-processing effects are.

RSX can copy the needed data to XDR, and from there Cell can read it fast enough.
 
This was covered when that story first broke, and the devs were saying there's no point in reading from GDDR, so the miserable BW is irrelevant. That said, it's a disappointment for Linux. There's a hack, I think, that enables this memory to be accessed, but at 16 MB/s it's no faster than HDD or flash RAM!

Why would you want to read from GDDR in Linux when most of RSX's functionality is inaccessible? Are you talking about using GDDR as general-purpose main memory, or software 3D APIs such as Mesa GL?

For post-processing effects on Cell, or any other image effects, you'd transfer the buffer to XDR, do your thing, and pass it back. On some occasions you might get away with transferring data directly to Cell, doing something with it, and sending it back to the GPU.
Now that looks like a case where you may want decent bandwidth for Cell reading from VRAM. :)
 
Now that looks like a case where you may want decent bandwidth for Cell reading from VRAM. :)
No, because the GPU has to read the data back anyway to use it, so why would you move it to VRAM (sucking up bandwidth that RSX needs for other rendering) when you can just read it where it is? It doesn't make any sense.
 
No, because the GPU has to read the data back anyway to use it, so why would you move it to VRAM (sucking up bandwidth that RSX needs for other rendering) when you can just read it where it is? It doesn't make any sense.

Are you questioning the use of SPUs for post-processing? Because that means reading data from somewhere else and writing it back.

By the way, regarding the bandwidth for 720p full-scene post-processing at 30 fps, isn't it around half a gigabyte/sec?
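A quick back-of-the-envelope check of that figure, assuming a single 1280x720 RGBA8 buffer copied to XDR and back once per frame (the single-pass assumption is mine; real pipelines vary):

```c
#include <stdio.h>

/* Sanity check of the "half a gigabyte/sec" figure for 720p30. */
int main(void)
{
    const double frame_bytes = 1280.0 * 720.0 * 4.0; /* ~3.5 MB         */
    const double fps         = 30.0;
    const double copies      = 2.0;                  /* to XDR and back */

    double mb_per_sec = frame_bytes * fps * copies / (1024.0 * 1024.0);
    printf("~%.0f MB/s for one full-frame pass at 720p30\n", mb_per_sec);
    /* ~211 MB/s; add the SPUs' own read and write of the buffer and you
       land in the 0.4-0.5 GB/s ballpark. */
    return 0;
}
```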

Yes. It would have been nice to have 400+ MB of RAM for Linux, rather than 190 MB, with 200 MB sitting there in the machine not being used!

Indeed, but I don't think the Linux kernel currently unifies split memory (VRAM and real main memory) anyway. I do agree it would be nice, as many Linux users don't care much about 3D (most of the time), and a ~500 MB PS3 Linux would be significantly more usable, especially with mainstream desktop environments.
 
Are you questioning the use of SPUs for post-processing? Because that means reading data from somewhere else and writing it back.
You don't understand.
You don't need to read data from video memory with the PPU or SPUs, as you can move it to main memory using RSX and read it from there. And since, wherever you write your output (main memory or video memory), you will need to read it back with the GPU to be used/displayed/whatever, who cares about moving data back to VRAM with Cell? Not only does it not matter, it just doesn't make sense.

By the way, regarding the bandwidth for 720p full-scene post-processing at 30 fps, isn't it around half a gigabyte/sec?
How can you tell in advance if you don't know how complex the post-processing is or how many passes it requires? It's not going to be a lot of BW anyway.
 
You don't understand.
You don't need to read data from video memory with the PPU or SPUs, as you can move it to main memory using RSX and read it from there. And since, wherever you write your output (main memory or video memory), you will need to read it back with the GPU to be used/displayed/whatever, who cares about moving data back to VRAM with Cell? Not only does it not matter, it just doesn't make sense.

We were talking about a case where you may not want to keep a duplicate copy in main memory purely for memory efficiency. Do you think it doesn't make sense to minimize memory usage?
There are many filters you can apply to streaming data.

How can you tell in advance if you don't know how complex the post-processing is or how many passes it requires? It's not going to be a lot of BW anyway.

That figure was a worst-case scenario for keeping and updating a full-frame copy in main memory, as I thought you were dismissing SPU post-processing altogether.
 
We were talking about a case where you may not want to keep a duplicate copy in main memory purely for memory efficiency. Do you think it doesn't make sense to minimize memory usage?
If you want to move post-processing effects to the SPUs, it means you're doing it for speed, not for memory efficiency. Trading off speed for memory, or vice versa, is the most common trade-off we face every day.
 
If you want to move post-processing effects to the SPUs, it means you're doing it for speed, not for memory efficiency. Trading off speed for memory, or vice versa, is the most common trade-off we face every day.

While I agree with the memory/speed trade-off in general (i.e. not only for latency hiding and the like), that doesn't change the fact that CPU<-VRAM (read) bandwidth may be valuable, as that was the discussion.

It is not a case of "it doesn't make sense" or "you don't need it because you have to make a trade-off".
 