Are PS3 devs using the two mem pools for textures?

So did they double up on register counts vs G7x, or are they just splitting the registers that were already there? I'm not sure if we have G7x per-thread register counts or not(?)
 
Texture cache and register memory are different things... and LOL at that slide ;)
 
SMT/Hyper-Threading: 2 threads, 2 sets of registers, 1 set of ALUs.

This solution: 2 threads, 1 set of registers divided in half, 1 set of ALUs.

The trick quite simply seems to be to double-buffer reads from memory to hide latency. It appears to hide latency very well, but at the cost of registers. But I assume the consequence is that you should stick to simpler shaders when texturing from XDR.
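To make the idea concrete, here's a minimal C sketch of that double-buffering pattern. The async-read helpers are hypothetical stand-ins, not any real RSX or PS3 API; only the overlap of transfer and work is the point.

```c
#include <string.h>

#define BLOCK 256

/* Stand-ins for an asynchronous read from memory; on real hardware the
   transfer would proceed in the background while the ALUs keep working.
   These names are hypothetical, purely for illustration. */
static void start_async_read(float *dst, const float *src)
{
    memcpy(dst, src, BLOCK * sizeof *dst); /* pretend this is asynchronous */
}

static void wait_for_read(const float *dst)
{
    (void)dst; /* pretend to wait for the transfer to complete */
}

static float process_block(const float *texels)
{
    float sum = 0.0f;
    for (int i = 0; i < BLOCK; ++i)
        sum += texels[i];
    return sum;
}

/* Double buffering: while block N is being processed, the read of block
   N+1 is already in flight, so memory latency overlaps with ALU work
   instead of stalling it. The price is a second buffer -- the analogue
   of a shader thread giving up half its registers. */
float filter_stream(const float *src, int blocks)
{
    float buf[2][BLOCK];
    float total = 0.0f;

    start_async_read(buf[0], src);
    for (int i = 0; i < blocks; ++i) {
        if (i + 1 < blocks)
            start_async_read(buf[(i + 1) & 1], src + (i + 1) * BLOCK);
        wait_for_read(buf[i & 1]);
        total += process_block(buf[i & 1]);
    }
    return total;
}
```

The second buffer is exactly the "cost of registers" trade-off: every block you want in flight needs its own storage.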

This must also mean the RSX has its own MMU and can DMA memory directly from XDR to itself. Or maybe I am reading too much into this. But imagine if the RSX MMU could DMA data directly from the SPU Local Stores!

Is it limited to just two threads? Or can you have more threads that depend on swapping of register content?

This post by Barbarian was kind of ambiguous:
As far as I know the latency can be hidden by the hardware threading, but the pixel shaders have to use half the registers (compared to shaders texturing from VRAM), otherwise there will be a performance degradation. It seems registers are shared between all hardware threads, and more threads are needed to hide longer latencies, hence fewer registers available per thread.
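To put rough numbers on that relation, here's a small C sketch. The register file size and per-thread work figures are made up for illustration (the real RSX numbers aren't public); only the inverse relation between latency, thread count, and registers per thread is the point.

```c
#include <stdio.h>

/* Illustrative numbers only -- the longer the latency to hide, the more
   threads you need in flight, and the fewer registers each thread gets
   from the shared file. */
int main(void)
{
    const int regfile_size  = 1024; /* hypothetical total registers      */
    const int cycles_per_op = 8;    /* hypothetical ALU work per thread
                                       between dependent texture fetches */
    const int latencies[]   = { 50, 100, 200, 400 };

    for (int i = 0; i < 4; ++i) {
        int lat     = latencies[i];
        int threads = (lat + cycles_per_op - 1) / cycles_per_op; /* ceil */
        printf("latency %3d cycles -> %2d threads -> %4d regs/thread\n",
               lat, threads, regfile_size / threads);
    }
    return 0;
}
```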
 
So did they double up on register counts vs G7x, or are they just splitting the registers that were already there? I'm not sure if we have G7x per-thread register counts or not(?)
Logically, it's possible that the register file in RSX is the same size as in G7x - the difference being that RSX has a shorter ALU+TMU+ALU pipe (half) - a lot of the pipe is a "do nothing" queue.

The shorter pipe would mean that the nominal latency hiding between instructions for a given pixel is halved. Because RSX/GDDR3 are "fixed" in PS3 it is possible that the worst-case latency of GDDR3, from the point of view of RSX, is short enough (~100 cycles) that bilinear texturing latency is guaranteed to be "1 clock".

That's just a theory, but it would make RSX smaller :LOL:

XDR latency is ~400 cycles for Cell. At 500 MHz (the RSX clock) that's ~63 cycles. GDDR3 is roughly twice as far (best case) from Cell as XDR, which is about 126 RSX cycles. Since Cell<->GDDR3 is quite contorted (FlexIO/PCI-Express/RSX-MC/GDDR3), it seems possible that GDDR3 is about 100 cycles or so of latency. But, well ...
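Those conversions, checked in a few lines of C (same assumed clocks as above; 62.5 rounds to the ~63 in the post, and doubling before rounding gives ~125 rather than 126, same ballpark):

```c
#include <stdio.h>

/* Reproducing the back-of-the-envelope numbers above. The 3.2 GHz Cell
   clock and 500 MHz RSX clock are the figures used in the post; the
   latencies are estimates, not measurements. */
int main(void)
{
    const double cell_hz = 3.2e9;
    const double rsx_hz  = 500e6;
    const double xdr_latency_cell_cycles = 400.0;

    /* Convert Cell cycles to RSX cycles: 400 * (500e6 / 3.2e9) = 62.5 */
    double xdr_rsx   = xdr_latency_cell_cycles * rsx_hz / cell_hz;
    double gddr3_rsx = 2.0 * xdr_rsx; /* "roughly twice as far" => ~125 */

    printf("XDR latency seen from Cell:   ~%.1f RSX cycles\n", xdr_rsx);
    printf("GDDR3 latency seen from Cell: ~%.1f RSX cycles\n", gddr3_rsx);
    return 0;
}
```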

It's all a bit of a dead end, this speculation. I think the safest observation is that NVidia made register usage easier for devs to deal with (relaxing the constraints seen by PC programmers) and made RSX integrate into PS3 better than merely "dumping" G7x in there. Though they did screw up when it comes to Cell reading GDDR3, because the bandwidth there is appalling, ~16MB/s.

Jawed
 
Déjà vu.

:???: I thought it was a lot bigger than that? Wasn't it a few GB/s? :???:

[Image: PS3_memory_bandwidths.jpg]
 
This was covered when that story first broke, and the devs were saying there's no point in reading from GDDR, so the miserable BW is irrelevant. That said, it's a disappointment for Linux. There's a hack, I think, that enables this memory to be accessed, but at 16 MB/s it's no faster than HDD or flash RAM!

For post-processing effects on Cell, or any other image effects, you'd transfer the buffer to XDR, do your thing, and pass it back. On some occasions you might get away with transferring data directly to Cell, doing something with it, and sending it back to the GPU. But I don't think the particulars of that bidirectional interface have been made clear, or how it'd be used that way.
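As a sketch of that flow, assuming hypothetical wrapper names (none of these are the real libgcm/SPU APIs, and the no-op bodies just mark where the real transfers and jobs would go):

```c
#include <stddef.h>
#include <stdint.h>

/* No-op stand-ins for the three steps described above. All names here
   are hypothetical -- illustration only. */
static void rsx_blit_to_xdr(void *xdr_dst, uint32_t vram_offset, size_t n)
{ (void)xdr_dst; (void)vram_offset; (void)n; /* RSX DMAs frame to XDR  */ }

static void spu_run_filter(void *xdr_buf, size_t n)
{ (void)xdr_buf; (void)n;                    /* SPU job filters in XDR */ }

static void rsx_consume_from_xdr(const void *xdr_src, size_t n)
{ (void)xdr_src; (void)n;                    /* RSX reads result back  */ }

/* The round trip described in the post: RSX copies the frame to XDR
   (fast in that direction), the SPUs do their thing on it there, and
   "passing it back" can simply mean letting RSX read the result out of
   XDR -- Cell never has to touch GDDR3 directly. */
void postprocess_frame(void *xdr_scratch, uint32_t frame_in_vram,
                       size_t frame_bytes)
{
    rsx_blit_to_xdr(xdr_scratch, frame_in_vram, frame_bytes);
    spu_run_filter(xdr_scratch, frame_bytes);
    rsx_consume_from_xdr(xdr_scratch, frame_bytes);
}
```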
 
The question should be: "Are PS3 devs using Cell to read from VRAM for anything at all?"

And I wonder where all the Cell-based post-processing effects are.

RSX can copy the needed data to XDR, and from there Cell can read it fast enough.
 
This was covered when that story first broke, and the devs were saying there's no point in reading from GDDR, so the miserable BW is irrelevant. That said, it's a disappointment for Linux. There's a hack, I think, that enables this memory to be accessed, but at 16 MB/s it's no faster than HDD or flash RAM!

Why would you want to read from GDDR in Linux when most of RSX's functionality is inaccessible? Are you talking about using GDDR as general-purpose main memory, or software 3D APIs such as Mesa GL?

For post-processing effects on Cell, or any other image effects, you'd transfer the buffer to XDR, do your thing, and pass it back. On some occasions you might get away with transferring data directly to Cell, doing something with it, and sending it back to the GPU.
Now that looks like a case where you may want decent bandwidth for Cell reading from VRAM. :)
 
Now that looks like a case where you may want decent bandwidth for Cell reading from VRAM. :)
No, because the GPU has to read the data back anyway to use it, so why would you move it to VRAM (sucking up bandwidth that RSX needs for other rendering) when you can just read it where it is? It doesn't make any sense.
 
No, because the GPU has to read the data back anyway to use it, so why would you move it to VRAM (sucking up bandwidth that RSX needs for other rendering) when you can just read it where it is? It doesn't make any sense.

Are you questioning the use of SPUs for post-processing? Because that means reading data from somewhere else and writing it back.

By the way, regarding the bandwidth for 720p full-scene post-processing at 30 fps, isn't it around half a gigabyte/sec?
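A quick back-of-the-envelope check of that figure, assuming a single 1280x720 RGBA8 buffer copied to XDR and back once per frame (the single-pass assumption is mine; real pipelines vary):

```c
#include <stdio.h>

/* Sanity check of the "half a gigabyte/sec" figure for 720p30. */
int main(void)
{
    const double frame_bytes = 1280.0 * 720.0 * 4.0; /* ~3.5 MB         */
    const double fps         = 30.0;
    const double copies      = 2.0;                  /* to XDR and back */

    double mb_per_sec = frame_bytes * fps * copies / (1024.0 * 1024.0);
    printf("~%.0f MB/s for one full-frame pass at 720p30\n", mb_per_sec);
    /* ~211 MB/s; add the SPUs' own read and write of the buffer and you
       land in the 0.4-0.5 GB/s ballpark. */
    return 0;
}
```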

Yes. It would have been nice to have 400+ MB of RAM for Linux, rather than 190 MB, with 200 MB sitting there in the machine not being used!

Indeed, but I don't think the Linux kernel currently unifies split memory (VRAM and real main memory) anyway. I do agree it would be nice, as many Linux users don't care much about 3D (most of the time), and a ~500 MB PS3 Linux would be significantly more usable, especially with mainstream desktop environments.
 
Are you questioning the use of SPUs for post-processing? Because that means reading data from somewhere else and writing it back.
You don't understand.
You don't need to read data from video memory with the PPU or SPUs, as you can move it to main memory using RSX and read it from there. And since, wherever you write your output (main memory or video memory), you will need to read it back with the GPU to be used/displayed/whatever, who cares about moving data back to VRAM with Cell? Not only does it not matter, it just doesn't make sense.

By the way, regarding the bandwidth for 720p full-scene post-processing at 30 fps, isn't it around half a gigabyte/sec?
How can you tell in advance if you don't know how complex the post-processing is or how many passes it requires? It's not going to be a lot of BW anyway.
 
You don't understand.
You don't need to read data from video memory with the PPU or SPUs, as you can move it to main memory using RSX and read it from there. And since, wherever you write your output (main memory or video memory), you will need to read it back with the GPU to be used/displayed/whatever, who cares about moving data back to VRAM with Cell? Not only does it not matter, it just doesn't make sense.

We were talking about a case where you may not want to keep a duplicate copy in main memory purely for memory efficiency. Do you think it doesn't make sense to minimize memory usage?
There are many filters you can apply to streaming data.

How can you tell in advance if you don't know how complex the post-processing is or how many passes it requires? It's not going to be a lot of BW anyway.

That figure was a worst-case scenario for keeping and updating a full-frame copy in main memory, as I thought you were dismissing SPU post-processing altogether.
 
We were talking about a case where you may not want to keep a duplicate copy in main memory purely for memory efficiency. Do you think it doesn't make sense to minimize memory usage?
If you want to move post-processing effects to the SPUs, it means you're doing it for speed, not for memory efficiency. Trading off speed for memory, or vice versa, is the most common trade-off we face every day.
 
If you want to move post-processing effects to the SPUs, it means you're doing it for speed, not for memory efficiency. Trading off speed for memory, or vice versa, is the most common trade-off we face every day.

While I agree with the memory/speed trade-off in general (i.e. not only for latency hiding and the like), that doesn't change the fact that CPU<-VRAM (read) bandwidth may be valuable, as that was the discussion.

It is not a case of "it doesn't make sense" or "you don't need it because you have to make a trade-off".
 