Are PS3 devs using the two mem pools for textures?

If you are a good GPU architect you will design a GPU that copes with your whole system's latency. If you don't..well..they should fire you! ;)
Ahhh, but what if you're a good GPU architect who's designed a GPU for a fast local memory bus, and then your GPU is shoe-horned into a split memory pool system it wasn't designed for? RSX is mostly regarded as a G70 'thing' put in PS3, without any real design work done, and I suppose most people think of it in such terms - its ability to hide latency must be similar to the 7800's, which is based on fast local VRAM with no reaching across to find XDR.
 
RSX is dramatically "better" at hiding latency than G7x, for what it's worth. It's as tolerant of PS3 system RAM as G7x is tolerant of GDDR.

Jawed
 
Ah, I am beginning to see it in a different light now. Thank you for putting up with me. :oops:
 
[Image: b3da0.jpg]


Jawed

Ah, thanks..

I remember this being explained to me some time ago, but I'd forgotten all about it. Thanks again..:)
 
Ahhh, but what if you're a good GPU architect who's designed a GPU for a fast local memory bus, and then your GPU is shoe-horned into a split memory pool system it wasn't designed for? RSX is mostly regarded as a G70 'thing' put in PS3, without any real design work done, and I suppose most people think of it in such terms - its ability to hide latency must be similar to the 7800's, which is based on fast local VRAM with no reaching across to find XDR.
If RSX simply were a 7800 you'd still need to heavily redesign parts of it. I mean..PCIe and FlexIO are not very similar, so it would have needed real design work anyway.
 
is that specifically for RSX?

From what I've heard, I believe it is..

Basically you can increase your tolerance to latency - be it from GDDR or XDR - by increasing the number of threads, at the expense of per-thread resources.

I guess given the extra constraints on thread resources, the programmer has control over this. If you know you'll only be texturing from GDDR you can use this number of registers per thread. If you're going to be using XDR, you can use this lower number, and let the GPU increase the number of threads in flight such that if a thread is stalled waiting on its texture data, it's more likely there'll be another ready thread to switch to while the other waits.

I wonder if there'd be value in having more register-light(er) threads even if just texturing from GDDR.

It's not really about the latency not being as bad as thought..that latency is still there, but there are tools to mitigate its impact as above and keep the GPU busy.
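To put rough numbers on the trade-off above, here is a minimal Python sketch. The register file size and the per-thread register counts are invented for illustration and are not RSX figures; the only point is that the number of threads in flight scales inversely with the registers each thread needs.

```python
# Hypothetical numbers throughout; none of these are real RSX figures.
REGISTER_FILE_SIZE = 1024  # total registers shared by all threads (assumed)


def threads_in_flight(registers_per_thread: int,
                      register_file_size: int = REGISTER_FILE_SIZE) -> int:
    """Number of threads that fit when the register file is split evenly."""
    if registers_per_thread <= 0:
        raise ValueError("registers_per_thread must be positive")
    return register_file_size // registers_per_thread


# A register-heavy shader leaves fewer threads to switch to during a stall.
gddr_threads = threads_in_flight(4)  # 256 threads
# Halving the per-thread registers doubles the threads available to cover a fetch.
xdr_threads = threads_in_flight(2)   # 512 threads
```

So under these assumed numbers, a shader texturing from XDR that gets by on half the registers has twice as many threads to switch between while a fetch is outstanding.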
 
From what I've heard, I believe it is..

Basically you can increase your tolerance to latency - be it from GDDR or XDR - by increasing the number of threads, at the expense of per-thread resources.

I guess given the extra constraints on thread resources, the programmer has control over this. If you know you'll only be texturing from GDDR you can use this number of registers per thread. If you're going to be using XDR, you can use this lower number, and let the GPU increase the number of threads in flight such that if a thread is stalled waiting on its texture data, it's more likely there'll be another ready thread to switch to while the other waits.
So it is a kind of hyper-thread solution where the ALU-resources are shared but not the registers?

Jawed said:
That slide is RSX specific. G7x has half these capabilities.
How much impact on the transistor count could such an extension imply with regard to the shader implementation?
 
So it is a kind of hyper-thread solution where the ALU-resources are shared but not the registers?

SMT/Hyper threading: 2 threads, 2 sets of registers, 1 set of ALUs.

This solution: 2 threads, 1 set of registers divided in half, 1 set of ALUs.

The trick quite simply seems to be to double buffer reads from memory to hide latency. It appears to hide latency very well, but at the cost of registers. But I assume the consequence is that you should stick to simpler shaders when texturing from XDR.
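A back-of-the-envelope way to see why more threads hide more latency, assuming ideal round-robin switching where each thread does some ALU work and then stalls on a texture fetch. This is a simplified model with invented cycle counts, not a description of how RSX actually schedules threads:

```python
def alu_utilization(threads: int, work_cycles: int, fetch_latency: int) -> float:
    """Fraction of cycles the ALUs are busy under ideal round-robin switching.

    Each thread alternates work_cycles of ALU work with a fetch that takes
    fetch_latency cycles. While one thread waits, the others can run, so with
    enough threads the stalls overlap completely and utilization reaches 1.0.
    """
    if threads <= 0 or work_cycles <= 0 or fetch_latency < 0:
        raise ValueError("invalid parameters")
    return min(1.0, threads * work_cycles / (work_cycles + fetch_latency))


# Hypothetical latencies: a "near" pool at 196 cycles, a "far" pool at 396.
near_pool = alu_utilization(threads=50, work_cycles=4, fetch_latency=196)
far_pool_same_threads = alu_utilization(threads=50, work_cycles=4, fetch_latency=396)
far_pool_more_threads = alu_utilization(threads=100, work_cycles=4, fetch_latency=396)
```

In this model the far pool leaves the ALUs idle half the time with 50 threads, and doubling the thread count (by halving registers per thread) brings them back to fully busy.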

This must also mean the RSX has its own MMU and can DMA memory directly from XDR to itself. Or maybe I am reading too much into this. But imagine if the RSX MMU could DMA data directly from the SPU Local Stores!
 
RSX is dramatically "better" at hiding latency than G7x, for what it's worth. It's as tolerant of PS3 system RAM as G7x is tolerant of GDDR.

Jawed


So is RSX then twice as good as G7x at hiding latency from GDDR3? Anyway, doesn't the disparity in latency tolerance between the two pools force you either to avoid complex shaders altogether or to lose several frames when using them with both pools for texturing? It must be a headache to manage which pool to use depending on shader intensity.

P.S.: prize for the slide of the month! Do you have more??? ;)
 
The trick quite simply seems to be to double buffer reads from memory to hide latency. It appears to hide latency very well, but at the cost of registers. But I assume the consequence is that you should stick to simpler shaders when texturing from XDR.

I think we should more specifically say 'lower-register-using shaders' ;) Complexity is a more..complicated thing than the number of registers you're using. I've read papers where the author stepped through the process of optimising out registers in their shaders, and I can assure you, they weren't simplifying anything ;) Of course, if the compiler does not do this adequately, automatically, it is more work for the developer to make manual optimisations like this. And of course, some computation can have register usage more easily reduced without affecting the final result than others.
 
So is RSX then twice as good as G7x at hiding latency from GDDR3?
It's solely about the complexity of a shader: how many registers need to be allocated in the register file. RSX is kinder on PS3 devs in this respect than G7x is. As shaders get longer they tend to use more registers. But a good compiler (or programmer writing low-level code) can "re-use" registers.
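To illustrate what "re-using" registers means, here is a rough Python sketch that counts how many values are live at once in a straight-line shader. The instruction format and the names (`tex0`, `a`, `b` and so on) are made up for the example and do not correspond to any real shader ISA or compiler:

```python
from typing import List, Tuple

# Each instruction is (destination, [sources]); names are illustrative only.
Instr = Tuple[str, List[str]]


def max_live_values(program: List[Instr], outputs: List[str]) -> int:
    """Peak number of simultaneously live values in straight-line code.

    Walks the program backwards. A value is live from its definition until
    its last use, so this peak is the fewest registers needed if a register
    is re-used as soon as the value it holds is dead.
    """
    live = set(outputs)
    peak = len(live)
    for dest, sources in reversed(program):
        live.discard(dest)    # the value does not exist before its definition
        live.update(sources)  # sources must be live going into the instruction
        peak = max(peak, len(live))
    return peak


# Toy shader: six distinct names, but never more than three live at once.
program: List[Instr] = [
    ("a", ["tex0"]),
    ("b", ["tex1"]),
    ("c", ["a", "b"]),
    ("d", ["c", "tex0"]),
]
registers_needed = max_live_values(program, outputs=["d"])  # 3
```

A naive allocation with one register per name would use six here, while re-use gets it down to three without changing the result.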

As that slide says, using "half" registers whenever possible is also a good strategy.
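A quick sketch of why halves help, under the assumption that two half-precision values can share one full-width register slot (the counts below are invented for the example):

```python
def full_register_slots(full_values: int, half_values: int) -> int:
    """Register slots used, assuming two half values pack into one full slot."""
    if full_values < 0 or half_values < 0:
        raise ValueError("counts must be non-negative")
    # Round the halves up: an odd half value still occupies a whole slot.
    return full_values + (half_values + 1) // 2


all_full = full_register_slots(full_values=4, half_values=0)     # 4 slots
mostly_half = full_register_slots(full_values=1, half_values=3)  # 3 slots
all_half = full_register_slots(full_values=0, half_values=4)     # 2 slots
```

Under that assumption, a shader that can live with half precision everywhere needs half the slots, which is the same as doubling the threads in flight.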

Anyway, doesn't the disparity in latency tolerance between the two pools force you either to avoid complex shaders altogether or to lose several frames when using them with both pools for texturing? It must be a headache to manage which pool to use depending on shader intensity.
That's why game devs are so highly paid and respected... At the same time, imagine the freedom they have knowing they're not programming a PC.

Jawed
 
How much impact on the transistor count could such an extension imply with regard to the shader implementation?
As a minimum it's going to double the register file size. Nothing monstrous as far as RSX, overall, is concerned, less than 5% extra die, or less than 10% extra transistors, I suspect.

Jawed
 
That's why game devs are so highly paid and respected... At the same time, imagine the freedom they have knowing they're not programming a PC.

Jawed

Highly paid??

What devs do YOU know??

Let me know where they work and I'll drop my CV at the reception.. :D
 