GPU questions

1. Does an on-die Memory Controller make more cache less necessary? Or do I(ntegrated)MCs serve a completely different purpose?

2. Is it more expensive to emulate an RGBA16FP color buffer or an FP32 depth buffer and shadows through shaders?

Wouldn't it actually be cheaper to use the shaders as depth test units, so that PowerVR-style TBDR could be used? Are there disadvantages to doing that? If so, what are the disadvantages?

3. What percentage of shader resources does ATI's edge-detect AA take up on average, as a guesstimate?
 
1. Does an on-die Memory Controller make more cache less necessary?
It depends on what the IMC is meant to do: serve a lot of small requests at low latency (CPUs), or stream lots of data at high throughput (GPUs)?

In CPUs, an IMC should lower latency, acting somewhat like a larger cache would. GPU caches are positively tiny, and GPUs have IMCs anyway.

2. Is it more expensive to emulate an RGBA16FP color buffer or an FP32 depth buffer and shadows through shaders?
Naively done, I should think so. Modern GPUs are highly tuned for this sort of work, so you may not be able to beat the fixed-function hardware with shaders. Also, what would be the upside to using a shader there?

Wouldn't it actually be cheaper to use the shaders as depth test units, so that PowerVR-style TBDR could be used?
That would depend on how the rest of the pipeline (software or hardware) is architected. I imagine you'd need to build everything else around this idea to get reasonable efficiency (i.e. at least some kind of spatial sorting up front). Naively emulating ROPs with shaders on present NV/AMD hardware would likely be expensive.

LRB 1 (Larrabee) used this approach (and not naively), though we may never know why it couldn't compete.
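The "shaders as depth test units" idea above can be sketched in a few lines. This is a toy Python model of one tile of a PowerVR-style deferred pipeline (tile size, names, and structure are invented for illustration, not taken from any real design): depth-test every fragment against an on-chip tile depth buffer first, then shade only the surviving fragment per pixel.

```python
TILE = 4  # toy 4x4 tile; real tiles are larger (e.g. 16x16 or 32x32)

def shade_tile(fragments):
    """fragments: list of (x, y, depth, color) covering one tile.
    Pass 1: resolve visibility with a software depth test (the
    "shader as depth unit"). Pass 2: emit only visible pixels,
    so overdrawn fragments are never shaded -- the TBDR win."""
    zbuf = [[float('inf')] * TILE for _ in range(TILE)]
    winner = [[None] * TILE for _ in range(TILE)]
    for x, y, z, color in fragments:      # deferred: no shading yet
        if z < zbuf[y][x]:
            zbuf[y][x] = z
            winner[y][x] = color
    # only now pay shading cost, once per visible pixel
    return [(x, y, winner[y][x])
            for y in range(TILE) for x in range(TILE)
            if winner[y][x] is not None]

frags = [(1, 1, 0.9, 'far'), (1, 1, 0.2, 'near'), (0, 0, 0.5, 'mid')]
print(shade_tile(frags))  # -> [(0, 0, 'mid'), (1, 1, 'near')]
```

Note what the sketch quietly assumes: all geometry touching the tile has already been binned to it, which is exactly the "spatial sorting" the rest of the pipeline would have to provide.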

PS: a question of my own: are the z/color/stencil caches unified with the rest of the cache hierarchy on Fermi?
 
IIRC Nvidia said they've removed all the dedicated caches and FIFOs and let Fermi's cache hierarchy do the job. Don't know for sure though!
 
IIRC Nvidia said they've removed all the dedicated caches and FIFOs and let Fermi's cache hierarchy do the job. Don't know for sure though!
Doubtful. FIFOs are much more area-efficient than caches and are quite useful in many places. I have a feeling your memory is missing some context.
 
1. Does an on-die Memory Controller make more cache less necessary? Or do I(ntegrated)MCs serve a completely different purpose?
The effective latency of on-board memory depends on how long a round trip to memory takes. More cache reduces the number of trips, so more of that latency is hidden. An on-die memory controller shaves a fixed amount of latency off each trip. To reach the same apparent latency, less cache would be needed, since each trip to memory is cheaper.
While lower latency is helpful, GPUs by design favor high bandwidth, to the point that their memory systems under load probably have much higher latency than a CPU with an IMC. The small caches on a GPU do more to increase effective bandwidth and are not large enough for latency reduction to be a primary goal.
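As a back-of-the-envelope illustration of the cache-vs-IMC trade-off described above (all numbers invented for the example, not measured from any real part), a simple average-memory-access-time model shows how shaving fixed latency per trip and raising the hit rate with more cache can reach the same apparent latency:

```python
def amat(hit_rate, cache_ns, trip_ns):
    """Average memory access time: hits are served from cache,
    misses pay the cache lookup plus the full trip to memory."""
    return hit_rate * cache_ns + (1.0 - hit_rate) * (cache_ns + trip_ns)

# Baseline: off-die controller, each trip costs (say) 100 ns.
base = amat(0.95, 2, 100)        # 7.0 ns apparent latency
# Option A: an IMC shaves a fixed ~20 ns off every trip.
with_imc = amat(0.95, 2, 80)     # 6.0 ns
# Option B: keep the off-die controller but grow the cache until
# the hit rate rises enough to match; here 96% suffices.
more_cache = amat(0.96, 2, 100)  # 6.0 ns
```

Either knob reaches the same apparent latency, which is the sense in which an on-die controller makes some amount of cache unnecessary.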

Bandwidth is relatively agnostic to where the controller sits, though some power savings can be had by removing the middleman.

More pressing than latency is the question of what a GPU would gain by moving the controller off-die. The benefits are fewer (GPUs have little need for expandability or for supporting multiple radically different memory types), and the costs are higher: a more complicated board layout and a separate special-purpose ASIC to manufacture.
 