Say you build a "global illumination" algorithm: you might use low refresh rates for the computations, or even stagger them over successive frames, operating on different areas/LODs in a round-robin fashion.
Or you have a particle system built upon an effects-physics simulation.
Or reprojection caching of pixels, using final shaded pixel results from prior frames.
Any kind of technique that makes stuff persist over multiple frames is AFR-unfriendly.
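A toy model makes the AFR problem concrete. The sketch below is hypothetical (`afr_locality` and its parameters are made up for illustration, and it ignores any copying between GPUs). It hands frames to GPUs in turn, refreshes one GI region per frame in a staggered, round-robin way, and records how old the newest locally resident frame is, since that is what a reprojection cache would reuse:

```python
def afr_locality(num_gpus: int, num_frames: int, num_regions: int):
    """Toy model of AFR combined with staggered GI updates and reprojection.

    Returns (regions_per_gpu, history_age):
      regions_per_gpu[g] -- GI regions that GPU g ever refreshes in its own memory
      history_age        -- for each frame a GPU renders after its first, how many
                            frames old that GPU's newest locally resident frame is
    Inter-GPU copies are deliberately not modelled.
    """
    regions_per_gpu = [set() for _ in range(num_gpus)]
    last_frame = [None] * num_gpus          # last frame each GPU rendered
    history_age = []
    for frame in range(num_frames):
        g = frame % num_gpus                # AFR: GPUs take turns rendering frames
        if last_frame[g] is not None:
            history_age.append(frame - last_frame[g])
        regions_per_gpu[g].add(frame % num_regions)  # one GI region per frame
        last_frame[g] = frame
    return regions_per_gpu, history_age


if __name__ == "__main__":
    print(afr_locality(1, 8, 4))  # single GPU
    print(afr_locality(2, 8, 4))  # two GPUs in AFR
```

With 2 GPUs and an even region count, each GPU only ever refreshes half the regions itself, so the other half has to come across from the other board. With an odd count each GPU eventually touches every region, but at only half the refresh rate. Either way, the reprojection history in local memory is two frames old instead of one.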
Jawed
Maybe that's what they'll do - maybe DP in GT200 is the way it is because it was bolted on, meaning the SP MAD lanes weren't touched, minimising the design effort and making for an interim solution.
Jawed
That I don't know, but I presume it's a question of transistor budget and how much you'd have to add to each lane to make stitching work with the full (or better) feature set of the current DP unit.
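To put a rough number on that transistor budget: an IEEE 754 double has a 53-bit significand (counting the implicit bit), versus 24 bits for single precision. Assume, purely for illustration, that only 24x24-bit multipliers were available. A schoolbook decomposition then needs 3 limbs per operand, 9 partial products, and wide adders to sum them. This sketches the arithmetic only. It is not how GT200 or RV770 actually build their DP units, and `stitched_multiply` is a hypothetical name:

```python
def split_limbs(x: int, width: int, count: int) -> list:
    """Split a non-negative integer into `count` little-endian limbs of `width` bits."""
    mask = (1 << width) - 1
    return [(x >> (width * i)) & mask for i in range(count)]


def stitched_multiply(a: int, b: int, width: int = 24, mant_bits: int = 53):
    """Multiply two mant_bits-wide significands using only width x width multiplies.

    Returns (exact product, number of narrow multiplies used).
    """
    assert 0 <= a < (1 << mant_bits) and 0 <= b < (1 << mant_bits)
    count = -(-mant_bits // width)          # ceil division: limbs per operand
    a_limbs = split_limbs(a, width, count)
    b_limbs = split_limbs(b, width, count)
    product = 0
    multiplies = 0
    for i, ai in enumerate(a_limbs):
        for j, bj in enumerate(b_limbs):
            # Each narrow partial product is shifted into place and accumulated.
            product += (ai * bj) << (width * (i + j))
            multiplies += 1
    return product, multiplies


if __name__ == "__main__":
    a, b = (1 << 53) - 1, (1 << 52) + 12345
    p, n = stitched_multiply(a, b)
    print(p == a * b, n)
```

The extra cost lies not just in the multipliers but in the shifting and the wide accumulation, which is the per-lane logic that stitching would have to add.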
Would stitching lanes result in loss of features?
Anyone want to do a feature comparison? AFAIK, AMD's DP is 3x faster but has fewer features than NV's.
That makes sense. And honestly, if you take the mile-high overview of things like me and then look at the kind of parallelism available in the shading stages, it's a shame that the best mGPU approach is plain AFR. Surely better techniques must be available to achieve mGPU scaling, even if it means moving in an LRB-like direction.
It seems like the only way forward would be some sort of memory sharing and cache coherent multi-chip architecture.
Voltage does make a large difference. In fact, it's where the majority of the idle power savings are coming from with HD 4890. When comparing against HD 4850, no. It's about 40-50 watts higher under load, and I don't want to attribute all of that to the GPU alone, although it does run at a higher voltage.
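As background on why voltage matters so much: the usual first-order model for CMOS dynamic power, with activity factor $\alpha$, switched capacitance $C$, supply voltage $V$ and clock frequency $f$, is

$$P_{\text{dyn}} \approx \alpha\, C\, V^2 f$$

So at a fixed clock, an illustrative 10% voltage cut gives

$$\frac{P'_{\text{dyn}}}{P_{\text{dyn}}} = \frac{\alpha C (0.9V)^2 f}{\alpha C V^2 f} = 0.81,$$

roughly a 19% reduction before counting leakage, which also tends to fall with voltage. The 10% figure is an arbitrary example, not a measured HD 4890 value.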
Load efficiency and idle power savings/features are different things. You/AMD have shown that PP 2.0 can work really well in the case of HD 4670. All I'm saying is that I'd love to see an equivalently impressive implementation on HD 4770.
The problem is bandwidth ... Larrabee-style cache coherency causes more problems than it solves when used externally (generating extra bandwidth is what snooping is good at).
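A toy count shows why broadcast snooping scales badly across chips. It assumes every cache miss on one chip is broadcast to all other chips over the external link. This is a generic snooping model, not a claim about Larrabee's actual protocol, and `snoop_traffic` is a hypothetical helper. Under that assumption, each chip's incoming snoop load grows with the number of chips even though its own workload doesn't:

```python
def snoop_traffic(num_chips: int, misses_per_chip: int):
    """Toy broadcast-snoop model: each miss is sent to every other chip.

    Returns (snoops each chip must service,
             total snoop messages crossing the external links).
    """
    received_per_chip = (num_chips - 1) * misses_per_chip
    total_messages = num_chips * misses_per_chip * (num_chips - 1)
    return received_per_chip, total_messages


if __name__ == "__main__":
    for chips in (1, 2, 4, 8):
        per_chip, total = snoop_traffic(chips, misses_per_chip=1000)
        print(f"{chips} chips: {per_chip} snoops per chip, {total} on the links")
```

Going from 2 to 4 chips triples the snoops each chip must service and grows total link traffic sixfold. Total traffic scales quadratically with chip count, and all of it is bandwidth not spent on actual data.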
Does Larrabee even have the hooks to begin considering that approach? There's QuickPath and HyperTransport for CPUs, but has there been any word on any sort of inter-chip link for LRB?
Unless, of course, we see some radical changes in the graphics pipeline in the future.