991060 said:
Xmas said:
I hope there is no such thing in WGF. Load Balancing should be entirely up to the hardware, there's no way software could do it better.
Even without the help of the driver? I always thought load balancing was done by the driver. Will GPUs be that smart in a few years?
I don't think you'd want to have the help of the driver. If the load balancing were based upon a software setting, it could only realistically be changed on a per-pass basis. This may help make, for example, screen-sized quads slightly more efficient, but it's nowhere near the potential benefit of having unified pipelines.
I think the best way to build unified pipelines would be to have one pipeline, or group of pipelines, work on a triangle from start to finish. It wouldn't have to be exactly this, but something similar: a truly optimal design would be one in which all pipelines share one queue that holds both the vertices and the pixels that need to be calculated. How would this work?
You'd have a dual-state pipeline that operates in either vertex or pixel mode. It simply switches modes depending upon the type of data that comes through the pipe next. One potential problem is that, for efficiency's sake, you'd probably want different latency characteristics for the two modes (i.e. data with lots of texture reads and few branches will do better with long latency, whereas data with few texture reads and more branches will do better with shorter latencies). Priority queues may solve this issue, but the question remains as to how many transistors such an implementation requires.
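To make the shared-queue idea a bit more concrete, here's a toy software sketch of it. This is only an illustration, not how any real chip schedules work: the function name `run_unified`, the `(kind, cycles)` work items, and the per-cycle loop are all my own assumptions. It also ignores the latency and priority-queue question entirely and just counts how often a unit has to switch modes.

```python
from collections import deque

def run_unified(work, num_units):
    """Drain one shared queue of (kind, cycles) items with num_units units.

    Hypothetical model: every unit can run either "vertex" or "pixel" work
    and switches mode based on whatever item it pulls next.
    Returns (total_cycles, mode_switches).
    """
    queue = deque(work)
    busy = [0] * num_units      # cycles of work remaining on each unit
    mode = [None] * num_units   # kind of work each unit ran last
    cycles = 0
    switches = 0
    while queue or any(busy):
        for u in range(num_units):
            if busy[u] == 0 and queue:
                kind, cost = queue.popleft()
                if mode[u] is not None and mode[u] != kind:
                    switches += 1   # unit flips between vertex and pixel mode
                mode[u] = kind
                busy[u] = cost
        busy = [max(0, b - 1) for b in busy]  # advance one cycle
        cycles += 1
    return cycles, switches

# Made-up workload: two vertex items and two pixel items, two units.
work = [("vertex", 2), ("pixel", 3), ("pixel", 1), ("vertex", 2)]
print(run_unified(work, 2))
```

In this model no unit waits while the queue still holds work, whatever kind that work is. The switch count is there because in real hardware each mode change would presumably cost something.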
In the end, what does this buy you? Well, it basically means that when an object is far away, and the triangles are pixel-sized or smaller, you won't have pixel pipelines sitting idle. Conversely, you won't have vertex pipelines sitting idle when rendering large triangles.
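A back-of-the-envelope way to see the idle-unit argument is sketched below. The unit counts and workload sizes are made-up numbers, and the model assumes every unit does one unit of work per cycle with perfect scheduling, which no real hardware does. Treat the output as an illustration of the trend, not a performance estimate.

```python
from math import ceil

def fixed_cycles(vertex_work, pixel_work, vertex_units, pixel_units):
    """Cycles when each unit type can only run its own kind of work."""
    return max(ceil(vertex_work / vertex_units), ceil(pixel_work / pixel_units))

def unified_cycles(vertex_work, pixel_work, total_units):
    """Cycles when any unit can run either kind of work."""
    return ceil((vertex_work + pixel_work) / total_units)

# Hypothetical chip: 4 vertex + 12 pixel units versus 16 unified units.
workloads = {
    "tiny triangles": (3000, 1000),    # vertex-heavy: far-away objects
    "large triangles": (100, 12000),   # pixel-heavy: big on-screen polygons
}
for name, (v, p) in workloads.items():
    print(name, "fixed:", fixed_cycles(v, p, 4, 12),
          "unified:", unified_cycles(v, p, 16))
```

With these numbers the fixed split is limited by whichever side is overloaded while the other side idles, and the unified design spreads the same work over all 16 units. It says nothing about the transistor cost of making every unit capable of both jobs.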
There is also the potential benefit of requiring less cache. To keep vertex and pixel pipelines running as fast as they can, a modern architecture needs as much cache between the two as possible. A unified architecture may no longer need this, though that cache does have the side benefit of improving memory accesses for pixel data.
Lastly, why go unified? Unified pipelines are a benefit if, and only if, a unified architecture turns out to perform better than a traditional one with the same number of transistors, everything else remaining the same. Nobody can make this final decision but those making the chips themselves. And whether or not this is the case may, in fact, depend upon the microarchitecture of specific products.