Console GPU afterthought: Hybrid fixed/unified shaders

ROG27

I've been thinking... cars have been going hybrid lately, so why not GPUs? Fixed-type ALUs give a performance benefit at the expense of flexibility and efficiency, while unified pipes tend to take a performance hit. Why not have a primarily fixed-focus GPU with powerful fixed shaders (8 + 24, for instance) and then incorporate another 8 to 16 ALUs for unified shading, to add flexibility and efficiency where necessary?

People's take? RSX possibility? Hollywood possibility?
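To make the flexibility argument concrete, here is a rough toy model, not a description of any real GPU. It assumes the "8 + 24" above means 8 vertex and 24 pixel ALUs, that every ALU does one unit of work per cycle regardless of type, that work splits perfectly, and that scheduling is free. The function name `frame_time` and all workload numbers are made up for illustration.

```python
def _ratio(work, units):
    """Cycles needed for `work` units on `units` ALUs (inf if there are none)."""
    if work == 0:
        return 0.0
    if units == 0:
        return float("inf")
    return work / units


def frame_time(vertex_work, pixel_work, n_vertex, n_pixel, n_unified):
    """Minimum cycles to finish a frame in this toy model.

    Dedicated ALUs only run their own shader type; unified ALUs run either.
    The frame is done once each work type fits in the ALUs able to run it
    and the total work fits in the whole chip.
    """
    return max(
        _ratio(vertex_work, n_vertex + n_unified),
        _ratio(pixel_work, n_pixel + n_unified),
        _ratio(vertex_work + pixel_work, n_vertex + n_pixel + n_unified),
    )


if __name__ == "__main__":
    configs = {
        "dedicated 8+24": (8, 24, 0),
        "unified 32": (0, 0, 32),
        "hybrid 8+24+8": (8, 24, 8),
    }
    # (vertex work, pixel work) in arbitrary units; hypothetical numbers.
    workloads = {"pixel-heavy": (800, 2400), "vertex-heavy": (2400, 800)}
    for wname, (v, p) in workloads.items():
        for cname, (nv, npx, nu) in configs.items():
            print(f"{wname:13s} {cname:15s} {frame_time(v, p, nv, npx, nu):6.1f} cycles")
```

With these made-up numbers the dedicated design only matches the unified one when the workload happens to fit its 8:24 split, and the hybrid lands in between on the vertex-heavy case. The model ignores exactly the things being argued about here (per-ALU speed differences and scheduler cost), so take it as an illustration of the idea only.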
 
ROG27 said:
while unified pipes tend to take a performance hit.
:?: Where do you get this from? IMHO the only hit unified shaders take is the die space that the associated scheduling logic adds.
 
nelg said:
:?: Where do you get this from? IMHO the only hit unified shaders take is the die space that the associated scheduling logic adds.


Not a hit so much as a cost/performance trade-off. With all that extra die space being used for scheduling etc., is it viable to create unified shaders with an output of more than 2 shader ops per cycle?
 
I'm unsure how many programmable shader ops you can get per cycle in fixed vs. unified shader GPUs... but 96 for unified and 132 for fixed come to mind for the latest and greatest.
 
Scheduling has a relatively high fixed cost, and an average per-transistor cost, with the fixed cost depending a lot on the implementation (cf. PowerVR SGX, which manages to get a unified architecture with a real scheduler in a minimal number of transistors).

If anything, when you get to much bigger transistor budgets (1T+), I'd suspect unified architectures would begin to scale better than non-unified ones (which, for the LOVE OF GOD, shouldn't be called "fixed" - such a keyword is reserved for DX7-style fixed-function architectures, damnit!)


Uttar
 
ROG27 said:
Not a hit so much as a cost/performance trade-off. With all that extra die space being used for scheduling etc., is it viable to create unified shaders with an output of more than 2 shader ops per cycle?

So how much extra die space does scheduling etc. take up? :rolleyes:
 
BRiT said:
So how much extra die space does scheduling etc. take up? :rolleyes:

Obviously enough transistors to eat into performance vs. a standard non-unified architecture with fewer ALUs.
 
You can't do/have everything and still dissipate enough heat to make the thing work on the current manufacturing process (90 nm), IMO. Perhaps this is why higher-performing unified solutions are not hitting shelves at lightspeed yet.
 
Uttar said:
Scheduling has a relatively high fixed cost, and an average per-transistor cost, with the fixed cost depending a lot on the implementation (cf. PowerVR SGX, which manages to get a unified architecture with a real scheduler in a minimal number of transistors).

If anything, when you get to much bigger transistor budgets (1T+), I'd suspect unified architectures would begin to scale better than non-unified ones (which, for the LOVE OF GOD, shouldn't be called "fixed" - such a keyword is reserved for DX7-style fixed-function architectures, damnit!)


Uttar

Exactly... so why wouldn't a hybrid GPU be a nice solution until we get there (1T+)?
 
ROG27 said:
Exactly... so why wouldn't a hybrid GPU be a nice solution until we get there (1T+)?
Reread what I said. Part of the scheduling cost is static; that means it doesn't scale with the number of pipelines you have for a given efficiency. As such, a hybrid solution would have higher transistor costs for a given performance than EITHER unified or non-unified solutions, at least IMO.

Uttar
 
Uttar said:
Reread what I said. Part of the scheduling cost is static; that means it doesn't scale with the number of pipelines you have for a given efficiency. As such, a hybrid solution would have higher transistor costs for a given performance than EITHER unified or non-unified solutions, at least IMO.

Uttar


Oh ok... I understand what you are saying, but I guess how costly the static part is would be the key to understanding whether a hybrid would be worth it or not. If the variable transistor cost (per unified ALU added) is significantly higher than the fixed allocated cost, then perhaps a limited number of unified ALUs would be the better solution currently. But we don't know (at least I don't) what these transistor costs for scheduling are.
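The fixed vs. per-ALU split being discussed can be put into a small sketch. Everything here is hypothetical: the cost units are arbitrary, the parameter names (`alu_cost`, `sched_fixed`, `sched_per_alu`) are invented for illustration, and nobody in this thread knows the real figures. It only shows how a static scheduler cost gets amortized over however many unified ALUs share it.

```python
def transistor_cost(n_dedicated, n_unified, alu_cost, sched_fixed, sched_per_alu):
    """Total cost in arbitrary units for a chip with both ALU kinds.

    The scheduler (fixed part plus per-ALU part) is only paid for
    if the chip has at least one unified ALU.
    """
    cost = (n_dedicated + n_unified) * alu_cost
    if n_unified > 0:
        cost += sched_fixed + n_unified * sched_per_alu
    return cost


def sched_overhead_per_unified_alu(n_unified, sched_fixed, sched_per_alu):
    """Scheduler cost attributable to each unified ALU."""
    return sched_fixed / n_unified + sched_per_alu


if __name__ == "__main__":
    # Hypothetical numbers: one ALU costs 1.0, the static scheduler part
    # costs as much as 16 ALUs, and each unified ALU adds 0.25 of tracking logic.
    ALU, FIXED, PER_ALU = 1.0, 16.0, 0.25
    for n in (8, 16, 48):
        overhead = sched_overhead_per_unified_alu(n, FIXED, PER_ALU)
        print(f"{n:2d} unified ALUs: {overhead:.2f} scheduler cost per ALU")
    print("hybrid 32+8 total:", transistor_cost(32, 8, ALU, FIXED, PER_ALU))
    print("unified 48 total: ", transistor_cost(0, 48, ALU, FIXED, PER_ALU))
```

With a large static part, a hybrid with only 8 unified ALUs pays far more scheduler cost per unified ALU than a fully unified chip does. If the static part were small compared to the per-ALU part, the gap would mostly disappear, which is the open question here.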
 
Just a stupid thought, but why couldn't nV or ATI license the SGX design or some parts of it if that's really a better design (which I certainly don't know, but the idea is theoretically possible)?
 