From this article we learned that Xenos can execute at least two different types of instructions at the same time, such as an arithmetic instruction and a texture fetch instruction.
After all, Xenos has a 'global' pool of computational resources that can be shared and dynamically assigned across many threads.
What if these computational resources were diversified even further?
This NVIDIA patent addresses resource diversification to some degree:
In addition, selection may take into account the state of the execution module. In one such embodiment, execution module 142 contains specialized execution units (or execution pipes), with different operations being directed to different execution units; e.g., there may be an execution unit that performs floating-point arithmetic and another that performs integer arithmetic. If the execution unit needed by a ready instruction for one thread is busy, an instruction from a different thread may be selected. For instance, suppose that at a given time, the floating-point pipeline is busy and the integer pipeline is free. A thread with an integer-arithmetic instruction ready can be given priority over a thread with a floating-point instruction
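To make the quoted selection rule concrete, here is a minimal sketch of it in Python. It is only an illustration of the idea, not the patent's actual logic: the function name `pick_thread`, the unit labels `"fp"` and `"int"`, and the way threads are encoded as queues of unit names are all my own assumptions.

```python
from collections import deque


def pick_thread(threads, busy_units):
    """Pick the first thread whose next instruction needs a free unit.

    threads:    list of deques; each entry names the execution unit the
                instruction needs (hypothetical labels such as "fp", "int").
    busy_units: set of unit names that cannot accept work this cycle.
    Returns the thread index, or None if no thread can issue.
    """
    for tid, pending in enumerate(threads):
        if pending and pending[0] not in busy_units:
            return tid
    return None


# Thread 0 has a floating-point instruction ready, thread 1 an integer one.
threads = [deque(["fp", "fp"]), deque(["int", "fp"])]

# The FP pipe is busy and the integer pipe is free, so thread 1 gets priority.
selected = pick_thread(threads, busy_units={"fp"})
print(selected)
```

A real scheduler would also weigh thread age, latency hiding and so on; this only shows the "skip threads whose unit is busy" part of the quote.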
In the beginning we had only one kind of computational resource; later we got (semi-)independent ALUs and TMUs, and at some point in the future there will probably be even more diversification.
If the DOT3 operation is used ten times more frequently than a reciprocal operation, why should we include an RCP unit in every ALU?
Modern CPUs already do this, but to a much lesser extent, because they usually don't run more than one or two threads at the same time.
To be fair, unified shading is not needed to diversify computational resources, but it seems (at least logically) that further diversification makes more sense in a unified shading architecture (as that NVIDIA patent shows).
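The DOT3 versus RCP question can be put in rough numbers. The sketch below sizes a pool of specialized units in proportion to an instruction mix. It assumes every unit is fully pipelined and retires one operation per cycle, and the 10:1 mix is just the hypothetical ratio from this post, not a measurement of any real shader or GPU. The function name `units_needed` is made up for the example.

```python
def units_needed(mix, issue_per_cycle):
    """Size each unit type in proportion to how often its op appears.

    mix:             dict mapping op name -> relative frequency (integers).
    issue_per_cycle: total operations the pool should sustain per cycle.
    Assumes one op per unit per cycle; every op gets at least one unit.
    """
    total = sum(mix.values())
    sizes = {}
    for op, freq in mix.items():
        # Integer ceiling division, so demand is never under-provisioned.
        count = (issue_per_cycle * freq + total - 1) // total
        sizes[op] = max(1, count)
    return sizes


# Hypothetical mix: DOT3 is ten times more frequent than RCP.
mix = {"dot3": 10, "rcp": 1}
pool = units_needed(mix, issue_per_cycle=11)
print(pool)
```

Under these assumptions a shared pool sustaining 11 operations per cycle gets by with a single RCP unit, whereas putting one in each of 11 identical ALUs would leave most of them idle most of the time.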
What do you think?
NB:
f@nb0y1 -> my ATI GPU is better than yours, I've got a 2:8:2:5:12:6:4 GPU!
f@nb0y2 -> WTF!? stupid moron my NVIDIA GPU is faster! I've got a 2:6:4:3:16:2:1 GPU!