Jawed
Legend
Taking GT200 and RV770 as baselines, each GPU dedicates about 30% of the die to ALUs including register file and scheduling. Over the next 18 months these designs will both see a massive increase in computing power. So the question there is, what will be the rate of increase in RBEs and MCs?
We can be reasonably sure that ATI is sticking with 4:1 ALU:TEX for at least one generation and prolly the one after, so TUs will increase in count in proportion to the increase in ALUs.
I'm thinking that over the next 18 months we'll see something like a 4x increase in ALUs in ATI GPUs. RBEs and MCs will not increase by anything like this number, double at most. So the ALUs will be taking a dominant portion of the die.
In NVidia's case I suspect the ALUs won't increase so quickly, but there'll be a radical gain in performance per TMU and per ROP, so those units won't need to grow much either. In other words, as a proportion of the die, the ALUs will increase much like we'll see with ATI.
So ALUs will be ~60% of the die in 18 months? And the generation after that? 70%+?
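To put rough numbers on that, here's a back-of-envelope sketch. It assumes area scales linearly with unit count and that everything shrinks equally with process, neither of which is strictly true (I/O in particular shrinks badly). The function name and the scale factors are just my illustration, not anything from a vendor.

```python
def alu_share(alu_frac, alu_scale, other_scale):
    """Fraction of the die taken by ALUs after scaling ALU area
    and all non-ALU area by independent factors."""
    alu_area = alu_frac * alu_scale
    other_area = (1.0 - alu_frac) * other_scale
    return alu_area / (alu_area + other_area)

BASELINE_ALU_FRAC = 0.30  # ~30% of die today (GT200 / RV770 estimate)
ALU_SCALE = 4.0           # the guessed 4x increase in ALUs

# Non-ALU area growth is the unknown, so try a few values.
for other_scale in (1.0, 1.5, 2.0):
    share = alu_share(BASELINE_ALU_FRAC, ALU_SCALE, other_scale)
    print(f"non-ALU area x{other_scale}: ALUs = {share:.0%} of die")
```

Under those assumptions a 4x ALU increase gives roughly 63% if the rest of the die doesn't grow at all, 53% if it grows 1.5x and 46% if it doubles. So ~60% only works out if the non-ALU area grows very little, which is where gains in performance per unit (rather than more units) would have to come in.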
Assuming a 256-bit bus and fixed function RBEs, are we looking at about a minimum of 25% of the die being fixed function - since I/O consumes a lot of area? Will a 512-bit bus ever become the norm?
So while Larrabee could initially look "unbalanced" being so ALU-heavy (though some of that is scalar core ALUs) we'll start seeing the other GPUs catching up over the next 2 to 3 generations.
Additionally, Larrabee promises to use much less bandwidth due to its tiled pixel shading approach. Perhaps half. That again increases the proportion of Larrabee that is ALUs rather than fixed function.
The other killer, particularly for NVidia, is that Larrabee's true-scalar per element (pixel) ALU design and its four hardware threads and "dumb context switching" software threading model mean that the control overhead on Larrabee's vector ALUs is way, way lower than in NVidia's design. Larrabee also uses a 16-lane ALU design (as opposed to NVidia's 8), again radically reducing the control overhead per element.
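The lane-width part of that argument is just amortisation, which a toy model shows. This assumes a fixed control cost per issued vector instruction that is the same for both designs (a simplification, since the claim above is that Larrabee's cost is lower as well). The cost value is an arbitrary unit, not a measurement.

```python
def control_overhead_per_element(control_cost, lanes):
    """Control cost of one vector instruction, spread over its lanes."""
    return control_cost / lanes

CONTROL_COST = 1.0  # hypothetical, arbitrary units per issued instruction

eight_lane = control_overhead_per_element(CONTROL_COST, 8)
sixteen_lane = control_overhead_per_element(CONTROL_COST, 16)

# How much more control overhead each element carries on the 8-lane design.
print(eight_lane / sixteen_lane)
```

With equal per-instruction cost the 8-lane design carries twice the control overhead per element. Any reduction in the per-instruction cost itself multiplies on top of that.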
So, as ALUs naturally become the dominant part of all high-end GPUs, Larrabee will be right near the front in terms of performance per mm² for its ALUs.
Also we're expecting the other GPUs to replace fixed function units (e.g. ROPs) with programs running on the ALUs. This will further increase the proportion of the die that is ALUs.
We know that all of NVidia's major functional units (ALUs, TMUs, RBEs and MCs) are horribly inefficient per mm² right now. I see hope for all but the ALUs, but that'll only come in time for the era when ALUs become dominant. If NVidia is building ALUs that are half the performance per mm² of Intel's then how's it going to compete?
Finally, no-one in their right mind buys version 1.0 of anything. So, I'm very much looking forward to Larrabee 2. That should be the GPU that dishes out the pain, top to bottom. Can't wait.
Jawed