A next-generation ULP GeForce with a unified shader architecture and up to 64 CUDA "cores" (clocked at ~500MHz) would have up to ~8x more peak pixel shader performance relative to the ULP GeForce in Tegra 3 (based on having a maximum of 64 pixel shader units on Tegra 4 vs. 8 pixel shader units on Tegra 3).
Well, I figured out how you came to that assumption; I just wanted you to confirm it. So according to that awkward speculative math, it'll have either 7 or 9 TMUs.
That's an exact 3.2x increase in pixel shader performance.
The ULP GeForce in Tegra 3 has ~3-3.5x more peak pixel shader performance relative to the ULP GeForce in Tegra 2 (based on having 2x more pixel shader units and ~66% higher clock speed on Tegra 3 vs. Tegra 2),
Does it? The fastest T3 device, which has quite a bit more bandwidth than typical T3-based devices, gives an offscreen result of almost 568 MTexels/s, while the fastest Tegra 2 device in the Kishonti database gives 220 MTexels/s. That's a 2.58x difference, and if you use a TF300 as the reference the difference shrinks to 2.3x. I'd rather say that both Tegra 2 and Tegra 3 SoCs have the same number of TMUs, with different frequencies and higher bandwidth on T3 allowing higher fillrate efficiency.
...and that performance delta seems to be reasonably well reflected in the GLBenchmark Fill Test results.
If yes, it breaks your theory quite quickly, since there are 2 vs. 1 Vec4 PS ALUs between T3 and T2. On top of that, I can't know for sure, but I'd be willing to believe that texturing is decoupled from the ALUs on ULP GFs (unlike NV3x/NV4x desktop trends), albeit you'll see it on an ARM Mali of the current generation and Lord knows which other architectures.
Wayne will in all likelihood have a revamped GPU architecture, for which I expect a departure from vector ALUs (as for pretty much every next-generation SFF GPU); it could very well be SIMD8 or SIMD16 instead, and nothing speaks against the notion that two SIMD8 units could share a texture block. As the very least of the major changes, these are with utmost certainty going to be USC ALUs, so it's more like "12 SPs" in total on the T3 ULP GF, where a 5x increase in TMU count is equally senseless.
Floating point performance will explode with the next-generation GPUs; fillrates rather not. By the way, did it never strike you that, despite NV's sw/compiler efficiency, 64 GFLOPs of maximum theoretical floating point power is rather pitiful for a design that aims to reach up to clamshells?
So just to reiterate, I was looking at the performance delta between Tegra 2 and Tegra 3 on GLBenchmark Fill Test to extrapolate results for Tegra 4.
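For anyone following the numbers in this exchange, here is a rough sketch of the arithmetic on both sides. It only uses figures quoted above (2x units, ~66% higher clock, 568 vs. 220 MTexels/s, 64 vs. 8 units). The function names are made up for illustration, and the Tegra 4 line assumes an unchanged clock, which nobody here has confirmed.

```python
# Back-of-the-envelope check of the scaling figures quoted in the thread.
# All inputs are the posters' numbers, not independent measurements.

def theoretical_scaling(unit_ratio: float, clock_ratio: float) -> float:
    """Peak throughput ratio if performance scaled purely with units x clock."""
    return unit_ratio * clock_ratio


def measured_scaling(new_result: float, old_result: float) -> float:
    """Ratio between two benchmark results in the same unit."""
    return new_result / old_result


# Tegra 3 vs. Tegra 2: 2x the pixel shader units, ~66% higher clock.
claimed_t3_vs_t2 = theoretical_scaling(2.0, 1.66)      # 3.32

# Offscreen fill test results quoted from the Kishonti database (MTexels/s).
measured_t3_vs_t2 = measured_scaling(568.0, 220.0)     # ~2.58

# Share of the theoretical gain that actually shows up in the fill test.
realised_fraction = measured_t3_vs_t2 / claimed_t3_vs_t2

# Speculated Tegra 4 vs. Tegra 3: 64 vs. 8 units.
# ASSUMPTION: clock ratio of 1.0 (the quoted ~500MHz is close to T3's clock).
claimed_t4_vs_t3 = theoretical_scaling(64 / 8, 1.0)    # 8.0

if __name__ == "__main__":
    print(f"T3 vs. T2, units x clock: {claimed_t3_vs_t2:.2f}x")
    print(f"T3 vs. T2, fill test:     {measured_t3_vs_t2:.2f}x")
    print(f"Realised fraction:        {realised_fraction:.0%}")
    print(f"T4 vs. T3, units only:    {claimed_t4_vs_t3:.1f}x")
```

The gap between the 3.32x paper figure and the 2.58x fill test result is the crux of the disagreement: a fill test is bound by TMUs and bandwidth, so it need not track pixel shader ALU count at all.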