Looks like they doubled the TMUs and pixel ROPs, while keeping the depth ROPs (ZOPs?) and triangle setup engine the same.
I've heard that the triangle setup capabilities of GeForce ULP weren't stellar, so I wonder if this will be a bottleneck anywhere. TMUs will probably limit them at least some of the time too: with A6X Apple doubled the ALU:TMU ratio, but here they're tripling it. nVidia has also confirmed that Tegra 3 had 2 TMUs (and Tegra 4 has 4 TMUs).
Tegra 4i looks even crazier, with the ALU:TMU ratio increasing a staggering 6x (for fragment shading anyway, but I doubt vertex shading increasing only half as much as fragment shading is going to be much of a bottleneck, especially with the triangle rate not increased).
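
For anyone who wants to sanity-check the ratio arithmetic, here's a rough sketch. It assumes "ALU:TMU ratio" simply means fragment ALU count divided by TMU count, and that the tripling is measured relative to Tegra 3. The function name `implied_alu_scale` is made up for illustration. Going from 2 TMUs to 4 while tripling the ratio implies the fragment ALU count went up 6x:

```python
from fractions import Fraction

def implied_alu_scale(tmu_old, tmu_new, ratio_scale):
    """Return the ALU scale factor implied by a TMU count change and an
    ALU:TMU ratio change (assumes ratio = ALUs / TMUs)."""
    # new_alus / old_alus = (new_tmus / old_tmus) * (new_ratio / old_ratio)
    return Fraction(tmu_new, tmu_old) * ratio_scale

# Tegra 3 -> Tegra 4: TMUs 2 -> 4, ALU:TMU ratio x3 (figures from the post above)
print(implied_alu_scale(2, 4, 3))  # 6
```

The same function would work for Tegra 4i with `ratio_scale=6`, but that needs its TMU count, which I haven't plugged in here.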