The A5's Cortex-A9 implementation also includes the NEON engine, while Tegra 2's doesn't. Tegra 2's VFP is also the VFPv3-D16 variant, so the A5 has twice as many FP registers as Tegra 2 (32 double-precision registers versus 16).
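If you want to see that difference on a device that runs Linux, the kernel reports it in the `Features` line of `/proc/cpuinfo` using the flag names `neon`, `vfpv3` and `vfpv3d16`. Here's a rough sketch of how you might read it. The helper name `fp_summary` is made up for this example, and the two sample strings are illustrative rather than captured from real hardware (the A5 runs iOS, so the second one just stands in for a NEON-equipped A9).

```python
def fp_summary(features_line):
    """Summarise NEON support and FP register count from a cpuinfo Features line."""
    # Drop the "Features :" label if present, then split into individual flags.
    flags = set(features_line.split(":", 1)[-1].split())
    has_neon = "neon" in flags
    if "vfpv3d16" in flags:
        # VFPv3-D16: only 16 double-precision registers (d0-d15).
        d_regs = 16
    elif "vfpv3" in flags or has_neon:
        # Full VFPv3 (and anything with NEON) has 32 registers (d0-d31).
        d_regs = 32
    else:
        d_regs = None  # Can't tell from these flags alone.
    return {"neon": has_neon, "d_registers": d_regs}


# Illustrative Features lines, not real device dumps.
no_neon_a9 = "Features : swp half thumb fastmult vfp edsp vfpv3 vfpv3d16"
neon_a9 = "Features : swp half thumb fastmult vfp edsp neon vfpv3"

print(fp_summary(no_neon_a9))
print(fp_summary(neon_a9))
```

This only tells you what the kernel exposes; it says nothing about die area or how fast the FP unit actually is.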
The Cortex-A8 core in the A4, together with its cache, is about as large as Tegra 2's two cores plus cache. A9 on TSMC's 40nm is twice as dense as A9 on Samsung's 45nm.