Saying the P4 needs bandwidth to perform better is misleading. Couched in that statement is the implication that the P4 is somehow weaker than other processors because of this need. If I were to increase the memory bandwidth for the Athlon, we'd see performance improvements there as well.
The processor/memory performance disparity is merely more noticeable on the P4 because of its L2 cache. The L2 cache has large cache lines and its eviction rate is high, so it leans more heavily on the memory sub-system, unless we're talking about streaming programs, where reusable data is evicted less often and prefetching works nicely to hide memory latency. This is why Northwood shows such dramatic improvements.
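To put rough numbers on the cache line point, here's a toy sketch of a direct-mapped cache that counts how many bytes get pulled from memory for a given access pattern. Everything in it is an assumption for illustration: the function name memory_traffic, the 256KB capacity, and the 128- vs 64-byte line sizes are stand-ins, not a model of any real chip.

```python
def memory_traffic(addresses, line_size, num_lines):
    """Bytes fetched from memory by a toy direct-mapped cache."""
    tags = [None] * num_lines          # which line each slot currently holds
    misses = 0
    for addr in addresses:
        line = addr // line_size       # cache line this byte address falls in
        slot = line % num_lines        # direct-mapped placement
        if tags[slot] != line:         # miss: evict whatever was there
            tags[slot] = line
            misses += 1
    return misses * line_size          # every miss fetches one whole line

CACHE_BYTES = 256 * 1024               # hypothetical L2 capacity

# 16384 four-byte reads, either back to back or 4KB apart.
streaming = range(0, 64 * 1024, 4)
scattered = range(0, 16384 * 4096, 4096)

for line_size in (128, 64):
    num_lines = CACHE_BYTES // line_size
    print(line_size,
          memory_traffic(streaming, line_size, num_lines),
          memory_traffic(scattered, line_size, num_lines))
# prints: 128 65536 2097152
#         64 65536 1048576
```

In this toy model the streaming case pulls in exactly the 64KB it uses regardless of line size, while the scattered case moves twice as much data with 128-byte lines as with 64-byte lines for the same useful work. It ignores prefetching, associativity and sectoring, so treat it as a sketch of the trend only.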
The current speculation is that Prescott will sport a 1MB L3 cache with a 128-256KB L2 cache. There should be a fair number of transistors left over for an improved floating point unit, though that part is wishful thinking on my part. I suspect the improvements to the trace cache will allow more decodes per cycle rather than increase its capacity. It's speculated that the trace cache is more than 64KB.
As for increasing the cache on the Athlon, the same wouldn't hold true. In fact, if they increased the amount of cache, you'd probably see a performance hit due to increased latency, and you don't want increased latency on the Athlon's L2 cache - it's already slow enough.
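The trade-off can be sketched with the usual average-access-time arithmetic: hit time plus miss rate times miss penalty. The cycle counts and miss rates below are made up purely to show how the sum can tip the wrong way, and the amat helper is my own naming, not anything measured on an Athlon.

```python
def amat(hit_cycles, misses_per_1000, miss_penalty_cycles):
    """Average cycles per access: hit time plus miss rate times miss penalty."""
    total = 1000 * hit_cycles + misses_per_1000 * miss_penalty_cycles
    return total / 1000

# Hypothetical numbers: the bigger cache misses less but takes longer to hit.
smaller_cache = amat(hit_cycles=20, misses_per_1000=100, miss_penalty_cycles=150)
larger_cache = amat(hit_cycles=26, misses_per_1000=80, miss_penalty_cycles=150)

print(smaller_cache, larger_cache)   # prints: 35.0 38.0
```

With these assumed figures the larger cache comes out slower on average, because the extra latency on every hit outweighs the misses it saves. Different numbers would give a different answer, which is the whole point: more cache only helps if the hit latency doesn't grow too much.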
The fact of the matter is, there is a disparity between memory performance and processor performance, and we're seeing it. RDRAM is a technology aimed at addressing this issue. It's unfortunate that it won out over SLDRAM on hype rather than technical merit. Had it been decided on technical merit, we'd see SLDRAM in most systems.