Sigh, I really post too much off-topic stuff these days but...
First, it is very important to distinguish between the Maxwell GPU core and the Denver CPU core, which are both used in the first Maxwell-based chip; not every Maxwell chip will necessarily use Denver. I'm not sure that's the way NVIDIA think about their naming scheme (which, back in the G9x/GT2xx generation, one of their top engineers told someone I know he couldn't keep up with anyway), but I can't find any other way to keep the two clearly separate.
Maxwell's GPU and the system architecture of that first chip are very HPC-oriented, but the Project Denver CPU itself is almost certainly not. Remember, the idea is to run the FP-heavy stuff on the GPU, not the CPU. I'd be very surprised if we had more than a single 128-bit FMA unit here - which Cortex-A15 already has!
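To put some rough numbers on that, here's a back-of-envelope sketch of peak single-precision throughput. Purely illustrative: the clock speeds and ALU counts below are my own assumptions, not anything NVIDIA has announced.

# Back-of-envelope peak single-precision throughput.
# Clock speeds and unit counts are illustrative assumptions only.

def peak_sp_gflops(simd_width_bits, flops_per_lane, units, clock_ghz):
    lanes = simd_width_bits // 32            # 32-bit single-precision lanes
    return lanes * flops_per_lane * units * clock_ghz

# A CPU core with a single 128-bit FMA unit (FMA = 2 flops/lane/cycle) at ~2 GHz:
cpu = peak_sp_gflops(128, 2, units=1, clock_ghz=2.0)     # ~16 GFLOPS
# A modest GPU with, say, 192 FMA ALUs at ~1 GHz:
gpu = peak_sp_gflops(32, 2, units=192, clock_ghz=1.0)    # ~384 GFLOPS

print(f"CPU core: ~{cpu:.0f} GFLOPS, GPU: ~{gpu:.0f} GFLOPS")

Even with generous assumptions on the CPU side, the GPU wins by more than an order of magnitude, which is the whole argument for keeping the FP-heavy work off Denver.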
Agreed. I'd expect Denver to be slightly more HPC-oriented than Cortex-A15—otherwise, what's the point?—but not a computing beast by any means. Besides, going after Intel's Rockwell or AMD's Bulldozer 2 in a first attempt at a CPU would be foolish.
As for the GPU, AFAIK the next-generation Tegra GPU is only coming in Logan, which is likely slated for a late 2012/early 2013 tape-out on 28HPM with 2H13 end-product availability. That will also be the first Tegra with Cortex-A15, as the 2012 Wayne is much more incremental. So the timeframes for the next-gen Tegra GPU and the Maxwell GPU are surprisingly not that different, but the former arrives earlier than the latter and is one process node behind.
So I think architectural convergence is very, very unlikely, unless it is the Maxwell GPU itself that is a next-gen Tegra GPU derivative. That would be completely crazy, but rather in line with Jen-Hsun's insistence that Tegra is the future of the company and that performance will be much more limited by perf/watt than by perf/mm² (and already is).
Agreed again. Plus, Maxwell is pretty much bound to move further towards HPC, which really doesn't make sense for the embedded world, where every mW counts and there's no real need for high-performance floating-point computing at all. Putting a Maxwell derivative in Tegra would doom it, considering the power consumption penalty it would carry relative to the leaner designs that TI, Qualcomm and Samsung will be offering at that time.
As for ARM CPU adoption on PCs... I think there's a strong possibility that many notebooks will evolve towards also having a touchscreen over time. That makes Metro UI and the like more attractive, and significantly reduces the relative appeal of legacy application compatibility. But yeah, desktops? No way. Maybe hell has already frozen over now that Duke Nukem Forever is released, but there's no way desktops are ever switching to ARM. Maybe some niche 'desktop' functions like Windows HTPCs, but those are more likely to migrate towards ARM by moving away from Windows anyway.
Having fingerprints all over my notebook screen would drive me insane, but maybe that's just me. I'll admit to being slightly neurotic about such things.
Back on topic:
Yeah, 64-bit discrete GPUs are clearly a thing of the past though.
And I shall shed no tears over their demise.
Llano is very impressive, but I wonder how bandwidth-limited it really is; I really wish someone would benchmark it with different DDR3 module speeds. If it's very limited, then there may not be much room to grow before DDR4 becomes mainstream, or some other clever trick is used (silicon interposers, as rumoured for Intel Haswell?).
According to the first leaked benchmarks, it seems to scale pretty well with overclocking—though maybe they overclocked the RAM as well?—so I'd say there's still some margin for improvement.
Further, AMD has yet to give the GPU access to any kind of shared last-level cache the way Intel does in Sandy Bridge. Sure, Llano seems to be doing fine without it, but in the future it's one possible way to mitigate the need for higher memory bandwidth. At this point I would like to mention eDRAM and T-RAM, acknowledging that we've been talking about those (and the now-defunct Z-RAM) for a while and that so far, only IBM has used any of them. Still, it might happen.
Finally, as AMD integrates Bulldozer cores into their APUs, it will become possible for them to offer relatively high-end APUs with powerful CPU and GPU cores, possibly justifying the addition of a third memory channel. That could sit on a second platform shared with very high-end CPUs lacking any kind of integrated graphics, leaving the low-end and mid-range APUs on a more standard and cheaper 2-channel platform. After all, if Intel did it with Nehalem, it's not entirely unreasonable. And obviously, none of these options are mutually exclusive.
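For a rough sense of what either remedy buys, here's a quick back-of-envelope on theoretical peak DDR3 bandwidth at different module speeds and channel counts (assuming standard 64-bit channels; sustained bandwidth is of course lower):

# Theoretical peak DRAM bandwidth for a few DDR3 speed grades and channel counts.
# Ignores controller efficiency, so real-world numbers come in lower.

def peak_bandwidth_gb_s(transfer_rate_mt_s, channels, bus_width_bits=64):
    # transfers/s * bytes per transfer per channel * number of channels
    return transfer_rate_mt_s * 1e6 * (bus_width_bits / 8) * channels / 1e9

for rate in (1333, 1600, 1866, 2133):
    dual = peak_bandwidth_gb_s(rate, channels=2)
    triple = peak_bandwidth_gb_s(rate, channels=3)
    print(f"DDR3-{rate}: {dual:.1f} GB/s dual-channel, {triple:.1f} GB/s triple-channel")

So going from DDR3-1333 to DDR3-1866 on the existing dual-channel platform already gets you most of the way to what a third channel at DDR3-1333 would offer, which is why module-speed scaling tests would be so telling.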