I hadn't thought about Intel's IGP as a point of comparison; thanks for bringing it up! However, I think the thrashing issue could be addressed without expanding capacity to match GT4e (or moving to a scratchpad), e.g. by allocating cache ways to clients based on their usage, so that a streaming workload can't keep evicting another client's reused working set. Basically Broadwell's L3 Cache Allocation Technology (CAT), but for GPUs (http://danluu.com/intel-cat/ and section 17.17 of http://www.intel.com/content/dam/ww...s-software-developer-vol-3b-part-2-manual.pdf have details). I'm curious, though: assuming Switch is successful and becomes a specific optimization target for game devs, would you prefer a large cache (or scratchpad), or would you rather have more compute throughput and a smaller cache that requires tailored software to really shine?
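To make the way-partitioning idea concrete, here's a toy Python sketch. It assumes a simple set-associative cache with LRU replacement and per-client way bitmasks, loosely modeled on CAT's capacity bitmasks: lookups can hit in any way, but on a miss a client may only fill (and therefore evict) the ways its mask allows. The client names ("tex", "stream"), cache geometry, and access pattern are all hypothetical and not taken from any real GPU.

```python
class WayPartitionedCache:
    """Toy set-associative LRU cache with per-client way masks (hypothetical model)."""

    def __init__(self, num_sets, num_ways, way_masks=None):
        self.num_sets = num_sets
        self.num_ways = num_ways
        # Client name -> bitmask of ways it may fill; absent clients may use all ways.
        self.way_masks = way_masks or {}
        # Each way holds (tag, last_use_time), or None if empty.
        self.sets = [[None] * num_ways for _ in range(num_sets)]
        self.clock = 0

    def _allowed_ways(self, client):
        mask = self.way_masks.get(client, (1 << self.num_ways) - 1)
        return [w for w in range(self.num_ways) if mask & (1 << w)]

    def access(self, client, addr):
        """Return True on a hit, False on a miss (after filling the line)."""
        self.clock += 1
        ways = self.sets[addr % self.num_sets]
        tag = addr // self.num_sets
        # A hit can occur in any way, regardless of the client's mask.
        for w, entry in enumerate(ways):
            if entry is not None and entry[0] == tag:
                ways[w] = (tag, self.clock)
                return True
        # On a miss, prefer an empty allowed way, else evict the LRU allowed way.
        allowed = self._allowed_ways(client)
        victim = min(allowed, key=lambda w: -1 if ways[w] is None else ways[w][1])
        ways[victim] = (tag, self.clock)
        return False


def run(cache, rounds=50):
    """Hypothetical workload: 'tex' reuses a 32-line working set while
    'stream' touches two brand-new lines after every 'tex' access.
    Returns the hit rate seen by 'tex'."""
    hits = total = 0
    next_stream = 10_000
    for _ in range(rounds):
        for addr in range(32):
            hits += cache.access("tex", addr)
            total += 1
            for _ in range(2):
                cache.access("stream", next_stream)
                next_stream += 1
    return hits / total


shared = WayPartitionedCache(num_sets=8, num_ways=8)
split = WayPartitionedCache(num_sets=8, num_ways=8,
                            way_masks={"tex": 0x0F, "stream": 0xF0})
print(run(shared), run(split))
```

With the cache fully shared, the streaming client pushes enough new lines through each set that the reused working set is always evicted before it comes back around. With a 4/4 way split, the reused set fits in its own ways and hits after the first warm-up round. Real hardware would need a policy for choosing the masks (static, driver-set, or usage-driven), which this sketch leaves out.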