I miswrote in my previous post; I meant 80 tflops for the 256-core MPX card. But the simple prediction here on my end is that we'll likely get 2x the tflops of the 6900 XT in the next-gen console machines (if they stay with RDNA and not some new ARM GPU). The Series X is 12 tflops compared to 5 tflops for the 780 Ti; that's roughly a 2x increase. Another thing you've missed is that Apple's move to their own ARM chips gives them the ability to have competitive price-to-performance for their offerings (compare the Surface Pro X vs. the MacBook Air M1). You're going to see ML engineers, software developers, graphic designers and animators seriously investing in Apple hardware.
Ah okay, so the 80 TF was for the 256-core card, makes sense. The 6900 XT is around 23 TF of raw performance; I think the actual raw TF of 10th-gen systems might be a bit less than 2x that, but effective performance will be a lot higher due to architectural improvements, moving to smaller process nodes, and newer die-stacking manufacturing techniques. So I gave a ~35-40 TF target for 10th-gen systems, and that still roughly fits your 2x 6900 XT expectation, if not necessarily in terms of pure TF numbers then definitely in effective performance (where, again, IMO they'll go well beyond 2x a 6900 XT).
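Just to lay out the napkin math behind both of our numbers (the TF figures are the commonly quoted peak FP32 values, and the 35-40 TF target is, again, purely my own guess), a quick sketch:

```python
# Napkin math with commonly quoted peak FP32 figures; the 35-40 TF target is my own guess.
gtx_780_ti_tf = 5.0       # ~5 TF peak FP32 (2013)
series_x_tf = 12.15       # Xbox Series X GPU, peak FP32
rx_6900_xt_tf = 23.0      # ~23 TF peak FP32

last_gen_jump = series_x_tf / gtx_780_ti_tf   # the ~2.4x jump you pointed out
two_x_6900xt = 2 * rx_6900_xt_tf              # ~46 TF if the same ~2x pattern repeats
my_target_low, my_target_high = 35, 40        # my raw-TF guess for 10th-gen systems

print(f"780 Ti -> Series X: {last_gen_jump:.1f}x")
print(f"2x 6900 XT: {two_x_6900xt:.0f} TF raw vs my {my_target_low}-{my_target_high} TF raw target")
```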
It is true that Apple will be able to control pricing better by moving their chip production in-house, but then again, c'mon, it's Apple xD. They sell $1,000 stands for their monitors; that's the kind of customer base they've cultivated. Lower production costs will most likely NOT translate into lower MSRPs for Apple products, sadly. Bringing up the Air M1 is interesting, though, because one of the reasons it's more price-competitive is increased competition from, among other companies, Microsoft and their Surface products. But we'll see how much of that Apple feels like translating to the discrete GPU market.
GPUs have much more upside for architectural improvements than CPUs. GPUs have inherent parallelism, and you can get better performance by increasing the number of computational cores or adding accelerators, among other architectural improvements. GPUs are where we're going to see most of the advancements. 80 tflops isn't a hard number to reach. In 6 to 7 years you won't even consider a 3090 as a graphics card; it will be pretty much pointless to get one, yet today it costs over $1,000. We'll have cards 3-5 times as powerful.
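To put the "more cores" point in rough numbers: peak FP32 throughput is basically shader cores × boost clock × 2 (an FMA counts as two FLOPs). The 3090 figures below are its public specs; the second card is just a made-up illustration, not a prediction of any real product.

```python
# Peak FP32 throughput is roughly: shader cores * boost clock (GHz) * 2 (FMA = 2 FLOPs) / 1000.
# RTX 3090 numbers are its public specs; the "hypothetical" card is a made-up illustration.
def peak_tflops(cores, clock_ghz):
    return cores * clock_ghz * 2 / 1000

rtx_3090 = peak_tflops(10496, 1.70)            # ~35.7 TF peak FP32
hypothetical = peak_tflops(3 * 10496, 2.0)     # 3x the cores at 2.0 GHz -> ~126 TF

print(f"RTX 3090: {rtx_3090:.1f} TF peak")
print(f"Hypothetical 3x-core card: {hypothetical:.0f} TF peak")
```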
It's not really about 80 TF being a hard number to reach, it's more about what that will cost in terms of die size, production costs, thermal & cooling requirements, and weighing the benefits of generic compute vs. dedicated hardware acceleration. There's also the question of diminishing returns; we're already getting near-photorealistic graphics in real-time with PS5 and Series X, and the gen's only started. Are we really going to need 8x their performance in terms of raw TF in order to genuinely reach photorealism? I personally don't think so. I think the areas of automating data asset generation (through training AI models, stuff like GPT-3), along with improving the pool capacity of byte-addressable data and the efficiency of the data pipeline/locality, factor in more in that case, hence why I think 10th-gen systems will probably prioritize that. Or at least one of them will.
And again, the question of diminishing returns makes me wonder if simply pushing yet more powerful consoles is even going to fly in seven years' time. That's where I'm thinking some standardized focus on VR & AR comes into the picture and factors into the system design. Yes, there's a (very small, IMO) risk of pulling a Kinect 2, but we're already getting pretty cheap VR headsets at good refresh rates and resolutions, and that will only continue to improve. If we can eventually get pretty good, wireless-capable VR/AR headsets at the price of a 1P controller (or only slightly more), and with the production costs to match, then there's no reason not to standardize VR/AR with 10th-gen systems. That could have a perceptibly much bigger impact with the masses and even hardcore/core gamers, because I think it's only by standardizing VR/AR in mainstream consoles that you'll get a regular, serious flow of AAA 1P games genuinely focusing core aspects of their game design around the tech.
The biggest benefit of NVRAM is energy savings, and secondly persisting data. But RAM doesn't need to persist data to work; all changes can be stored on the disk. The biggest issue is that for the foreseeable future (10 years) you won't be getting anywhere near the memory bandwidth of DRAM from "NVRAM". Maybe beyond that. On the other hand, multi-core CPUs are just utilizing parallelism to increase performance because of the bottlenecks you run into trying other architectural improvements. When you compare that to using NVRAM, it's a different situation.
I wouldn't say those are the only major benefits of NVRAM; again I have to bring up the orders-of-magnitude better P/E cycle ratings vs. NAND, significantly lower latencies, and byte-addressability for reads & writes. All that while providing energy savings; the non-volatility honestly takes a backseat in that regard, because if we're talking cold storage we might as well use the SSDs for that. Technically speaking, you're right; NVRAM will never have the bandwidth of DRAM. But that's not really the point either, IMHO. Some configurations of Intel's Optane DC Persistent Memory with dual-socket server units and six-channel setups can provide up to 40 GB/s of bandwidth on read operations. We won't see NAND-based SSDs hitting that for the next several years, if even this decade. And that is, IIRC, Gen 1 Optane DC Persistent Memory; Intel will surely be improving on that in newer designs (even if those aren't aimed at the consumer market, sadly).
I bring that up because, comparing bandwidth, 40 GB/s is already more than a single DDR4-3200 module provides; if they keep improving the tech over the next several years, hitting 75 GB/s or even higher is not out of the realm of possibility. The only big issue is the DIMM interface; it's kind of a waste for the performance you get, even if the capacities are much better than DRAM. Moving the interface to something like PCIe 5.0 or 6.0 with CXL layered into it would probably be a better fit for the tech, if Intel, Micron or other companies investing in NVRAM (Everspin, maybe?) delve deeper into that area.
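For reference, the napkin math behind that comparison (the 75 GB/s line is pure speculation on my part, not a roadmap figure):

```python
# Peak bandwidth of one DDR4 module: transfer rate (MT/s) * 8 bytes per transfer (64-bit bus).
ddr4_3200_module = 3200e6 * 8 / 1e9     # ~25.6 GB/s

optane_pmem_reads = 40.0                # GB/s, the six-channel read figure quoted above
speculative_future_nvram = 75.0         # GB/s, pure speculation on my part

print(f"Single DDR4-3200 module:    {ddr4_3200_module:.1f} GB/s")
print(f"Optane DC PMem (6-ch read): {optane_pmem_reads:.0f} GB/s")
print(f"Speculative future NVRAM:   {speculative_future_nvram:.0f} GB/s")
```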
For HPC (where you have larger budgets and flexibility) you can get marginal benefits with NVRAM, but for next-generation consoles you're almost certainly not getting NVRAM between the SSD and DRAM. Simply having much faster SSDs, more RAM, and higher memory bandwidth to support all the accelerators is the most important thing. Much faster SSDs are a given and we should expect that, more RAM is a given (a minimum of 32 GB), and the memory bandwidth should at least double. The rest of the expenditure should go to the processors.
The question is how MUCH higher memory bandwidths will go. I've done some calculations for future DDR, GDDR, and HBM memories just by looking at the gains between previous generations of those memories, and I don't think you can reach anything higher than 1.5 TB/s to 1.7 TB/s at a price targeting a console design (let alone the PCB real estate, thermals, etc.), and that's with other memory sub-systems (aside from caches) factored in. I mean, those are still really good bandwidth increases over PS5 & Series X, but my question is whether that would really be enough.
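Here's roughly how I sketched that range out; the 2.7x-3.0x growth factors are my own assumptions from eyeballing past generational jumps, nothing official:

```python
# Current-gen baseline bandwidths (peak, main/fast pool), then apply assumed growth factors.
# The 2.7x-3.0x factors are my own assumptions, loosely extrapolated from the PS4 -> PS5 jump.
ps4_bw = 176        # GB/s, GDDR5
ps5_bw = 448        # GB/s, GDDR6
series_x_bw = 560   # GB/s, GDDR6 (10 GB fast pool)

growth_low, growth_high = 2.7, 3.0

print(f"PS4 -> PS5 jump: {ps5_bw / ps4_bw:.1f}x")
print(f"10th-gen guess:  {series_x_bw * growth_low / 1000:.2f} - {series_x_bw * growth_high / 1000:.2f} TB/s")
```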
Going by your hypothetical, let's take the Series X and say the bandwidth increases to 1.2 TB/s, there's no NVRAM, we give it 64 GB of HBM3 or so, and it's got a 16 GB/s SSD, but TF performance is now 80 TF. You're only averaging 15 GB/s per TF; that's going to be a massive hit to any operations requiring good bandwidth throughput, and that's also remembering this would be a hUMA design. Do you rely on on-chip cache, then? What about capacities for the cache, how big or small will they be? Because if there's a cache miss, the penalty for going out to the HBM3 will be absolutely massive with that type of setup, IMO.
Or, going with a 40 TF design in that same hypothetical example, you get 30 GB/s per TF; better, but not by that much vs. high-end dedicated GPU cards out today, and again there's bandwidth contention due to the hUMA design that those cards don't deal with. The cache-miss penalty is reduced, but ratio-wise it's still lower than the current-gen PS5 and Series X, so some additional emphasis on cache sizes would have to come into play, I think.
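For reference, the ratio math from both hypotheticals next to what PS5 & Series X actually ship with (using the commonly quoted peak figures):

```python
# Bandwidth-per-TF ratios for the two hypotheticals vs. shipping current-gen consoles.
def gbs_per_tf(bandwidth_gbs, tflops):
    return bandwidth_gbs / tflops

ratios = {
    "80 TF hypothetical": gbs_per_tf(1200, 80),    # 15.0 GB/s per TF
    "40 TF hypothetical": gbs_per_tf(1200, 40),    # 30.0 GB/s per TF
    "PS5":                gbs_per_tf(448, 10.28),  # ~43.6 GB/s per TF
    "Series X":           gbs_per_tf(560, 12.15),  # ~46.1 GB/s per TF (fast pool)
}

for name, ratio in ratios.items():
    print(f"{name:>20}: {ratio:.1f} GB/s per TF")
```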