Sony PS6, Microsoft neXt Series - 10th gen console speculation [2020]

rntongo · Feb 13, 2021

zed said:
OT: you're forgetting the mining etc
1300 MW nuclear per year to supply the fuel requires 38,000 tonnes CO2 emissions
1300 MW solar is ~20,000 tonnes CO2 (one time cost not per year like nuclear)

But sure nuclear is a lot cleaner than fossil fuels

how did you arrive at these figures?

Tkumpathenurple · Feb 13, 2021

DSoup said:
Nuclear fusion?

Well, I am a nuclear fusion fanboy, so it'd be great for me if the thread went that direction.

Kaotik said:
We can thank all the "green" activists around the world for it. Without their misinformed campaigns, nuclear power would have replaced a lot more fossil fuel plants around the world.

Isn't it fair to say that Chernobyl and Fukushima have played a large role there? Sure, generally speaking, nuclear power is very safe. But when disaster strikes, it's catastrophic. Everything I've read for and against nuclear has only ever served to leave me conflicted.

Although if it meant my PS6 could follow me around like Codsworth, that conflict would probably wane. And then, since my girlfriend would probably leave me, I'd have to buy fleshlight and butt plug attachments...

Nuclear, you son of a bitch, I'm in.

rntongo · Feb 13, 2021

Tkumpathenurpahl said:
Well, I am a nuclear fusion fanboy, so it'd be great for me if the thread went that direction.

Isn't it fair to say that Chernobyl and Fukushima have played a large role there? Sure, generally speaking, nuclear power is very safe. But when disaster strikes, it's catastrophic. Everything I've read for and against nuclear has only ever served to leave me conflicted.

Thats why the US started the IFR program it had all the safety measures to ensure that Chernobyl or Fukushima wouldn't be a thing. Funny thing is they tested their reactor a month before Chernobyl and over 20 years before Fukushima. But the Clinton administration led by John Kerry shut down the program. Now GE Hitachi is restarting it with their PRISM reactor. A sodium cooled fast reactor.

tunafish · Feb 13, 2021

Kaotik said:
ps. what's up with the gigantic towers in US plants?

They are cooling towers. They require a lot less water for the same amount of cooling than using a heat-exchanger in a river like the Finnish plants, but are more expensive to build. Both methods are used in the US (and elsewhere), depending on largely how expensive diverting water for cooling is. As an interesting aside, because the Finnish nukes cool directly using river water, their peak electrical output depends substantially on the temperature of the water in the river. This is considered a feature, not a bug, given that electricity use is much higher during winter than summer.

The funny part are the idiots who see a cooling tower and think nuclear. In Germany, there was once an anti-nuke rally picketing the cooling towers ... of a steel mill.

AzBat · Feb 13, 2021

Where's my Dilithium technology? I'm ready to go to other star systems.

Tommy McClain

Deleted member 11852 · Feb 13, 2021

AzBat said:
Where's my Dilithium technology? I'm ready to go to other star systems.

Remember Praxis :yep2:

rntongo · Feb 16, 2021

Otherwise if we're to go back to the main topic of discussion, if you look at an 8x growth in GPU performance from the 8th gen to 9th gen in the most powerful GPUs(PS4 to Series X), and a 3x improvement in CPU performance, these figures don't look unrealistic for the 10th gen. Probably the lowest gains are going to come from the disk I/O throughput. But in terms of real improvements the SSDs will probably lead to real instant loading times across the board(Not what we're seeing at the start of the 9th gen).

thicc_gaf · Feb 16, 2021

rntongo said:
Otherwise if we're to go back to the main topic of discussion, if you look at an 8x growth in GPU performance from the 8th gen to 9th gen in the most powerful GPUs(PS4 to Series X), and a 3x improvement in CPU performance, these figures don't look unrealistic for the 10th gen. Probably the lowest gains are going to come from the disk I/O throughput. But in terms of real improvements the SSDs will probably lead to real instant loading times across the board(Not what we're seeing at the start of the 9th gen).

Hmm, I dunno if it'll be an 8x growth for GPU in terms of raw numbers. Where would they even find the memory to feed that type of monster, and at what configuration and amount? I don't see a path forward for 80 TF/90 TF consoles for 10th-gen, and I don't think such consoles would really be a big hit anyway, it'd basically be more of the same, just more powerful.

However if you mean effectively 8x GPU performance when factoring in new architecture designs, refinement in things like RT, DLSS/Super Resolution/AI to eat up much less (if any) generic GPU compute costs (by moving those things to hardware accelerators), etc., then I can definitely see that happening. Because I think things will probably move forward to favor dedicated hardware acceleration and more focus for on-chip cache memory (plus hopefully standardizing VR & AR hardware as a default).

rntongo · Feb 17, 2021

thicc_gaf said:
Hmm, I dunno if it'll be an 8x growth for GPU in terms of raw numbers. Where would they even find the memory to feed that type of monster, and at what configuration and amount? I don't see a path forward for 80 TF/90 TF consoles for 10th-gen, and I don't think such consoles would really be a big hit anyway, it'd basically be more of the same, just more powerful.

However if you mean effectively 8x GPU performance when factoring in new architecture designs, refinement in things like RT, DLSS/Super Resolution/AI to eat up much less (if any) generic GPU compute costs (by moving those things to hardware accelerators), etc., then I can definitely see that happening. Because I think things will probably move forward to favor dedicated hardware acceleration and more focus for on-chip cache memory (plus hopefully standardizing VR & AR hardware as a default).

I think we have to wait and see. GPUs with 80 tflops of shader tflops in 10th gen don't sound ridiculous when you look at cards from the past. The GTX 690 released around 4 years before the One X with 3tflops but is an inferior GPU that cost $999. Most likely the RTX 3090 and 6900 XT won't be worth looking at in 2026. There will be much more powerful GPUs with much better accelerators for ML and RT. Not so sure about the benefits of significantly increasing on chip cache especially for GPUs similarly to the Infinity Cache. 32MB of L3 cache sounds reasonable but 128MB is just expensive. With HBM memory the processors won't be bandwidth starved. The RTX 3090 doesn't even have an L3 cache and only 4MB of L2 cache and its the best performing GPU on the techpowerup benchmarks. Increasing the size of the CPU on chip memory is definitely the way to go. A larger unified L3 cache like in Zen 3 and larger L2 cache.

I think 10th gen is when we'll have photorealistic graphics in sports, fighting and racing games.

thicc_gaf · Feb 18, 2021

rntongo said:
I think we have to wait and see. GPUs with 80 tflops of shader tflops in 10th gen don't sound ridiculous when you look at cards from the past. The GTX 690 released around 4 years before the One X with 3tflops but is an inferior GPU that cost $999. Most likely the RTX 3090 and 6900 XT won't be worth looking at in 2026. There will be much more powerful GPUs with much better accelerators for ML and RT. Not so sure about the benefits of significantly increasing on chip cache especially for GPUs similarly to the Infinity Cache. 32MB of L3 cache sounds reasonable but 128MB is just expensive. With HBM memory the processors won't be bandwidth starved. The RTX 3090 doesn't even have an L3 cache and only 4MB of L2 cache and its the best performing GPU on the techpowerup benchmarks. Increasing the size of the CPU on chip memory is definitely the way to go. A larger unified L3 cache like in Zen 3 and larger L2 cache.

I think 10th gen is when we'll have photorealistic graphics in sports, fighting and racing games.

I mean, I agree with us having photorealistic graphics in many games 10th gen, but exactly how much processing power do consoles need to achieve that in real-time at 60 FPS? Is there any sources I can look at for 3D film animation rendering farms, to get an idea for how many systems (and what their specs are) are needed for rendering a single frame in a CG film and how long that render time takes?

More specifically, any sources with this type of info for maybe mid-2000s CG films would be useful because I think that's where 10th-gen systems would land in terms of fidelity, IQ etc. for real-time native 4K 60 FPS gaming. I also should probably look into what the top-end GPUs were in 2013, that could also help give an idea for what we could expect 10th-gen systems to land at power-wise. I still think on-chip cache will be a priority because if you're really talking 60 TF - 80 TF 10th-gen systems you will likely need a lot of HBM stacks and a large active interposer, that will eat a lot into costs and thermals regardless.

thicc_gaf · Feb 18, 2021

Just did a quick look; 780 Ti was the top-line GPU for 2013 with 5.35 TF and 3 GB of memory, 336 GB/s bandwidth. In terms of pure TF, PS4 Pro 3 years later still came up short of that, as well as bandwidth, but still had memory capacity advantage. One X managed to surpass it (again just talking pure TF numbers here), but with very slightly lower memory bandwidth but much more memory capacity.

So another three years from Pro and One X we get PS5 with raw TF #s just shy of 2x 780 Ti, and Series X a bit over 2x 780 Ti. PS5 with an extra 25% memory bandwidth and over 5x memory capacity, Series X with 66% more memory bandwidth (for the GPU-optimized pool), over 5x memory capacity. So here's a thing: assuming 10th-gen are 2027, I guess we can see raw TF figures of around 70 TF (2x 3090), but I don't think RAM capacity will increase 5x unless NVRAM is involved. Bandwidth among total system should be around 1.5 TB/s, probably up to 1.7 TB/s for at least one of them, but you may NOT have to specifically use HBM to get there (I've been doing a lot of theory-crafting for 10th-gen systems in spare time, hope to post some more in this thread very soon).

My only concern there is bandwidth of memory to TF; even at the higher 1.7 TB/s range if the GPUs are around 70 TF that's only 24.28 GB/s per TF, almost 50% less than what PS5 and Series X can offer. That's a reason why I'm thinking on-chip cache memory will have to be a big focus in any case, and that's combined with HBM3(or 4)/GDDR7/DDR6/NVRAM etc. Gotta feed that beast especially if hardware accelerators are involved.

But it's because of potential hardware accelerators why I was thinking pure TF count would be notably lower than 70 TF; they have a limited budget and I think they would still want a per-TF bandwidth ratio that's at least on par with 9th-gen systems. A 70 TF GPU even on say 3nm or 2nm would probably be very expensive and relatively wide. So there is going to be an inherent conflict between maintaining a decent per-TF bandwidth and keeping chip size manageable not just for heat/power reasons but also cost-related ones, because they'll still need to afford budget for CPU, audio, storage, (potentially) NVRAM, etc. (including what I'm personally hoping for as at least one of the system having VR/AR headset standard in every SKU).

Those factors are why I'm thinking TF-wise, 10th-gen systems will probably land between 35 TF - 40 TF in terms of the pure number, but still be way more performant than a 6900 XT or 3090 because of much better hardware accelerators, better memory, more on-chip cache, faster caches, more memory bandwidth, more memory capacity (probably only 2x main memory but over 5x if NVRAM is added into the picture), etc.

rntongo · Feb 18, 2021

thicc_gaf said:
Just did a quick look; 780 Ti was the top-line GPU for 2013 with 5.35 TF and 3 GB of memory, 336 GB/s bandwidth. In terms of pure TF, PS4 Pro 3 years later still came up short of that, as well as bandwidth, but still had memory capacity advantage. One X managed to surpass it (again just talking pure TF numbers here), but with very slightly lower memory bandwidth but much more memory capacity.

So another three years from Pro and One X we get PS5 with raw TF #s just shy of 2x 780 Ti, and Series X a bit over 2x 780 Ti. PS5 with an extra 25% memory bandwidth and over 5x memory capacity, Series X with 66% more memory bandwidth (for the GPU-optimized pool), over 5x memory capacity. So here's a thing: assuming 10th-gen are 2027, I guess we can see raw TF figures of around 70 TF (2x 3090), but I don't think RAM capacity will increase 5x unless NVRAM is involved. Bandwidth among total system should be around 1.5 TB/s, probably up to 1.7 TB/s for at least one of them, but you may NOT have to specifically use HBM to get there (I've been doing a lot of theory-crafting for 10th-gen systems in spare time, hope to post some more in this thread very soon).

My only concern there is bandwidth of memory to TF; even at the higher 1.7 TB/s range if the GPUs are around 70 TF that's only 24.28 GB/s per TF, almost 50% less than what PS5 and Series X can offer. That's a reason why I'm thinking on-chip cache memory will have to be a big focus in any case, and that's combined with HBM3(or 4)/GDDR7/DDR6/NVRAM etc. Gotta feed that beast especially if hardware accelerators are involved.

But it's because of potential hardware accelerators why I was thinking pure TF count would be notably lower than 70 TF; they have a limited budget and I think they would still want a per-TF bandwidth ratio that's at least on par with 9th-gen systems. A 70 TF GPU even on say 3nm or 2nm would probably be very expensive and relatively wide. So there is going to be an inherent conflict between maintaining a decent per-TF bandwidth and keeping chip size manageable not just for heat/power reasons but also cost-related ones, because they'll still need to afford budget for CPU, audio, storage, (potentially) NVRAM, etc. (including what I'm personally hoping for as at least one of the system having VR/AR headset standard in every SKU).

Those factors are why I'm thinking TF-wise, 10th-gen systems will probably land between 35 TF - 40 TF in terms of the pure number, but still be way more performant than a 6900 XT or 3090 because of much better hardware accelerators, better memory, more on-chip cache, faster caches, more memory bandwidth, more memory capacity (probably only 2x main memory but over 5x if NVRAM is added into the picture), etc.

The part I bolded; Thats why I was saying 80 Tflops is possible on 10th gen consoles. In 6-7 years there'll be much better GPUs. Just next year we'll have 128 core Apple GPU(It would have over 80 tflops of fp32 compute but still use much less power than a 3090) and probably a 256 core GPU MPX card for serious workloads. So in 6-7 years you're looking at seriously powerful 10th gen machines. The most expensive thing will likely be memory. I think they'll have to go between 32-64GB. SSD won't be as expensive in terms of total cost of the machine so focus will be on the processors and RAM.

thicc_gaf said:
My only concern there is bandwidth of memory to TF; even at the higher 1.7 TB/s range if the GPUs are around 70 TF that's only 24.28 GB/s per TF, almost 50% less than what PS5 and Series X can offer. That's a reason why I'm thinking on-chip cache memory will have to be a big focus in any case, and that's combined with HBM3(or 4)/GDDR7/DDR6/NVRAM etc. Gotta feed that beast especially if hardware accelerators are involved

I don't see engineers going for NVRAM when they can simply increase the size of volatile RAM(32-64GB) and have a much faster SSD(12GB/s>). It's easier to program for and would provide the same performance.

HBM is definitely the way to go and you want to get as much as possible so minimum 32GB. Thats why I think it's going to be either the highest or second highest cost of the console.

You can see currently the XSX can instantly boot into a game that was most recently resident in RAM. No need for NVRAM. Users would also have much larger storage. Consoles could ship with 2TB maybe even higher by then. And if there's anything to learn from the Xbox One vs PS4 its that its always a win if you can increase memory bandwidth instead of increasing the size of on chip memory. IIRC for example CPU caches at a certain size for example 8MB L2 or L3 cache becomes optimal in that you get very few cache misses. There are diminishing returns as you increase the size of caches. You'd need to increase them significantly but it would be too expensive so you'd end up increasing size of RAM. Smarter things like unified L3 cache like in Zen 3 are what we should expect to see in 10th gen.

For the memory bandwidth I think they'll simply see what developers do with the current hw and see how much memory bandwidth they use. Otherwise a ~2TB/s memory bandwidth target doesn't look unrealistic.

thicc_gaf · Feb 19, 2021

rntongo said:
The part I bolded; Thats why I was saying 80 Tflops is possible on 10th gen consoles. In 6-7 years there'll be much better GPUs. Just next year we'll have 128 core Apple GPU(It would have over 80 tflops of fp32 compute but still use much less power than a 3090) and probably a 256 core GPU MPX card for serious workloads. So in 6-7 years you're looking at seriously powerful 10th gen machines. The most expensive thing will likely be memory. I think they'll have to go between 32-64GB. SSD won't be as expensive in terms of total cost of the machine so focus will be on the processors and RAM.

But the thing is the 780 Ti came out in 2013; a system like the Series X is a bit over 2x its performance in raw TF numbers but it took seven whole years for that to come to market. GPUs had a lot of major architectural advances last decade and while I think they're going to see a lot this decade too, I personally don't think it'll be to the same level and that's with things like chiplets factored in. Mainly because there will be a learning process for many things, and I also think GPU designers will put more focus into embedded hardware accelerators increasingly as the decade goes on.

Personally wouldn't use Apple as too much a gauge in this; we have no idea where that hypothetical 128-core Apple GPU will land price-wise but we know it'll be monstrously expensive because even their more standard stuff is priced extremely high (some would say overpriced). Looking at M1's GPU perf it looks to be around 2.6 TF with 8 cores; just scaling that upwards would put a 128 core variant at "just" 41.6 TF, not 80 TF. Tho 41.6 TF would be very impressive, and Intel have plans for high-end Xe GPUs to provide similar performance. Still though, it's Apple; whatever their high-end GPU for next year hits at perf-wise, it will likely be priced north of even Nvidia's high-end cards, but it also falls outside of the time frame relative to PS5 & Series X launches so I'm not necessarily sure how it's their 780 Ti equivalent.

We're going to get extremely powerful 10th-gen systems regardless, but I think you and I have differing opinions on how exactly that'll happen. Extremely high/powerful "generic" GPU compute compared to more modest generic GPU compute but combined with lots of hardware accelerators. IMO tho the latter just feels like it'd work better for console designs because you can get that relative performance in a smaller chip which means lower costs, something Sony and Microsoft are always going to want to prioritize.

I don't see engineers going for NVRAM when they can simply increase the size of volatile RAM(32-64GB) and have a much faster SSD(12GB/s>). It's easier to program for and would provide the same performance.

On one hand you're right it would definitely be easier to program for, but at one point single-core CPUs were also easier to program for and offered same or better performance to early multi-core CPUs since they could push higher clocks. In the long-term though, multi-core allowed a paradigm shift that has ultimately been for the best. I honestly think NVRAM will provide similar because you still get byte-addressable memory with way lower latencies than NAND and at capacities much higher than RAM, along with persistent data presence...

...depending on how the NVRAM would be used. Optane for example, they actually have memory mode and app-direct mode and the former just treats the Optane as a last-level cache. Programmers don't have to change any code, but it's arguably less use of NVRAM's potential compared to App-Direct mode. Treating NVRAM in a way more comparable to what App-Direct provides, of course it's going to have a learning curve, but never underestimate game devs' ability to come to terms with new tech.

For things like higher-level on-demand asset data streaming that can also be used as a larger direct memory cache footprint, SSDs have their peak limitations because of the limitations of NAND technology, and I don't think RAM prices will fall through the floor over the next few years to the point where 64 GB (I agree that 32 GB should be present with 10th-gen systems though) is either really doable or desirable compared to what advantages somewhat lesser RAM capacity but a decent bit of NVRAM capacity can bring. The key with the NVRAM though would rely on it being leveraged more like memory instead of storage, and a means of interfacing that doesn't "waste" it like its DRAM, because it doesn't have quite that level of latency and probably never will, but it can get damn close.

I'm personally thinking this is where something like PCIe with CXL layered on top can come in, but there are other ways to push NVRAM forward too.

HBM is definitely the way to go and you want to get as much as possible so minimum 32GB. Thats why I think it's going to be either the highest or second highest cost of the console.

Yeah memory tends to usually be a massive part of the BOM costs, that will likely not change for 10th-gen.

You can see currently the XSX can instantly boot into a game that was most recently resident in RAM. No need for NVRAM. Users would also have much larger storage. Consoles could ship with 2TB maybe even higher by then. And if there's anything to learn from the Xbox One vs PS4 its that its always a win if you can increase memory bandwidth instead of increasing the size of on chip memory. IIRC for example CPU caches at a certain size for example 8MB L2 or L3 cache becomes optimal in that you get very few cache misses. There are diminishing returns as you increase the size of caches. You'd need to increase them significantly but it would be too expensive so you'd end up increasing size of RAM. Smarter things like unified L3 cache like in Zen 3 are what we should expect to see in 10th gen.

For the memory bandwidth I think they'll simply see what developers do with the current hw and see how much memory bandwidth they use. Otherwise a ~2TB/s memory bandwidth target doesn't look unrealistic.

The thing I am hoping to get across though is NVRAM wouldn't simply be fore more storage or faster storage; it's about having a larger memory pool with the chief benefits of volatile memory (bandwidth, latency, byte-addressability, endurance) that SSD storage will never be able to provide. A memory that can sit between RAM and SSD. If you can get 128 GB of NVRAM for the same cost as 32 GB HBM, suddenly 64 GB HBM starts to look questionable compared to 32 GB HBM + 128 GB NVRAM if the former comes with a price premium and (potentially) not a 2x bandwidth increase over 32 (for example, suppose a future HBM spec increases per-module capacity in the stack and that's what MS or Sony have to go with in order to keep prices manageable as well as thermals and packaging space).

Ultimately I guess in terms of NVRAM/no NVRAM and how that goes going forward will come down to what we think AAA games and gaming as a larger whole will trend design-wise. I'm of the opinion that eventually things like VR and AR, full-body motion feedback etc. will become standardized with 10th-gen as well as AI-assisted data streaming and programming models. Those things will favor larger byte-addressable memory blocks, and if consoles want to provide that memory in a way that's cost-effective while also bein performance-effective IMHO they need some mixture of RAM & NVRAM because, as you even say, increasing cache sizes more and more eventually becomes cost and performance-prohibitive (I should've maybe clarified with my on-chip memory statements previously that I was thinking more in terms of making the cache sub-systems interact more intelligently with each other, not merely increasing the capacities. Though some small capacity increases could be had while minimizing cache misses, and features like cache scrubbers do aid in some capacity to cutting down misses, that sort of stuff will likely become standard going forward IMHO).

rntongo · Feb 19, 2021

thicc_gaf said:
But the thing is the 780 Ti came out in 2013; a system like the Series X is a bit over 2x its performance in raw TF numbers but it took seven whole years for that to come to market. GPUs had a lot of major architectural advances last decade and while I think they're going to see a lot this decade too, I personally don't think it'll be to the same level and that's with things like chiplets factored in. Mainly because there will be a learning process for many things, and I also think GPU designers will put more focus into embedded hardware accelerators increasingly as the decade goes on.

Personally wouldn't use Apple as too much a gauge in this; we have no idea where that hypothetical 128-core Apple GPU will land price-wise but we know it'll be monstrously expensive because even their more standard stuff is priced extremely high (some would say overpriced). Looking at M1's GPU perf it looks to be around 2.6 TF with 8 cores; just scaling that upwards would put a 128 core variant at "just" 41.6 TF, not 80 TF. Tho 41.6 TF would be very impressive, and Intel have plans for high-end Xe GPUs to provide similar performance. Still though, it's Apple; whatever their high-end GPU for next year hits at perf-wise, it will likely be priced north of even Nvidia's high-end cards, but it also falls outside of the time frame relative to PS5 & Series X launches so I'm not necessarily sure how it's their 780 Ti equivalent.

I miss wrote in my previous post, meant 80 tflops for the 256 core mpx card. But the simple prediction here on my end is that we'll likely get 2x the tflops of the 6900 XT in the next gen console machines(If they stay with RDNA and not some new ARM GPU). The Series X is 12tflops compared to 5tflops of the 780 Ti. Thats a 2x increase. Another thing you've missed is that Apple's move to their own ARM chips gives them the ability to have competitive price to performance for their offerings.(Compare the Surface pro x vs the Macbook Air M1). You're going to see ML engineers, software developers, graphics designers and animators seriously investing in Apple hw.

GPUs have much more upside for architectural improvements than CPUs. GPUs have given parallelism and you can get better performance by increasing the number of computational cores or adding accelerators amidst other architecture improvements. GPUs is where we're going to see most of the advancements. 80 tflops isn't a hard number to reach. In 6 to 7 years you won't even consider a 3090 for a graphics card. It will be pretty much pointless to get one yet today it costs over $1000. We'll have cards 3-5 times as powerful.

thicc_gaf said:
On one hand you're right it would definitely be easier to program for, but at one point single-core CPUs were also easier to program for and offered same or better performance to early multi-core CPUs since they could push higher clocks. In the long-term though, multi-core allowed a paradigm shift that has ultimately been for the best. I honestly think NVRAM will provide similar because you still get byte-addressable memory with way lower latencies than NAND and at capacities much higher than RAM, along with persistent data presence...

...depending on how the NVRAM would be used. Optane for example, they actually have memory mode and app-direct mode and the former just treats the Optane as a last-level cache. Programmers don't have to change any code, but it's arguably less use of NVRAM's potential compared to App-Direct mode. Treating NVRAM in a way more comparable to what App-Direct provides, of course it's going to have a learning curve, but never underestimate game devs' ability to come to terms with new tech.

The biggest benefit of NVRAM is energy savings and secondly persisting data. But RAM doesn't need to persist data to work. All changes can be stored on the disk. But the biggest issue is, for the foreseeable future(10 years) you won't be getting anywhere near the memory bandwidth of DRAM from "NVRAM". Maybe in the future. On the other hand, multi core CPUs are just utilizing computing parallelism to increase performance due to bottlenecks from trying other architectural improvements. When you compare that to using NVRAM, it's different. For HPC(where you have larger budgets and flexibility) you can get marginal benefits with NVRAM but for consoles for the next generation you're almost certainly not getting NVRAM between the SSD and DRAM. Simply having much faster SSDs, more RAM and higher memory bandwidth to support all the accelerators is the most important thing. Much faster SSDs are a given we should expect that, more RAM is a given a minimum of 32GB and the memory bandwidth should at least double. The rest of expenditure should go to the processors.

thicc_gaf · Feb 19, 2021

rntongo said:
I miss wrote in my previous post, meant 80 tflops for the 256 core mpx card. But the simple prediction here on my end is that we'll likely get 2x the tflops of the 6900 XT in the next gen console machines(If they stay with RDNA and not some new ARM GPU). The Series X is 12tflops compared to 5tflops of the 780 Ti. Thats a 2x increase. Another thing you've missed is that Apple's move to their own ARM chips gives them the ability to have competitive price to performance for their offerings.(Compare the Surface pro x vs the Macbook Air M1). You're going to see ML engineers, software developers, graphics designers and animators seriously investing in Apple hw.

Ah okay, so the 80 TF was for the 256 core card, makes sense. The 6900 XT is around 23 TF of raw performance; I think actual raw TF perf of 10th-gen systems might be a bit less than that, but effective performance will be a lot higher due to architectural improvements and moving to smaller node processes along with newer die-stacking manufacturing techniques. So I gave a ~ 35 - 40 TF target for 10th-gen systems and that still roughly fits into the 2x 6900 XT expectation you have, if not necessarily in terms of pure TF numbers than definitely in effective performance (where again, IMO they'll go well beyond 2x 6900 XT).

It is true that Apple will be able to control pricing better with moving their chip production in-house but, then again, c'mon it's Apple xD. They sell $1,000 stands for their monitors, that's the kind of customer base they've cultivated. Lower production costs will most likely NOT translate into lower MSRPs for Apple products, sadly. Bringing up the Air M1 is interesting though because one of the reasons it's more price-competitive is due to increased competition from, among other companies, Microsoft, and their Surface products. But we'll see how much Apple feels to translate that to the discrete GPU market.

GPUs have much more upside for architectural improvements than CPUs. GPUs have given parallelism and you can get better performance by increasing the number of computational cores or adding accelerators amidst other architecture improvements. GPUs is where we're going to see most of the advancements. 80 tflops isn't a hard number to reach. In 6 to 7 years you won't even consider a 3090 for a graphics card. It will be pretty much pointless to get one yet today it costs over $1000. We'll have cards 3-5 times as powerful.

It's not really about 80 TF being a hard number to reach, it's more about at what cost will that come in terms of die size, production costs, thermal & cooling costs, and weighing the benefits of generic compute vs dedicated hardware acceleration. There's also the question of diminishing returns; we're already getting near photorealistic graphics with PS5 and Series X in real-time, and the gen's only started. Are we really going to need 8x their performance in terms of raw TF in order to genuinely reach photorealism? I personally don't think so. I think the areas of automating data asset generation (through AI programming model training on stuff like GPT-3) along with improving pool capacity of byte-addressable data and the data pipeline/locality efficiency factor more in that case, hence why I think 10th-gen systems will probably prioritize that. Or at least one of them will.

And again, the question of diminishing returns, makes me wonder if simply pushing yet more powerful consoles is even going to fly in seven years' time. That's where I'm thinking some standardized focus on VR & AR comes into the picture and factors into the system design. Yes there's a (very small IMO) risk of pulling a Kinect 2, but we're already getting pretty cheap VR headsets at good refresh and resolution rates, that will only continue to improve and if we can eventually get pretty good, wireless-capable VR/AR headsets at the price of a 1P controller (or only slightly more) and with the production costs to match then there's no reason not to standardize VR/AR with 10th-gen systems. That could have a perceptibly much bigger impact with the masses and even hardcore/core gamers because I think it's only with standardizing VR/AR in mainstream consoles that you'll get a regular, serious flow of AAA 1P games genuinely focusing core aspects of their game design around the tech.

The biggest benefit of NVRAM is energy savings and secondly persisting data. But RAM doesn't need to persist data to work. All changes can be stored on the disk. But the biggest issue is, for the foreseeable future(10 years) you won't be getting anywhere near the memory bandwidth of DRAM from "NVRAM". Maybe in the future. On the other hand, multi core CPUs are just utilizing computing parallelism to increase performance due to bottlenecks from trying other architectural improvements. When you compare that to using NVRAM, it's different.

I wouldn't say those are the only major benefits of NVRAM; again I have to bring up the magnitudes better P/E cycle ratings vs. NAND, significantly lower latencies, and byte-addressability for reads & writes. All that while providing energy savings; the non-volatility honestly takes a backseat in that regard because if you talk in terms of cold storage we might as well use the SSDs for that. Technically speaking, you're right; NVRAM will never have the bandwidth of DRAM. But, that's not really the point either IMHO. Some configurations of Intel's Optane DC Persistent Memory with dual-socket server units and six-channel setups can provide up to 40 GB/s bandwidth on read operations. We won't see SSDs based on NAND hitting that for the next several years, if even this decade. And, that is IIRC Gen 1 Optane DC Persistent Memory; Intel will surely be improving that for newer designs (even if those are not aimed for the consumer market, sadly).

I bring that up because if you compare the bandwidth 40 GB/s is already more than a single DDR4 3200 MHz module; if they keep improving the tech over the next several years hitting 75 GB/s or even higher is not out of the realm of possibility. Only big issue is the DIMM interface; kind of a waste for the performance you get even if the capacities are much better than DRAM. Moving the interface to something like PCIe 5.0 or 6.0 with CXL layered into it would probably be a better suit for the tech, if either Intel, Micron or other companies investing in NVRAM (Everspin maybe?) delve deeper in that area.

For HPC(where you have larger budgets and flexibility) you can get marginal benefits with NVRAM but for consoles for the next generation you're almost certainly not getting NVRAM between the SSD and DRAM. Simply having much faster SSDs, more RAM and higher memory bandwidth to support all the accelerators is the most important thing. Much faster SSDs are a given we should expect that, more RAM is a given a minimum of 32GB and the memory bandwidth should at least double. The rest of expenditure should go to the processors.

The question is how MUCH higher will memory bandwidths go? I've done some calculations for future DDR, GDDR, HBM memories just taking a look at gains between previous generations for those memories, and I don't think you can reach something at a price targeting a console design (let alone PCB real estate, thermals etc.) than 1.5 TB/s to 1.7 TB/s, and that's with other memory sub-systems (aside from caches) factored in. I mean, those are still really good bandwidth increases over PS5 & Series X, but would it really be enough is my question.

Going by your hypothetical, let's take Series X and say the bandwidth increases to 1.2 TB/s, there's no NVRAM, we give it 64 GB HBM3 or so, and it's got a 16 GB/s SSD. But, TF performance is now 80 TF. You're only averaging 15 GB/s per TF; that's going to be a massive hit to any operations requiring good bandwidth throughput, and that's also with needing to remember this would be a hUMA design. Do you rely on on-chip cache, then? What about capacities for the cache, how big or small will they be? Because if there's a cache miss, the penalty for going into the HBM3 will be absolutely massive with that type of setup, IMO.

** = Or going more with a 40 TF design in that same hypothetical example, you get 30 GB/s per TF; quite better but not that much better vs. high-end dedicated GPU cards out today, and again there's bandwidth contention due to the hUMA design that those cards don't deal with. Cache miss penalty cost is reduced but ratio-wise it's still lower than the current-gen PS5 and Series X, so some additional emphasis on cache sizes would have to come into play I think.

rntongo · Feb 21, 2021

thicc_gaf said:
Ah okay, so the 80 TF was for the 256 core card, makes sense. The 6900 XT is around 23 TF of raw performance; I think actual raw TF perf of 10th-gen systems might be a bit less than that, but effective performance will be a lot higher due to architectural improvements and moving to smaller node processes along with newer die-stacking manufacturing techniques. So I gave a ~ 35 - 40 TF target for 10th-gen systems and that still roughly fits into the 2x 6900 XT expectation you have, if not necessarily in terms of pure TF numbers than definitely in effective performance (where again, IMO they'll go well beyond 2x 6900 XT).

It is true that Apple will be able to control pricing better with moving their chip production in-house but, then again, c'mon it's Apple xD. They sell $1,000 stands for their monitors, that's the kind of customer base they've cultivated. Lower production costs will most likely NOT translate into lower MSRPs for Apple products, sadly. Bringing up the Air M1 is interesting though because one of the reasons it's more price-competitive is due to increased competition from, among other companies, Microsoft, and their Surface products. But we'll see how much Apple feels to translate that to the discrete GPU market.

It's not really about 80 TF being a hard number to reach, it's more about at what cost will that come in terms of die size, production costs, thermal & cooling costs, and weighing the benefits of generic compute vs dedicated hardware acceleration. There's also the question of diminishing returns; we're already getting near photorealistic graphics with PS5 and Series X in real-time, and the gen's only started. Are we really going to need 8x their performance in terms of raw TF in order to genuinely reach photorealism? I personally don't think so. I think the areas of automating data asset generation (through AI programming model training on stuff like GPT-3) along with improving pool capacity of byte-addressable data and the data pipeline/locality efficiency factor more in that case, hence why I think 10th-gen systems will probably prioritize that. Or at least one of them will.

And again, the question of diminishing returns, makes me wonder if simply pushing yet more powerful consoles is even going to fly in seven years' time. That's where I'm thinking some standardized focus on VR & AR comes into the picture and factors into the system design. Yes there's a (very small IMO) risk of pulling a Kinect 2, but we're already getting pretty cheap VR headsets at good refresh and resolution rates, that will only continue to improve and if we can eventually get pretty good, wireless-capable VR/AR headsets at the price of a 1P controller (or only slightly more) and with the production costs to match then there's no reason not to standardize VR/AR with 10th-gen systems. That could have a perceptibly much bigger impact with the masses and even hardcore/core gamers because I think it's only with standardizing VR/AR in mainstream consoles that you'll get a regular, serious flow of AAA 1P games genuinely focusing core aspects of their game design around the tech.

I wouldn't say those are the only major benefits of NVRAM; again I have to bring up the magnitudes better P/E cycle ratings vs. NAND, significantly lower latencies, and byte-addressability for reads & writes. All that while providing energy savings; the non-volatility honestly takes a backseat in that regard because if you talk in terms of cold storage we might as well use the SSDs for that. Technically speaking, you're right; NVRAM will never have the bandwidth of DRAM. But, that's not really the point either IMHO. Some configurations of Intel's Optane DC Persistent Memory with dual-socket server units and six-channel setups can provide up to 40 GB/s bandwidth on read operations. We won't see SSDs based on NAND hitting that for the next several years, if even this decade. And, that is IIRC Gen 1 Optane DC Persistent Memory; Intel will surely be improving that for newer designs (even if those are not aimed for the consumer market, sadly).

I bring that up because if you compare the bandwidth 40 GB/s is already more than a single DDR4 3200 MHz module; if they keep improving the tech over the next several years hitting 75 GB/s or even higher is not out of the realm of possibility. Only big issue is the DIMM interface; kind of a waste for the performance you get even if the capacities are much better than DRAM. Moving the interface to something like PCIe 5.0 or 6.0 with CXL layered into it would probably be a better suit for the tech, if either Intel, Micron or other companies investing in NVRAM (Everspin maybe?) delve deeper in that area.

The question is how MUCH higher will memory bandwidths go? I've done some calculations for future DDR, GDDR, HBM memories just taking a look at gains between previous generations for those memories, and I don't think you can reach something at a price targeting a console design (let alone PCB real estate, thermals etc.) than 1.5 TB/s to 1.7 TB/s, and that's with other memory sub-systems (aside from caches) factored in. I mean, those are still really good bandwidth increases over PS5 & Series X, but would it really be enough is my question.

Going by your hypothetical, let's take Series X and say the bandwidth increases to 1.2 TB/s, there's no NVRAM, we give it 64 GB HBM3 or so, and it's got a 16 GB/s SSD. But, TF performance is now 80 TF. You're only averaging 15 GB/s per TF; that's going to be a massive hit to any operations requiring good bandwidth throughput, and that's also with needing to remember this would be a hUMA design. Do you rely on on-chip cache, then? What about capacities for the cache, how big or small will they be? Because if there's a cache miss, the penalty for going into the HBM3 will be absolutely massive with that type of setup, IMO.

** = Or going more with a 40 TF design in that same hypothetical example, you get 30 GB/s per TF; quite better but not that much better vs. high-end dedicated GPU cards out today, and again there's bandwidth contention due to the hUMA design that those cards don't deal with. Cache miss penalty cost is reduced but ratio-wise it's still lower than the current-gen PS5 and Series X, so some additional emphasis on cache sizes would have to come into play I think.

To be very honest, there's like a 2% chance of NVRAM showing up. They could simply increase the size of the SSD and still use HBM to attain at least 2TB/s memory bandwidth. You'd get lower latency than adding an extra layer in the memory hierarchy, rewriting the OS to support, dev tools... In sum, you'd not want to add NVRAM unless it was dirt cheap for large storage sizes. You want to increase the size of RAM while also increasing memory bandwidth. So you're looking at at least 16 core CPU, GPU with RT and ML accelerators, 32GB of HBM RAM minimum 2TB/s of memory bandwidth minimum. 1-2TB fast SSD min 12GB/s with an expansion slot!

thicc_gaf · Feb 22, 2021

rntongo said:
To be very honest, there's like a 2% chance of NVRAM showing up. They could simply increase the size of the SSD and still use HBM to attain at least 2TB/s memory bandwidth. You'd get lower latency than adding an extra layer in the memory hierarchy, rewriting the OS to support, dev tools... In sum, you'd not want to add NVRAM unless it was dirt cheap for large storage sizes. You want to increase the size of RAM while also increasing memory bandwidth. So you're looking at at least 16 core CPU, GPU with RT and ML accelerators, 32GB of HBM RAM minimum 2TB/s of memory bandwidth minimum. 1-2TB fast SSD min 12GB/s with an expansion slot!

I think that's pretty fair tbh; maybe with their growing purpose of unifying Xbox designs for other markets like the cloud, Microsoft could be more in position to go with some type of Optane/X100 type NVRAM support in their next-gen system, while I can see Sony doing more something to what you're saying, focusing on somewhat more RAM and push that bandwidth along with SSD decompression bandwidth.

Both will focus a lot on accelerating various parts of their processors through hardware accelerators, though this could possibly be something Sony puts a bigger focus on and Microsoft going for a bit more raw power. I actually have some interesting ideas on what MS might go with 10th-gen but some for Sony too, though I'm considering writing up something else with Sony's design that plays less on NVRAM. Maybe something like this:

[SONY]

>48 GB HBM4 (I can explain this stuff later when I post up a more detailed speculation), 3072-bit bus, 2.3 TB/s

> ~ 35 TF, various dedicated hardware acceleration for AI, ML, IU, physics etc., including possibly some extended FPGA slice integration

>VR hardware support included as standard (can go into this more too, later)

>2 TB SSD, 16 GB/s raw bandwidth, hardware decompression limit of 64 GB/s. Some type of last-level unifying cache shared between I/O block and GPU
around 512 MB - 1 GB in size, something around Infinity Cache in performance (not quite the same type of SRAM tho/lower performing vs. IC but larger capacity) of around 4 TB/s minimum.

[MICROSOFT]

>32 GB of "various memories" (will have to go into detail later down line)

>~1.7 TB/s total system memory (including non-RAM, excluding caches obviously)

>128 - 192 GB of some type of Intel/Micron-based Optane/X100 DRAM-style NVRAM (interfaced over PCIe 6.0 with CXL for cache coherency, lower latency,
etc).

>Between 34 TF - 44 TF; lesser focus on highly specific AI/ML/IU/physical hardware acceleration (would still support a lot of the more "universal" standards for RDNA whatever those are by that time).

>2 TB SSD capacity, 12 GB/s - 16 GB/s

>New data management engine block for orchestrating read/writes through RAM, SSD and NVRAM pools.

These are just some ideas, admittedly a bit changed from some parts of what I've been writing up in my spare time. You'd get systems relatively capable of many of the same things, just through some different means, and suiting specific strengths and business models of the respective parent companies.

rntongo · Feb 22, 2021

thicc_gaf said:
I think that's pretty fair tbh; maybe with their growing purpose of unifying Xbox designs for other markets like the cloud, Microsoft could be more in position to go with some type of Optane/X100 type NVRAM support in their next-gen system, while I can see Sony doing more something to what you're saying, focusing on somewhat more RAM and push that bandwidth along with SSD decompression bandwidth.

Both will focus a lot on accelerating various parts of their processors through hardware accelerators, though this could possibly be something Sony puts a bigger focus on and Microsoft going for a bit more raw power. I actually have some interesting ideas on what MS might go with 10th-gen but some for Sony too, though I'm considering writing up something else with Sony's design that plays less on NVRAM. Maybe something like this:

[SONY]

>48 GB HBM4 (I can explain this stuff later when I post up a more detailed speculation), 3072-bit bus, 2.3 TB/s

> ~ 35 TF, various dedicated hardware acceleration for AI, ML, IU, physics etc., including possibly some extended FPGA slice integration

>VR hardware support included as standard (can go into this more too, later)

>2 TB SSD, 16 GB/s raw bandwidth, hardware decompression limit of 64 GB/s. Some type of last-level unifying cache shared between I/O block and GPU
around 512 MB - 1 GB in size, something around Infinity Cache in performance (not quite the same type of SRAM tho/lower performing vs. IC but larger capacity) of around 4 TB/s minimum.

[MICROSOFT]

>32 GB of "various memories" (will have to go into detail later down line)

>~1.7 TB/s total system memory (including non-RAM, excluding caches obviously)

>128 - 192 GB of some type of Intel/Micron-based Optane/X100 DRAM-style NVRAM (interfaced over PCIe 6.0 with CXL for cache coherency, lower latency,
etc).

>Between 34 TF - 44 TF; lesser focus on highly specific AI/ML/IU/physical hardware acceleration (would still support a lot of the more "universal" standards for RDNA whatever those are by that time).

>2 TB SSD capacity, 12 GB/s - 16 GB/s

>New data management engine block for orchestrating read/writes through RAM, SSD and NVRAM pools.

These are just some ideas, admittedly a bit changed from some parts of what I've been writing up in my spare time. You'd get systems relatively capable of many of the same things, just through some different means, and suiting specific strengths and business models of the respective parent companies.

Interesting lets wait and see the outcome otherwise I think for the most part you're spot on. My only departure is with the TF numbers and the NVRAM. 80TF seems more realistic imo. 4x improvement in GPU performance over 6 years would be terrible considering the consoles launched with RTX 3090 and 6900 XT which have similar performance to 4x in terms of tflops. 6-8x increase in tflops is my guess. By 2026 an RTX 3090 should be a mid tier card, not the equivalent of the next gen console unless if GPU architects hit a wall(which is not the case).

I think the biggest thing is going to be improvements in developer tools to reduce the time it takes to develop games. Starting this generation moving to the 10th. Gaming subscription services will need this in order to deliver consistent AAA content. Thats also why I think adding NVRAM may complicate things. You want the same memory subsystem so developers don't have to refactor their code or spend time trying to utilize that new memory layer. So transitioning from 9th gen to 10th gen should be smoother than ever before. Cross gen games developed in the last years of the 9th gen should have much better performance on the 10th gen systems(higher fps, better texture quality, higher resolutions!). So adding a memory layer that won't be utilized during the first year or two wouldn't be the most economical thing.

thicc_gaf · Feb 22, 2021

rntongo said:
Interesting lets wait and see the outcome otherwise I think for the most part you're spot on. My only departure is with the TF numbers and the NVRAM. 80TF seems more realistic imo. 4x improvement in GPU performance over 6 years would be terrible considering the consoles launched with RTX 3090 and 6900 XT which have similar performance to 4x in terms of tflops. 6-8x increase in tflops is my guess. By 2026 an RTX 3090 should be a mid tier card, not the equivalent of the next gen console unless if GPU architects hit a wall(which is not the case).

I get what you're saying TBH, but the only reason I'm thinking the TF will come within the ranges I mentioned earlier is because I took a look at the top-end cards around the time of PS4 and XBO's releases, gauged where the raw TF differential was, gauged where the Pro and One X fell TF-wise to that card (780 Ti), then saw where in terms of TF Series X and PS5 are over 780 Ti. So assuming the 10th-gen will be roughly 2027 launches, I simply repeated that whole process but this time used a 3090. Maybe 10th-gen will be a tad over those ratios, but probably not by very much, so I'm personally not expecting 80 TF systems and I don't know how they could even afford enough memory to feed that type of system in a console form even by 2027 :S.

In any case, even if the raw TF doesn't hit 80 TF I think in terms of general effective performance 10th gen would easily perform near and even well above 80 TF of RDNA 2 equivalent GPUs assuming AMD can keep making consistent gains gen-over-gen in their GPU architectures. Combining that with dedicated hardware accelerators, and you've got incredibly capable systems even if the raw TF numbers "look" relatively small to whatever high-end GPUs would be on the market by the time of their launch.

I think the biggest thing is going to be improvements in developer tools to reduce the time it takes to develop games. Starting this generation moving to the 10th. Gaming subscription services will need this in order to deliver consistent AAA content. Thats also why I think adding NVRAM may complicate things. You want the same memory subsystem so developers don't have to refactor their code or spend time trying to utilize that new memory layer.

This is actually a great point and there is definitely a learning curve to adding a new layer to the memory sub-system. There is the Memory Direct functionality of Optane memory which is there to ease this type of transition, FWIW, but it's not necessarily the optimal way to use that memory. That all said, I think speeding up dev times for AAA games will (hopefully) come from major advancements in offloaded AI programming models and systems. Leveraging technologies like GPT-3 (IIRC Microsoft have either purchased this tech or have invested a lot into it) that can be programmed to generate bulk data assets, and having specialists who can curate that content with a human touch.

There's some ethical concerns in this WRT possible workplace displacement, but hopefully regulations would be able to take care of that sort of thing, ensure enough people are still landing jobs and aren't getting squeezed out by AI machine models. You'd need a delicate balance no matter what.

So transitioning from 9th gen to 10th gen should be smoother than ever before. Cross gen games developed in the last years of the 9th gen should have much better performance on the 10th gen systems(higher fps, better texture quality, higher resolutions!). So adding a memory layer that won't be utilized during the first year or two wouldn't be the most economical thing.

Yeah, there is an aspect of back-compat that should probably be considered here and which NVRAM could overcomplicate in some way. So with mentioning of these things, I think you're probably right that NVRAM won't be a massive focus, but maybe it works as a means of streamlining storage instead?

The idea is that you put a small bit of NVRAM as cache in the system to make up for cooling down on high-priced NAND when the NVRAM would have better latency and endurance figures anyway. This way they can standardize with more "normal" SSDs in terms of bandwidths, but that would just be in terms of the NAND, and they can focus on going with much bigger SSDs (4 TB, maybe even 6 TB or 8 TB capacities), and you get say 32 GB of NVRAM providing 32 GB/s of bandwidth. The typical decompressor we see right now in the current systems becomes something a bit more like a DPU-lite, handling transfer of data between the NVRAM and RAM pools (and maybe, some way of providing a pathway for the audio to the NVRAM; audio doesn't need a ton of bandwidth and this could free up RAM bandwidth of contention for just the GPU and CPU; perhaps even CPU could have a shared access to NVRAM through bus arbitration), so you can get RAM occupied in 1 second, potentially even less depending on the bandwidth of the NVRAM.

So the role of the NVRAM here would be to speed up storage data transfer speeds, massively reduce latencies, and free up some bandwidth content on the main system RAM if other processor components like audio I/O or even the CPU may need to leverage only a slice of that bandwidth for certain data calculations. Only other thing is that such a managing block (this kind of pseudo DPU-lite) needs cache coherence management; I think this type of setup wouldn't be terribly much work for Sony or Microsoft to do, it would basically be extending on work they've already done with their decompression I/O in the PS5 and Series X, it's just shifting the focus from NAND to NVRAM in terms of the memory interface and adding some ringbus functionality so multiple processor components can access NVRAM and RAM simultaneously while data coherency is maintained. It's the NVRAM being used basically in a way similar to Memory Direct mode in Optane on PC, devs would not need to explicitly program against it but they can use it as a second RAM pool that's virtually managed as a unified pool by the OS, if needed.

Meanwhile they can move to interfacing the SSDs as generic drives over some NVME Gen 5/Gen 6 as PCIe 5.0/6.0 with CXL layered/integrated into it, directly interfacing with the internal NVRAM/flash memory controller/decompressor of the system that has the NVRAM cache (say 32 GB minimum). I think this is a really plausible and relatively simple way of integrating NVRAM into 10th-gen designs that keeps costs manageable and allows them to go with a ton of storage (storage being one of the chief weaknesses of the 9th-gen systems, particularly PS5 and Series S).

EDIT: FWIW, in terms of NVRAM complicating the memory sub-system, in a way we've already seen that Sony and Microsoft have no issues in doing this if they feel it's worth the cost, as they've had to redesign major parts of the current memory sub-systems for the SSDs. If NAND technology starts to plateau or stagnate in terms of performance gains going forward, and prices don't drop terribly lower, NVRAM starts to look a lot more attractive as something to integrate into the memory sub-system IMHO.

rntongo · Feb 23, 2021

thicc_gaf said:
I get what you're saying TBH, but the only reason I'm thinking the TF will come within the ranges I mentioned earlier is because I took a look at the top-end cards around the time of PS4 and XBO's releases, gauged where the raw TF differential was, gauged where the Pro and One X fell TF-wise to that card (780 Ti), then saw where in terms of TF Series X and PS5 are over 780 Ti. So assuming the 10th-gen will be roughly 2027 launches, I simply repeated that whole process but this time used a 3090. Maybe 10th-gen will be a tad over those ratios, but probably not by very much, so I'm personally not expecting 80 TF systems and I don't know how they could even afford enough memory to feed that type of system in a console form even by 2027 :S.

In any case, even if the raw TF doesn't hit 80 TF I think in terms of general effective performance 10th gen would easily perform near and even well above 80 TF of RDNA 2 equivalent GPUs assuming AMD can keep making consistent gains gen-over-gen in their GPU architectures. Combining that with dedicated hardware accelerators, and you've got incredibly capable systems even if the raw TF numbers "look" relatively small to whatever high-end GPUs would be on the market by the time of their launch.

Agree, thats why my guesstimate is at 6-8x the current tflops. But they'll definitely be better than the RTX 3090. At least double the performance.

thicc_gaf said:
Yeah, there is an aspect of back-compat that should probably be considered here and which NVRAM could overcomplicate in some way. So with mentioning of these things, I think you're probably right that NVRAM won't be a massive focus, but maybe it works as a means of streamlining storage instead?

The idea is that you put a small bit of NVRAM as cache in the system to make up for cooling down on high-priced NAND when the NVRAM would have better latency and endurance figures anyway. This way they can standardize with more "normal" SSDs in terms of bandwidths, but that would just be in terms of the NAND, and they can focus on going with much bigger SSDs (4 TB, maybe even 6 TB or 8 TB capacities), and you get say 32 GB of NVRAM providing 32 GB/s of bandwidth. The typical decompressor we see right now in the current systems becomes something a bit more like a DPU-lite, handling transfer of data between the NVRAM and RAM pools (and maybe, some way of providing a pathway for the audio to the NVRAM; audio doesn't need a ton of bandwidth and this could free up RAM bandwidth of contention for just the GPU and CPU; perhaps even CPU could have a shared access to NVRAM through bus arbitration), so you can get RAM occupied in 1 second, potentially even less depending on the bandwidth of the NVRAM.

So the role of the NVRAM here would be to speed up storage data transfer speeds, massively reduce latencies, and free up some bandwidth content on the main system RAM if other processor components like audio I/O or even the CPU may need to leverage only a slice of that bandwidth for certain data calculations. Only other thing is that such a managing block (this kind of pseudo DPU-lite) needs cache coherence management; I think this type of setup wouldn't be terribly much work for Sony or Microsoft to do, it would basically be extending on work they've already done with their decompression I/O in the PS5 and Series X, it's just shifting the focus from NAND to NVRAM in terms of the memory interface and adding some ringbus functionality so multiple processor components can access NVRAM and RAM simultaneously while data coherency is maintained. It's the NVRAM being used basically in a way similar to Memory Direct mode in Optane on PC, devs would not need to explicitly program against it but they can use it as a second RAM pool that's virtually managed as a unified pool by the OS, if needed.

Meanwhile they can move to interfacing the SSDs as generic drives over some NVME Gen 5/Gen 6 as PCIe 5.0/6.0 with CXL layered/integrated into it, directly interfacing with the internal NVRAM/flash memory controller/decompressor of the system that has the NVRAM cache (say 32 GB minimum). I think this is a really plausible and relatively simple way of integrating NVRAM into 10th-gen designs that keeps costs manageable and allows them to go with a ton of storage (storage being one of the chief weaknesses of the 9th-gen systems, particularly PS5 and Series S).

EDIT: FWIW, in terms of NVRAM complicating the memory sub-system, in a way we've already seen that Sony and Microsoft have no issues in doing this if they feel it's worth the cost, as they've had to redesign major parts of the current memory sub-systems for the SSDs. If NAND technology starts to plateau or stagnate in terms of performance gains going forward, and prices don't drop terribly lower, NVRAM starts to look a lot more attractive as something to integrate into the memory sub-system IMHO.

You'll be increasing latency by adding an extra layer. And you'd have to create a new virtual address space and train developers in using that new cache. That extra cache here and there thats added to boost the SSD can be abstracted away from the developer but you don't get that option when adding a layer of "NVRAM" between the SSD and RAM (at least not enough to justify adding it in the first place). And basically the biggest bottleneck for next gen is no longer going to be I/O related but probably memory bandwidth. So most of the investments will go into that as well as the processors. 12-16GB/s sequential read write SSD would be sufficient for 10th gen. In 7 years maybe even more.

Not too sure that 32GB of NVRAM between the SSD and RAM would be beneficial in that case. It would need to be substantially larger than the size of RAM and with substantially higher throughput than the SSD. It would need a combination of those two in a system with only 32GB of RAM in order to justify the extra cost. Unless its dirt cheap in 7 years.

Sony PS6, Microsoft neXt Series - 10th gen console speculation [2020]

rntongo

Tkumpathenurple

rntongo

tunafish

AzBat

Agent of the Bat

Deleted member 11852

Guest

rntongo

thicc_gaf

rntongo

thicc_gaf

thicc_gaf

rntongo

thicc_gaf

rntongo

thicc_gaf

rntongo

thicc_gaf

rntongo

thicc_gaf

rntongo

Similar threads