Predict: Next gen console tech (10th generation edition) [2028+]

Why would the next consoles have 32 GB of RAM if PC gets away without it? That's a lot of additional cost. If there's a RAM bump, I'd guess only to 24 GB. I think it's more important to spend on bandwidth, so I'd like to see stacked RAM. I'd definitely take a 16 GB console with HBM over a 32 GB console.
I'm curious if fully unified memory might go away next generation.

The transition to GDDR7 should get us to 1024 GB/s of main memory bandwidth even if they just stick with 16 GB on a 256-bit bus. That's nothing to be sniffed at, and should help make BC straightforward, but might a relatively small pool of additional HBM be the move?
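For what it's worth, a quick sanity check of that figure (the 32 Gbps per-pin rate is my assumption for launch-window GDDR7, and the GDDR6 line is just for comparison):

```python
# Peak bandwidth = bus width x per-pin data rate / 8 (bits -> bytes).
# Assumption: 32 Gbps per pin, which sits within the announced GDDR7 range.
bus_width_bits = 256
gddr7_gbps_per_pin = 32
print(bus_width_bits * gddr7_gbps_per_pin / 8)   # 1024.0 GB/s

# PS5-class GDDR6 at 14 Gbps on the same 256-bit bus, for comparison:
print(bus_width_bits * 14 / 8)                   # 448.0 GB/s
```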

We've seen Infinity Cache prove its worth in RDNA2+ GPUs, and RDNA3 has taken baby steps towards chiplets, so maybe there's scope for a little stack of HBM. Slower than SRAM cache, but far cheaper per MB, and sufficient for BVH structures while relieving bandwidth pressure on main memory.

And at the risk of becoming unfocused in this post, I think there needs to be something unique about the next generation other than existing features simply improving (including ray tracing).

To that end, I propose persistence, facilitated by something akin to Intel's Optane or Sony's ReRAM. Having games thrash the SSD with writes is risky business when you consider the number of developers, not all of whom will follow best practices. Persistent memory limits the scope for SSD degradation, and allows developers to go wild writing, for example, updated world states.
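A back-of-envelope sketch of why unmanaged writes worry me; every number here (write size, frequency, play time, endurance rating) is an assumption picked purely for illustration:

```python
# Hypothetical "dev goes wild" checkpointing: 1 GB of world state every 10 s,
# 2 hours of play per day, against an assumed 600 TBW endurance rating.
write_gb, interval_s = 1, 10
hours_per_day, tbw_rating = 2, 600

tb_per_day = write_gb * (hours_per_day * 3600 / interval_s) / 1000
years_to_rating = tbw_rating / (tb_per_day * 365)
print(f"{tb_per_day:.2f} TB/day -> rating reached in ~{years_to_rating:.1f} years")
```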

So I predict the PS6 will come with 128 GB of ReRAM, and this is what games will be streamed from. Save states/RAM dumps can be stored on it in 2 seconds. Games will flush it and copy over to it each time they're instanced. It will be user-replaceable, so buying a new "memory card" can allow you to hold multiple instances.

It also opens up the possibility of ludicrously expensive special editions of games. I know I'd buy GTA6 on its own, dedicated ReRAM cartridge.
 
RAM does seem to be the last battleground for innovation outside of some bizarroland disruptive moves. There are a fair few techs there that haven't made it into the mainstream.
 
ReRAM would be cool, but is it produced in any kind of industrial capacity? Unless it is, I can't see any console manufacturer adopting it.
 
Are current GPU architectures really considered so ideal that RAM is the last bastion for innovation? I think for consoles to have a future beyond the next gen, they are going to have to abandon these generic AMD/Nvidia architectures. Probably even more so for CPUs, where the performance potential is cratered by the heft of absolutely useless legacy bloat and other compliance baggage.
 
There's a million reasons why they can't branch out into exotic novel designs.

1) None exists. They'd need to invent something better than AMD, nVidia and Intel can come up with. Something awesome like what Toshiba created for PS3 that didn't need a last-minute desperate change to get an off-the-shelf part that actually worked.
2) They'd lose all BC.
3) Ports will be problematic. You'll be the only arch that you want devs writing to while offering only a tiny fraction of the entire gaming sector. How many devs will say, "sod this," and just not bother?
4) Porting your games to PC becomes a 'mare so the easy extra revenue is no longer easy extra revenue but costly graft.
5) You'd need the middlewares to be up to speed, which means they need to be working on this new hardware now. It's too late to try and invent something new.
6) No tools, and potential paradigm shifts will slow hardware adoption by devs. HW might be amazingly capable but devs will likely struggle to make the most of it.

We've just had pages and pages of Lurkmass trying to present an argument that even RTRT hardware is redundant and can be replaced. Let's not fall back into vague notions of 'something new' without any real grounding. If there's decent evidence of something tangible, like, you know, all those Sony hardware patents like photon mapping hardware that never got made into anything, then it can make it here.

If people really want to discuss hypothetical alternative architectures to the current programmable shader paradigm, please start a new thread and seed it with some sensible proposition.
 
Is eDRAM a no-go for a fast pool of sizable low-latency memory? I'd love to see this in a future Xbox or PS console so we can really let the GPUs sing. What about the thought of letting a CPU core in the APU handle the BVH stuff?
 
There's a million reasons why they can't branch out into exotic novel designs.

1) None exists. They'd need to invent something better than AMD, nVidia and Intel can come up with. Something awesome like what Toshiba created for PS3 that didn't need a last-minute desperate change to get an off-the-shelf part that actually worked.
2) They'd lose all BC.
3) Ports will be problematic. You'll be the only arch that you want devs writing to while offering only a tiny fraction of the entire gaming sector. How many devs will say, "sod this," and just not bother?
4) Porting your games to PC becomes a 'mare so the easy extra revenue is no longer easy extra revenue but costly graft.
5) You'd need the middlewares to be up to speed, which means they need to be working on this new hardware now. It's too late to try and invent something new.
6) No tools, and potential paradigm shifts will slow hardware adoption by devs. HW might be amazingly capable but devs will likely struggle to make the most of it.

We've just had pages and pages of Lurkmass trying to present an argument that even RTRT hardware is redundant and can be replaced. Let's not fall back into vague notions of 'something new' without any real grounding. If there's decent evidence of something tangible, like, you know, all those Sony hardware patents like photon mapping hardware that never got made into anything, then it can make it here.

If people really want to discuss hypothetical alternative architectures to the current programmable shader paradigm, please start a new thread and seed it with some sensible proposition.
I don't dispute any of your points or have a proposed solution, but there is no way a PS7 can realistically exist otherwise. I'm thinking more of fundamental changes to basic shader/pixel and geometry/raster execution rather than axing RT hardware. As an example, is quad grouping for efficient shading and rasterization still the route forward? Keep in mind Nvidia and AMD have been coasting on the same graphics and compute architectures for half a decade now.
 
Is there any evidence of this? Profiling maybe? Works quite well on Nvidia's HW.
Sure, here's a Fossilize dump of a Portal RTX shader compiled on AMD HW ...
*** SHADER STATS ***
Driver pipeline hash: 11176532120817558060
SGPRs: 128
VGPRs: 256
Spilled SGPRs: 2228
Spilled VGPRs: 15100
Code size: 676508
LDS size: 8192
Scratch size: 107520
Subgroups per SIMD: 4
Hash: 1758563055
Instructions: 99062
Copies: 33126
Branches: 1746
Latency: 5793375
Inverse Throughput: 2896688
VMEM Clause: 3961
SMEM Clause: 733
Pre-Sched SGPRs: 106
Pre-Sched VGPRs: 254
Over 15K(!) worth of VGPRs spilled with 2K SGPRs spilled!

It works 'well' for NV because they're using SER to spill to their large L2 cache ...
Why would it be VOPD all of a sudden? It should be the traversal stack handling and other improvements made for the SW traversal in RDNA3, including the larger register file so that it can keep more work in flight (which says nothing about spilling). Again, is the register spilling isolated here, or is it a pure guess?
This Valve contractor has an experimental branch implementing the instruction you're referring to and he finds that the performance gains are negligible. It's almost certainly the increased register file that's disproportionately boosting them in several RT benchmarks ...
I've seen such behavior in only one game, Portal: Prelude RTX, where the RX 6800 XT falls off a cliff (by up to 7x in overall framerate, not your typical 1.5-2x) in comparison with the 7900 XTX with more registers. But I doubt that the RTX Remix renderer was ever intended to run well and account for the limitations of AMD's hardware, as it pushes the boundaries of what's possible on Nvidia's hardware. Had they added DMMs for surfaces, the difference might have been even bigger, so it's irrelevant as an example. Any other examples?
Any games that benefit from this so-called 'SER' feature are practically spilling ...
According to the presentation, they allocate LDS to store/load arguments and return data for the function calls. AMD prefers inlining since it eliminates unnecessary loads and stores for them. It is unclear whether this translates to other architectures.
"Other architectures" is an irrelevant factor. Portal RTX is bad even on Intel HW to the point where doing inline RT was a win for them. How well a HW can handle spilling is highly dependent on it's memory system ...
Typically, people use the term "spilling" for pathological cases where the hardware automatically spills the registers to other buffers due to a lack of better options. This is also characterized by very poor occupancy and performance. Spilling is a poor characterization of what something like SER does, as SER explicitly controls the program behavior and is not an uncontrollable catastrophic event characterized by extremely poor performance like spilling.
SER is a 'spilling' mechanism in its purest form. How else can they 'reshuffle' these threads if there's no space in the register file? Just because NV made it fast in a special case (ray generation shaders) doesn't mean it is fundamentally distinct from other spilling methods ...
No, they can't, as the SDFs themselves are prebaked and unchangeable during runtime. Yes, you can move the unmerged objects' SDFs around and scale/rotate them, but this would be an extremely poor approximation of animation, resulting in even more graphical artifacts, and you can't do the same for the global merged SDF anyway.
You can absolutely deform the meshes and do screen space passes to fix up some of the self-intersection artifacts while UE5 just disables RT on WPO for static meshes by default ... (there's work under way to dynamically generate SDFs too)
This would diminish any advantages in performance even further, assuming there were any. Additionally, to approximate infinitely thin polygonal edges, you need infinite voxel resolution, which is impossible to achieve, and there is typically plenty of thin geometry in games.
@Bold So? You need infinite raster resolution to do the same (avoid aliasing), and that goes for just about any instance of trying to encode a continuous domain into a discrete domain ...

The point of increasing SDF resolution is to reduce this information loss to approach human visual acuity as much as possible ...
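To put that in concrete terms, here's a toy 1-D sketch (wall thickness and grid sizes are arbitrary choices of mine): a wall thinner than the voxel spacing never yields a negative sample, so the reconstructed field has no zero crossing and the surface vanishes; bump the resolution and it comes back, which is exactly the information-loss trade-off being argued about.

```python
import numpy as np

def slab_sdf(x, half_thickness):
    # True signed distance to an infinite wall centred at x = 0.5.
    return np.abs(x - 0.5) - half_thickness

half_thickness = 0.005                        # a thin wall, arbitrary choice
for n in (32, 64, 128, 256):
    xs = (np.arange(n) + 0.5) / n             # sample at voxel centres of a unit volume
    captured = np.any(slab_sdf(xs, half_thickness) < 0.0)
    print(f"{n:4d} samples: wall captured = {captured}")
```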
 
There's a million reasons why they can't branch out into exotic novel designs.

1) None exists. They'd need to invent something better than AMD, nVidia and Intel can come up with. Something awesome like what Toshiba created for PS3 that didn't need a last-minute desperate change to get an off-the-shelf part that actually worked.
2) They'd lose all BC.
3) Ports will be problematic. You'll be the only arch that you want devs writing to while offering only a tiny fraction of the entire gaming sector. How many devs will say, "sod this," and just not bother?
4) Porting your games to PC becomes a 'mare so the easy extra revenue is no longer easy extra revenue but costly graft.
5) You'd need the middlewares to be up to speed, which means they need to be working on this new hardware now. It's too late to try and invent something new.
6) No tools, and potential paradigm shifts will slow hardware adoption by devs. HW might be amazingly capable but devs will likely struggle to make the most of it.
1) Are you sure, when we take a look at Apple GPUs? (known to be non-conformant against the major public gfx APIs)
2) 'Exotic' does not necessarily mean that you absolutely HAVE to dispose of prior art. Just add unique custom features to existing architectures. Exotic 'extensions', if you will, but either way I'm not opposed to discarding BC if consoles can make bigger jumps ...
3 & 4) We can reduce multi-platform development costs by having developers simply focus on the "lead platform". There's no force out there obligating developers to implement 'parity' between platforms. Parity is a privilege and NOT a rule!
5 & 6) New features are driven by competition and don't need to see immediate usage at the start ...
 
1) Are you sure, when we take a look at Apple GPUs? (known to be non-conformant against the major public gfx APIs)
2) 'Exotic' does not necessarily mean that you absolutely HAVE to dispose of prior art. Just add unique custom features to existing architectures. Exotic 'extensions', if you will, but either way I'm not opposed to discarding BC if consoles can make bigger jumps ...
3 & 4) We can reduce multi-platform development costs by having developers simply focus on the "lead platform". There's no force out there obligating developers to implement 'parity' between platforms. Parity is a privilege and NOT a rule!
5 & 6) New features are driven by competition and don't need to see immediate usage at the start ...
It's funny how opposed you are to any proprietary Nvidia architecture/extensions gaining traction or stifling "progress" which could be made elsewhere... but you're giddy when it comes to the idea of AMD based consoles doing it.

"There's no force out there where developers obligated to implement 'parity' between platforms. Parity is a privilege and NOT a rule!" Nicely said.

Want to lower multi-platform costs where devs can focus on a "lead platform"? It's called the PC platform. It supports every form factor. Let's also make it Nvidia-based, so devs have access to the latest and best technologies. 👍
 
According to the current benchmark for Strix Point (I've only found one so far), AMD may have managed to bump performance per watt by 50% in two years (assuming the tested system was also at 22 W).

Assuming this trend continues, and normalizing to a Series X being equivalent to a 2070 in Timespy:

Xbox M: 20 CUs @ 1.8 GHz, 4.8 TF (9.6 broooo) on TSMC N2, 1080p screen, 45 TOPS NPU, 16 GB GDDR7 on a 64-bit bus @ 24 Gbps, 16 MB SRAM cache, 8 Zen 6c cores, 39% faster than a Series S in Time Spy, much faster in ray tracing, 450 g, 15 W TDP, "Switch" mode @ 2.5 GHz, $399

xXBoXx: 80 CUs @ 2.7 GHz, 28.8 TF on Samsung 2nm, 100 TOPS NPU, 24 GB GDDR7 on a 192-bit bus @ 28 Gbps, 48 MB SRAM cache, 8 Zen 6c cores (2 reserved for OS use) + 2 Zen 6 big cores, basically 4080-class GPU performance in a pretty small box, $599.

Note: In true Microsoft tradition, the more X's you have, the more power your console has; thus the name xXBoXx!
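For what it's worth, the compounding behind that kind of extrapolation (the launch year and the per-step gain are assumptions, and real scaling won't be this tidy):

```python
# +50% performance-per-watt every two years, compounded out to an assumed 2028 launch.
baseline_year, target_year, gain_per_step = 2024, 2028, 1.5
steps = (target_year - baseline_year) / 2
print(f"~{gain_per_step ** steps:.2f}x perf/W by {target_year}")   # ~2.25x
```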
 
And at the risk of becoming unfocused in this post, I think there needs to be something unique about the next generation other than existing features simply improving (including ray tracing).
I mean. AI in this space is pretty significant. There’s a lot of hype in the AI space, but it’s actually come to realization in the gaming space.

It’s hard to believe this wouldn’t be a major focus for next generation consoles. You can do so much more for much less computation with some compromises.

It’s not a feature that is currently readily available for this generation.
 
Frenetic Pony's post above made me think.

What if, for next gen, the low-end console is also a mobile, à la the Switch?
Runs @ x specs in handheld mode, and has the ability to run faster when docked.

I'm not sure the maths / performance would work out though.
I struggle to think that at release you could do even 75% of a Series S in a handheld.
Depends where you end up on that frequency vs power curve.

Also I'd like to add that some of the "exotic" technologies aren't all that exotic.
Look at what AMD is doing with the MI300 / MI250X boards; there's a lot of fancy packaging and memory types going on there.
Plus Infinity Cache sets a precedent.

I guess perhaps people should clarify if their prediction is in the form of a standard AMD APU, similar to what we have this gen,
and the Strix Point thingy, and the Steam Deck, etc.
OR
Some more exotic design.

I'm pretty confident the PS5 Pro will be a standard APU design with newer RDNA providing higher RT performance.
I used to think that next-gen would also be a pretty standard APU, but lately I'm starting to think not.
Longer generations give devs MORE time to make the best of the bespoke hardware too...
 
ReRAM would be cool, but is it produced in any kind of industrial capacity? Unless it is, I can't see any console manufacturer adopting it.
Not as far as I'm aware. But a lot can change in 5 or so years.

I mean. AI in this space is pretty significant. There’s a lot of hype in the AI space, but it’s actually come to realization in the gaming space.

It’s hard to believe this wouldn’t be a major focus for next generation consoles. You can do so much more for much less computation with some compromises.

It’s not a feature that is currently readily available for this generation.
True. I'm curious what benefits it can actually bring to games in offline game consoles though.

As far as I'm aware, all we've seen it provide so far are upscaling and interpolation solutions. Anything beyond that seems to require $100,000,000+ servers. Which aren't going to fit in a home console anytime soon haha

Frenetic Pony's post above made me think.

What if, for next gen, the low-end console is also a mobile, à la the Switch?
Runs @ x specs in handheld mode, and has the ability to run faster when docked.

I'm not sure the maths / performance would work out though.
I struggle to think that at release you could do even 75% of a Series S in a handheld.
Depends where you end up on that frequency vs power curve.

Also I'd like to add that some of the "exotic" technologies aren't all that exotic.
Look at what AMD is doing with the MI300 / MI250X boards; there's a lot of fancy packaging and memory types going on there.
Plus Infinity Cache sets a precedent.

I guess perhaps people should clarify if their prediction is in the form of a standard AMD APU, similar to what we have this gen,
and the Strix Point thingy, and the Steam Deck, etc.
OR
Some more exotic design.

I'm pretty confident the PS5 Pro will be a standard APU design with newer RDNA providing higher RT performance.
I used to think that next-gen would also be a pretty standard APU, but lately I'm starting to think not.
Longer generations give devs MORE time to make the best of the bespoke hardware too...

PS5Pro will almost certainly be an APU.

The next generation, I'd love to see chiplets. I suppose we'll have a better idea of how viable that is when RDNA4 launches.

Something like a PS6Lite consisting of PS5Pro core/CU counts, in chiplet form, on the latest version of Zen/RDNA and then a PS6 with a second CU chiplet would be pretty great to see.
 
True. I'm curious what benefits it can actually bring to games in offline game consoles though.

As far as I'm aware, all we've seen it provide so far are upscaling and interpolation solutions. Anything beyond that seems to require $100,000,000+ servers. Which aren't going to fit in a home console anytime soon haha
I'm not entirely sure we want LLMs in video games quite like that. Each game would most certainly have to train its own LLM to match the context and setting of the game. Generic LLMs would speak like we do, I guess, as opposed to how the characters would, with knowledge limited to their time and setting.

But you could see it being used in more places for graphics and physics (where it's not critical to gameplay), etc. You can also use it for AI movement and group tactics. But most people won't want to play games that are too hard. There are quite a few ways to use AI in games, but not necessarily for all games.
 
I'm not entirely sure we want LLMs in video games quite like that. Each game would most certainly have to train its own LLM to match the context and setting of the game. Generic LLMs would speak like we do, I guess, as opposed to how the characters would, with knowledge limited to their time and setting.

But you could see it being used in more places for graphics and physics (where it's not critical to gameplay), etc. You can also use it for AI movement and group tactics. But most people won't want to play games that are too hard. There are quite a few ways to use AI in games, but not necessarily for all games.
I still see most of this stuff being more useful as offline tools for devs than anything real-time in the games. Use it to help produce a ton of NPC dialogue, sure. But devs will still have to select what's used and not just let it loose in the game to say whatever. Especially when we know these AI chatbots are basically built to tell you what you want to hear in many ways, so imagine it just spouting out a bunch of false information about the game to the player! lol

Use it to help refine AI pathfinding or something, sure, but again, something devs will make 'fixed' in the game. Cuz it's not just that we don't want games with superhuman difficulty, we generally also need games to have predictable AI. Something with rules and limitations that the player can learn against and that makes sense to us. Like, it's one thing to go against the 'unpredictability' of human opponents, cuz in the end, we all have similar enough minds and limitations. We aren't really that unpredictable. AI will work quite differently and in ways that won't feel fun to play against.
 
Over 15K(!) worth of VGPRs spilled with 2K SGPRs spilled!

It works 'well' for NV because they're using SER to spill to their large L2 cache ...
Without knowing the unit utilization, the time period over which these metrics were collected, other statistics, or even whether this was executed on RDNA 2 or 3, those 15K don't tell me anything.

And regarding SER, again, it's not used to "spill" anything. It's here to prevent spilling, cache thrashing, and divergence by sorting materials by ID prior to shading them.
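A toy sketch of that reorder-before-shade idea (the hit-record layout and keys here are invented for illustration, not NVIDIA's actual SER interface): bin hit records by material/shader ID so each batch shades coherently instead of diverging inside a wave.

```python
from collections import defaultdict

hits = [
    {"ray": 0, "material": 7}, {"ray": 1, "material": 2},
    {"ray": 2, "material": 7}, {"ray": 3, "material": 2},
    {"ray": 4, "material": 5}, {"ray": 5, "material": 7},
]

buckets = defaultdict(list)
for h in hits:                                # bin by shading key
    buckets[h["material"]].append(h["ray"])

for material_id, rays in sorted(buckets.items()):
    # Conceptually one coherent wave per key; only the live state needed
    # after the reorder has to be saved off to the cache hierarchy.
    print(f"material {material_id}: rays {rays}")
```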

It's almost certainly the increased register file that's disproportionately boosting them in several RT benchmarks ...
Well, then it simply means they need more registers with their SW traversal approach. I would not blame RT for that inefficiency if the competitor gets away with using fewer registers.

How well HW can handle spilling is highly dependent on its memory system ...
Other architectures might not experience spilling, and even RDNA 3 is likely not experiencing it given how well it performs compared to RDNA 2. This seems more like an architectural deficiency in that particular case. I don't see why RDNA would have less efficient spilling compared to other architectures.

How else can they 'reshuffle' these threads if there's no space in the register file?
Threads are not reshuffled in the middle of hit shader execution. They are reshuffled before the hit shaders start. Only the state defined prior to the reshuffling and required afterward has to be saved from registers to the cache hierarchy.

You can absolutely deform the meshes and do screen space passes to fix up some of the self-intersection artifacts while UE5 just disables RT on WPO for static meshes by default ... (there's work under way to dynamically generate SDFs too)
Self-intersections are typically avoided via ray biasing, though it would have been a fun exercise to dynamically adjust meshes to somehow solve this mismatch. And good luck with supporting skinning and vertex animations for the SDFs. Even simple SDF generation would not be free, likely much more expensive than BVH generation itself.
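For anyone unfamiliar with ray biasing, a minimal sketch (the epsilon is an arbitrary, scene-scale-dependent choice): nudge the secondary ray's origin off the surface along the geometric normal so it doesn't immediately re-intersect the triangle it just left.

```python
import numpy as np

def biased_origin(hit_point, geometric_normal, eps=1e-3):
    # Offset along the unit normal to escape the surface before tracing again.
    n = geometric_normal / np.linalg.norm(geometric_normal)
    return hit_point + eps * n

hit = np.array([1.0, 0.25, -3.0])
normal = np.array([0.0, 1.0, 0.0])
print(biased_origin(hit, normal))   # [ 1.     0.251 -3.   ]
```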

So? You need infinite raster resolution to do the same (avoid aliasing), and that goes for just about any instance of trying to encode a continuous domain into a discrete domain ...
Raster resolution is a different topic. For a perfect polygon match without any holes, you need just that one polygon in the BVH. For the same quality with an SDF, you need infinitely large resolution.

The point of increasing SDF resolution is to reduce this information loss to approach human visual acuity as much as possible ...
Yet, there is no reason why anyone would prefer the poor proxy via the SDF instead of the better matching polygonal geometry in the BVH.
 
Yeah, GDDR is expensive: probably $30 for 8 GB of GDDR7, so 24 GB would be $90 in BOM alone. And what would it be needed for? Everything is streamed from an SSD now at the high end.

One way or another, the major bottleneck is memory latency/bandwidth. Compression and caches are what's going to improve performance per watt, and compression can improve performance per area as well. That, and resource handling/creation; AMD's work graphs with extensions are a great start. All the RT in the world is unhelpful if the BVH is 8 GB and takes several minutes per rebuild, and an 8 GB BVH would be useless anyway, as you'd be bandwidth-restricted instantly even if you could store it.
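Rough numbers on the bandwidth point, with every figure assumed purely for illustration (real traversal is heavily cache-filtered):

```python
# Raw BVH traversal traffic if nothing were cached.
rays_per_pixel = 2
pixels = 3840 * 2160
nodes_per_ray = 40        # nodes + triangles touched per ray, assumed
bytes_per_node = 64       # assumed node size
fps = 60

bytes_per_frame = rays_per_pixel * pixels * nodes_per_ray * bytes_per_node
print(f"~{bytes_per_frame * fps / 1e9:.0f} GB/s of raw node traffic")   # ~2548 GB/s
```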

More ram is mostly pointless.
More VRAM is mandatory if next-gen consoles are expected to run AI models locally, which Microsoft clearly intends.
 
More VRAM is mandatory if next-gen consoles are expected to run AI models locally, which Microsoft clearly intends.

That's just large language models, which aren't going to be running in major games anytime soon (sorry, dumb Nvidia tech demo). Neural net models can often compress current assets; a major advantage of turning animation data into a neural net is that the memory consumption goes down a decent bit.
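A rough sketch of that memory argument; the clip counts, sizes, and network shape are all assumptions I've made up for illustration:

```python
# Raw sampled animation data for one character roster.
joints, floats_per_joint = 100, 10
frames_per_clip, clips = 30 * 10, 200           # 10 s clips at 30 Hz, assumed
raw_mb = clips * frames_per_clip * joints * floats_per_joint * 4 / 1e6

# A small MLP predicting a pose from (time, clip embedding), plus the embeddings.
layers = [1 + 16, 256, 256, joints * floats_per_joint]
params = sum(a * b + b for a, b in zip(layers[:-1], layers[1:])) + clips * 16
net_mb = params * 4 / 1e6

print(f"raw clips: {raw_mb:.0f} MB, network: {net_mb:.2f} MB")   # raw clips: 240 MB, network: 1.32 MB
```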
 