Sony PS6, Microsoft neXt Series - 10th gen console speculation [2020]

Why hasn't that happened already? Multicore has been standard for two decades. If engines aren't scaling with core count effectively, there's a lot of potential sitting idle.
The issue is mainly software, not hardware. A developer at Ubisoft once explained why games are hard to parallelize:

"there is usually a “hot path” of CPU updates - read the input, player + camera physics and animation - that need to happen every frame. Very hard to parallelize, without extra latency / memory. Single-core performance is still important"

"It’s about update order. To be able to say where the camera is, you have to run the animation of the player. To do that, you need to do the physics step. And so on.You can make these run in parallel, if you’re OK with latency - e.g. use the physics sim from the frame before"

"Mostly an observation from the last games I worked on. The bottleneck is a single-threaded chain of updates for the player, rather than the number of entities simulated / rendered. Animation + camera updates are surprisingly expensive, due to e.g. ray casts and others"

"As an example, a camera system can require hundreds of sphere casts to not feel janky. Predicting where it will move, smoothly avoiding obstacles, etc"


NVIDIA also published a blog post on CPU thread optimizations in games. In short, more CPU threads are often detrimental to performance, and NVIDIA advises games to reduce the thread count of their worker pools to below the core count of modern CPUs:

Many CPU-bound games actually degrade in performance when the core count increases beyond a certain point, so the benefits of the extra threading parallelism are outweighed by the overhead

On high-end desktop systems with greater than eight physical cores for example, some titles can see performance gains of up to 15% by reducing the thread count of their worker pools to be less than the core count of the CPU

Instead, a game’s thread count should be tailored to fit the workload. Light CPU workloads should use fewer threads

Executing threads on both logical cores of a single physical core (hyperthreading or simultaneous multi-threading) can add latency as both threads must share the physical resource (caches, instruction pipelines, and so on). If a critical thread is sharing a physical core, then its performance may decrease. Targeting physical core counts instead of logical core counts can help to reduce this on larger core count systems

On systems with P/E cores, work is scheduled first to physical P cores, then E cores, and then hyperthreaded logical P cores. Using fewer threads than total physical cores enables background threads, such as OS threads, to execute on the E cores without disrupting critical threads running on P cores by executing on their sibling logical cores

The same applies to chiplet-based architectures that do not have a unified L3 cache: threads executing on different chiplets can cause high cache thrashing.

Core parking has been seen to be sensitive to high thread counts, with short, bursty threads failing to trigger the heuristic that unparks cores. Having fewer, longer-running threads helps the core parking algorithms.
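
Translated into code, the sizing advice looks roughly like the sketch below. std::thread::hardware_concurrency() reports logical processors, and getting the true physical core count and P/E split needs platform APIs (GetLogicalProcessorInformationEx, CPUID, /proc/cpuinfo); the 2-way SMT assumption and the caps here are illustrative numbers of my own, not NVIDIA's.

```cpp
#include <algorithm>
#include <cstdio>
#include <thread>

// Rough worker-pool sizing in the spirit of the advice above.
// std::thread::hardware_concurrency() reports *logical* processors; the true
// physical core count (and P/E split) needs platform-specific queries.
// Here we simply assume 2-way SMT as an approximation.
unsigned workerThreadCount() {
    unsigned logical  = std::max(1u, std::thread::hardware_concurrency());
    unsigned physical = std::max(1u, logical / 2);   // assumed SMT factor of 2

    // Stay below the physical core count: leave headroom for OS, audio and
    // driver threads rather than claiming every core for the job system.
    unsigned workers = physical > 2 ? physical - 2 : 1;

    // Cap the pool for light CPU workloads; the value is arbitrary and
    // should be tailored to the actual workload.
    return std::min(workers, 8u);
}

int main() {
    std::printf("worker threads: %u\n", workerThreadCount());
}
```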


Sebbi also talked about how engines and rendering paradigms often get in the way of multithreading:

Let's talk about CPU scaling in games. On recent AMD CPUs, most games run better on single CCD 8 core model. 16 cores doesn't improve performance. Also Zen 5 was only 3% faster in games, while 17% faster in Linux server. Why? What can we do to make games scale?

History lesson: Xbox One / PS4 shipped with AMD Jaguar CPUs. There were two 4-core clusters, each with its own LLC. Communication between these clusters was through main memory. You wanted to minimize data sharing between these clusters to minimize the memory overhead.

6 cores were available to games. 2 taken by OS in the second cluster. So game had 4+2 cores. Many games used the 4 core cluster to run your thread pool with work stealing job system. Second cluster cores did independent tasks such as audio mixing and background data streaming.

Workstation and server apps usually spawn independent process per core. There's no data sharing. This is why they scale very well to workloads that require more than 8 cores. More than one CCD. We have to design games similarly today. Code must adapt to CPU architectures.

On a two CCD system, you want to have two thread pools locked on these cores, and you want to push tasks to these thread pools in a way that minimizes the data sharing across the thread pools. This requires designing your data model and communication in a certain way.
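
As a rough illustration of "two thread pools locked on these cores", here's a minimal Linux-only sketch using pthread_setaffinity_np. It assumes logical CPUs 0-7 are CCD0 and 8-15 are CCD1, which you would really query (hwloc, CPUID, sysfs) rather than hardcode, and it ignores SMT siblings:

```cpp
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <pthread.h>
#include <sched.h>

#include <condition_variable>
#include <cstdio>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <vector>

class CcdPool {
public:
    // firstCpu..firstCpu+numCpus-1 is the CCD's logical CPU range (assumed).
    CcdPool(unsigned firstCpu, unsigned numCpus) {
        // Every worker may run on any core of its CCD, but never on the
        // other CCD, so the pool's working set stays in one LLC.
        cpu_set_t set;
        CPU_ZERO(&set);
        for (unsigned c = firstCpu; c < firstCpu + numCpus; ++c)
            CPU_SET(c, &set);
        for (unsigned i = 0; i < numCpus; ++i) {
            workers_.emplace_back([this] { run(); });
            pthread_setaffinity_np(workers_.back().native_handle(),
                                   sizeof(set), &set);
        }
    }
    ~CcdPool() {
        { std::lock_guard<std::mutex> l(m_); done_ = true; }
        cv_.notify_all();
        for (auto& w : workers_) w.join();
    }
    void push(std::function<void()> job) {
        { std::lock_guard<std::mutex> l(m_); jobs_.push(std::move(job)); }
        cv_.notify_one();
    }

private:
    void run() {
        for (;;) {
            std::function<void()> job;
            {
                std::unique_lock<std::mutex> l(m_);
                cv_.wait(l, [this] { return done_ || !jobs_.empty(); });
                if (done_ && jobs_.empty()) return;
                job = std::move(jobs_.front());
                jobs_.pop();
            }
            job();
        }
    }
    std::vector<std::thread> workers_;
    std::queue<std::function<void()>> jobs_;
    std::mutex m_;
    std::condition_variable cv_;
    bool done_ = false;
};

int main() {
    CcdPool gameplayPool(/*firstCpu=*/0, /*numCpus=*/8);  // game logic + physics
    CcdPool renderPool  (/*firstCpu=*/8, /*numCpus=*/8);  // graphics tasks

    gameplayPool.push([] { std::puts("physics task on CCD0"); });
    renderPool.push  ([] { std::puts("render task on CCD1"); });
    // Pools join in their destructors after the queued jobs have run.
}
```

Jobs pushed to one pool never migrate to the other CCD, so each pool's working set stays in its own LLC, which is the whole point of the split.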

Let's say you use a modern physics library like Jolt Physics. It uses a thread pool (or integrates to yours). You could create Jolt thread pool on the second CCD. All physics collisions, etc are done in threads which share a big LLC with each other.

Once per frame you get a list of changed objects from the physics engine. You copy transforms of changed physics engine objects to your core objects, which live in the first CCD. It's a tiny subset of all the physics data. The physics world itself will never be accessed by CCD0.
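
Sketched out, the once-per-frame hand-off could look like this. The types and consumeChangedTransforms() are hypothetical stand-ins, not Jolt's actual API; the point is that only the small list of changed transforms crosses from the physics pool's CCD to the game objects on CCD0:

```cpp
#include <cstdint>
#include <cstdio>
#include <unordered_map>
#include <vector>

// Hypothetical stand-ins, not Jolt's real API: the physics world lives
// entirely on the second CCD's thread pool, and only this small list of
// changed transforms is copied across to the game objects on CCD0.
struct Transform { float pos[3]; float rot[4]; };

struct ChangedBody {
    std::uint32_t bodyId;
    Transform     transform;
};

struct PhysicsWorld {
    // Would be filled by the physics engine during its simulation step;
    // dummy data here so the sketch runs standalone.
    std::vector<ChangedBody> consumeChangedTransforms() {
        return { {42u, {{0.f, 1.f, 0.f}, {0.f, 0.f, 0.f, 1.f}}} };
    }
};

// Game-side object living on CCD0; it never touches physics internals.
struct GameObject { Transform transform{}; };

void syncPhysicsToGame(PhysicsWorld& physics,
                       std::unordered_map<std::uint32_t, GameObject*>& byBodyId) {
    // Once per frame, on the game-logic side: copy only what changed.
    for (const ChangedBody& c : physics.consumeChangedTransforms()) {
        auto it = byBodyId.find(c.bodyId);
        if (it != byBodyId.end())
            it->second->transform = c.transform;  // tiny subset of physics data
    }
}

int main() {
    PhysicsWorld physics;
    GameObject player;
    std::unordered_map<std::uint32_t, GameObject*> byBodyId{{42u, &player}};
    syncPhysicsToGame(physics, byBodyId);
    std::printf("player y = %.1f\n", player.transform.pos[1]);
}
```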

Same can be done for rendering. Rendering objects/components should be fully separated from the main game objects. This way you can start simulating the next frame while rendering tasks are still running. Important for avoiding bubbles in your CPU/GPU execution.

Many engines already separate rendering data structures fully from the main data structures. But they make a crucial mistake. They push render jobs in the same global job queue with other jobs, so they will all be distributed to all CCDs with no proper data separation.

Instead, the graphics tasks should be all scheduled to a thread pool that's core locked to a single CCD. If graphics is your heaviest CPU hog, then you could allocate physics and game logic tasks to the thread pool in the other CCD. Whatever suits your workload.

Rendering world data separation is implemented by many engines already. It practically means that you track which objects have been visually modified and bump allocate the changed data to a linear ring buffer which is read by the render update tasks when next frame render starts.
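
A toy version of that change tracking, written as a double-buffered change list rather than a literal ring buffer (hypothetical names; a real engine would bump-allocate per worker thread instead of taking a mutex on every push, and would handle overflow and frame fencing):

```cpp
#include <cstdint>
#include <cstdio>
#include <mutex>
#include <utility>
#include <vector>

// Simulation tasks record which objects changed visually; at the frame
// boundary the buffers are swapped, and the render update tasks read last
// frame's changes while the next simulation frame writes into the other
// buffer.
struct RenderChange { std::uint32_t objectId; float worldMatrix[16]; };

class RenderChangeQueue {
public:
    // Called from any simulation worker when an object is visually modified.
    void push(const RenderChange& c) {
        std::lock_guard<std::mutex> lock(writeMutex_);
        write_.push_back(c);
    }

    // Called once at the frame boundary, before the render update tasks
    // start and before the next simulation frame begins pushing.
    const std::vector<RenderChange>& beginRenderFrame() {
        std::lock_guard<std::mutex> lock(writeMutex_);
        read_.clear();
        std::swap(read_, write_);
        return read_;
    }

private:
    std::mutex writeMutex_;
    std::vector<RenderChange> write_;  // filled by sim frame N+1
    std::vector<RenderChange> read_;   // consumed by render frame N
};

int main() {
    RenderChangeQueue queue;
    queue.push({7u, {1.f}});  // object 7 moved during this sim frame
    for (const RenderChange& c : queue.beginRenderFrame())
        std::printf("render update for object %u\n", c.objectId);
}
```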

This kind of design where you fully separate your big systems has many advantages: It allows refactoring each of them separately, which makes refactoring much easier to do in big code bases in big companies. Each of these big systems also can have unique optimal data models.

In a two thread pool system, you could allocate independent background tasks such as audio mixing and background streaming to either thread pool to load balance between them. We could also do more fine grained splitting of systems, by investigating their data access patterns.

 
Now that the new AMD GPUs are out and we have the prices, we have more insight into what could be achieved, and at what price, from a console with such a graphics unit. I think that if AMD can sell the 9070XT for $600, then its production cost can't be more than $400. If you add a modern Zen 4/5 laptop CPU to that, you could probably get a console with this configuration for $700 this year or next. And if FSR4 is really as good as they say, it would already be a generational leap compared to current consoles. That would be normal.
 
Would a 9070XT fit in a console power envelope and form factor?
 

As configured in today's announcement, with the CPU and the rest of the system you're probably looking at a ballpark of around 50% more power consumption than the PS5 Pro. Whether that is possible is going to depend on what you feel the power envelope and form factor of a console should be.

However, the 9070XT is likely clocked higher than its optimal efficiency point to hit performance targets. Whether a console would want that performance/cost tradeoff over more efficiency is another matter. I'd guess that if you did use it 1:1 in terms of design, it would have some functional units disabled for overall yields as well (since they can't really bin off a separate SKU, or maybe some day they will, who knows). If that's the case, well, the 9070 is only 220W, and with everything else the system wouldn't be much higher than the PS5 Pro.

The other thing is I wouldn't expect the next-gen consoles to use GDDR6. It would likely be worse logistically and possibly higher cost as well over the long lifespan; GDDR5, for example, has been more expensive than GDDR6 for quite some time now. GDDR6 would also force a clamshell design and a doubling to 32GB, versus the possibility of using 3GB chips instead for some intermediate capacity.
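
For what it's worth, the capacity arithmetic behind that point (assuming a 9070XT-style 256-bit bus, i.e. 8 x 32-bit memory devices, which is my assumption rather than something stated above): 8 x 2GB GDDR6 chips = 16GB, and clamshell doubles that to 16 x 2GB = 32GB, while 8 x 3GB chips would give 24GB as an intermediate capacity without clamshell.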
 
Next gen Xbox is going to be a PC according to Jez!! Starts from 1:11:00


Without a devkit, mama mia, this thing is going to be a pain to develop for if it's like this.
 
We already had a taste of it with the PS5 Pro, which showed us that such a GPU configuration, in terms of ray tracing and upscaling, could be squeezed into the power envelope of a console. Now that we know Sony and AMD were working closely together on FSR/PSSR, and that Project Amethyst is an expansion of that, I guess we don't have long to wait to see what FSR4-type tech will look like in the console space.
 
Why would it be hard to develop for? Developers wouldn't target it specifically, just like with the Steam Deck. It would just be the PC exe.
Right, it would depend on the requirements from MS. I was wondering whether there would be other licensed Xbox devices with different hardware and how those would be supported, but then again Valve does offer a devkit for the Steam Deck.
 
Would a 9070XT fit in a console power envelope and form factor?
According to my calculations, I could build such a PC in a case barely larger than the Series X, which I use standing upright. From the front it looks the same, just a little taller. There's no need to compromise on anything for this; you can do it at home.

In that size, which is still much smaller than an average desktop PC, they could easily make such a console. Consoles have also gotten larger recently, so that's not a problem. The power consumption, yes, you'd need a 700W power supply, which, as I said, fits easily into the box; the question is whether a console manufacturer wants that.
 