Current Generation Hardware Speculation with a Technical Spin [post GDC 2020] [XBSX, PS5]

I gave this some thought, and taking in some of the discussion from Shifty, Fox, Brit, some of the other B3D members talking about ML/AI, and Alex, I actually think Alex is right here.
Hear me out; this relies on only one axiom:
The best possible texture that you can possibly see is dependent on your display resolution
- in this case that target is at best likely 4K native.

But in the world of dynamic scalers and dynamic resolution, checkerboarding, TAA and VRS, there are likely not going to be many titles at native 4K.
So immediately having textures running at that level and trying to mip them at 4K is a waste of resources, especially at distance; you won't see it.

And then comes the interesting point of bringing ray tracing into the equation. If we are ray tracing, the base resolution will probably be around 1080p, up to 1440p at maximum. High-resolution textures won't matter much there either.
And the only real option then is to upscale from 1080p to 4K, in which case we would likely use something like DLSS. So we don't need the drive to hold textures much beyond what that internal resolution can resolve, because DLSS is the one effectively reconstructing the extra detail.
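Back-of-the-envelope on the mip point. A sketch, where the 4096² texture, the quarter-of-screen-width coverage and the square, undistorted mapping are all made-up assumptions just to show the scaling:

```python
import math

def highest_useful_mip(texture_size: int, pixels_covered: float) -> float:
    # Texels per screen pixel along one axis, assuming a square,
    # undistorted mapping (a big simplification of real UV derivatives).
    texels_per_pixel = texture_size / pixels_covered
    # Hardware picks roughly mip = log2(texels per pixel), clamped at 0.
    return max(0.0, math.log2(texels_per_pixel))

# A hypothetical 4096x4096 texture on a surface filling a quarter of the
# screen width, at three internal render widths:
for render_width in (1920, 2560, 3840):
    covered = render_width / 4
    print(render_width, round(highest_useful_mip(4096, covered), 1))
# -> 1920 3.1, 2560 2.7, 3840 2.1: mips 0 and 1 of that texture never get
#    sampled for this surface, even at a 3840-wide internal resolution.
```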

I will be surprised if ray tracing is used heavily if performance is lower than RTX. They'd have to do much better than RTX for console gamers to accept the compromise to resolution. As for DLSS, maybe they can do it, but I'd like to see some performance metrics on DLSS 2.0 before I really think it's viable. These consoles don't have tensor cores to chew through that.

I think with textures, there's massive room for improvement over current gen. I'm hoping AA solutions get better, and not worse, and texture resolution can start to mean more again. People do seem to love 30fps depth-of-field motion-blurred bullshit, so I could be wrong.

But we get back to what's on the SSD. I'm skeptical of the "unload everything behind you" approach, but they can make huge gains just by being able to increase the density of the "chunks" in a game engine that's designed around streaming. The behind-closed-doors Spiderman demo is really a good use case: traverse faster, more cars on the street, more people on the sidewalks, remove load segments, etc. If they just do that, it's a huge improvement. I think you could probably procedurally generate cars for a world like that and store them on the SSD. Or you could animate the world like Doom Eternal, storing batches of pre-recorded animation on disc to make things look alive when they don't have to be done in real time.

The solution will probably be a balance of both, where they find the use cases that can handle the latency of the SSD (which has higher latency than RAM), and keep things that are more latency sensitive in memory. Maybe you always keep the lowest LODs and mips in memory, but you stream in higher LODs and swap. People might not notice if the swap happens in one or two frames, unless it's ... 30 fps lol.
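A minimal sketch of that keep-the-low-LOD-resident, stream-the-high-LOD idea. The class, the distance thresholds and the async request callback are all hypothetical, just to show the shape of it:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class StreamedAsset:
    name: str
    resident_lod: int = 3               # lowest-detail LOD/mips, always kept in RAM
    streamed_lod: Optional[int] = None  # higher LOD pulled from the SSD on demand

    def desired_lod(self, distance: float) -> int:
        # Closer objects want more detail (lower LOD index); thresholds are made up.
        for lod, max_dist in enumerate((20.0, 60.0, 150.0)):
            if distance < max_dist:
                return lod
        return self.resident_lod

    def lod_to_render(self) -> int:
        # Render whatever is actually in memory; the streamed LOD may only
        # land a frame or two after it was requested.
        return self.streamed_lod if self.streamed_lod is not None else self.resident_lod

def update(asset: StreamedAsset, distance: float, request_from_ssd) -> None:
    want = asset.desired_lod(distance)
    if want < asset.resident_lod and asset.streamed_lod != want:
        # Fire-and-forget async SSD read; completion sets asset.streamed_lod.
        request_from_ssd(asset.name, want)
    elif want >= asset.resident_lod:
        asset.streamed_lod = None       # far away again: drop the streamed copy
```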
 
33:12 is the context. So how am I mistaken again?

You're mistaken that it had nothing to do with clocks. He even mentions, in a separate section to the one I linked, how they would have had to clock the CPU lower due to the occasions when the new 256-bit AVX instructions were used causing the CPU to draw a lot more power. Fans not ramping out of control is an effect of the new system they are using. It is not the reason that system was developed.
 
And how much advantage does XBX have over RX580 in terms of BW (that it still needs to share with CPU)?
As explained above, the X1X has 326GB/s while the RX 580 has 256GB/s; that's a 70GB/s advantage needed just to keep up with the RX 580 while also feeding a much weaker CPU (Jaguar). Zen 2 will need a bigger bandwidth margin than 70GB/s for the Series X to cope with comparable PC GPUs, significantly more than 70GB/s in my reckoning.

And again, it gets even worse once you factor in the 336GB/s part of the memory pool.
 
No. It does not.
Turing has a full-speed parallel integer pipeline. On average it gets used at about 36% of the rate of the FP pipeline.
You could look at it as if Turing cards have +36% to their FLOPS in real games (on paper the dual pipes would give 2x the throughput, because of that pipeline).
How do you get FLOPS from INTs, though? By definition, FLOPS are floats and INTs are integers.
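The INTs never become FLOPs; the +36% is the FP issue slots they stop occupying. A quick sketch of that arithmetic, where the ~0.36 INT-per-FP ratio is the average quoted above rather than my own measurement:

```python
INT_PER_FP = 0.36   # ~36 integer instructions per 100 FP instructions, as quoted above

# Shared pipe (pre-Turing style): INT and FP compete for the same issue slots.
shared_fp_per_slot = 1.0 / (1.0 + INT_PER_FP)   # ~0.735 FP ops per issue slot

# Split pipes (Turing style): INT ops go to their own pipe, the FP pipe stays full.
split_fp_per_slot = 1.0

gain = split_fp_per_slot / shared_fp_per_slot - 1.0
print(f"effective FP throughput gain: {gain:.0%}")   # -> 36%
```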

First tests show the XSX competing with a 2080 (DF); I think that's where the raw performance is, without taking console optimization into account, etc. It makes sense to me that it would land about there. Future games are going to take more advantage of RDNA2, Turing, Ampere etc., instead of being optimized for 2013 architectures. So performance won't get worse, to say the least.
If you are talking about the Gears benchmark, I'm not sure I would draw any broad performance metrics from that, simply because it runs so well on everything. A Vega 56 can hit 4K 30fps (console framerates!) on ultra.

AVX 128 yes, Cerny said AVX 256, hardly used in gaming.
On PC? That's because AMD CPUs in the desktop space basically didn't support AVX256 until Zen. AMD APUs released after 2015 or so included support for it, but those were slower parts not meant for gaming. So unless you wanted to make games only for Intel CPUs, you stuck with AVX128 or avoided those instructions completely. It wasn't until about a year ago that a large number of PC games required AVX instructions at all. There are a bunch of YouTube channels where guys bench modern games on older rigs, and I saw a couple of them finally upgrade from Phenom IIs within the last year because enough games just don't launch on them.

The thing with consoles, though, is that if you are trying to squeeze every bit of performance out of a box made 4 years ago, you will leverage the hardware that's there. If AVX256 has a performance benefit, developers will use it.

I could be wrong here, but I thought the XSX had a different pool of memory specifically set aside for the CPU? I'm not as knowledgeable on how a system should work, but wouldn't that 560 GB/s of bandwidth be less constrained if there's a path set at the hardware or software level specifically for the GPU and CPU? As in, you're more likely to hit your 560 GB/s performance target because there's another path altogether set up for the CPU. Am I correct in assuming that?

And if that is indeed the case, wouldn't that put the XSX above a 2080S, and not potentially at its level or lower, as far as bandwidth is concerned?

I hope someone can clear this up for me !
Memory buses are described by a bus width, in bits. In the case of the Xbox Series X, it's 320 bits. But it's really ten 32-bit connections, one to each of the 10 memory chips. Those 32 bits multiplied by the 10 connections give you the 320-bit bus. Each of the memory chips can transfer 56 GB per second, so reading from all chips gives you 560 GB/s of total bandwidth (10 connections at 56 GB/s each). Here's where things get complicated, though. Not all of the memory chips are the same size. There are 1GB and 2GB chips, all with the same speed connection: six 2GB chips and four 1GB chips, for a total of 16GB.

So if you are reading from all 10 chips at the same time, you can get 560 GB/s, but if you only need data stored in the extra gigabyte that only the 2GB chips have, you are limited to 336 GB/s (six chips at 56 GB/s each). There isn't a dedicated path for the CPU or the GPU, but you can place less frequently used data so that it only resides in that slower region, so frequently used data can take advantage of the full bus speed most of the time.
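Plugging those numbers into a quick sketch. The chip mix is from the post above; the 56 GB/s per chip falls out of a 32-bit link at 14 Gbps per pin, which is my assumption here:

```python
CHIP_SIZES_GB = [2, 2, 2, 2, 2, 2, 1, 1, 1, 1]   # six 2GB + four 1GB chips = 16GB
PER_CHIP_BW = 32 * 14 / 8                        # 32-bit link at 14 Gbps/pin = 56 GB/s

full_bus = PER_CHIP_BW * len(CHIP_SIZES_GB)
# Data living only in the "extra" gigabyte of the six 2GB chips can only be
# striped across those six chips:
slow_pool = PER_CHIP_BW * sum(1 for size in CHIP_SIZES_GB if size == 2)

print(sum(CHIP_SIZES_GB), "GB total")         # 16 GB
print(full_bus, "GB/s across all ten chips")  # 560.0
print(slow_pool, "GB/s for the upper 6GB")    # 336.0
```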
 
The solution will probably be a balance of both, where they find the use cases that can handle the latency of the SSD (which has higher latency than RAM), and keep things that are more latency sensitive in memory. Maybe you always keep the lowest LODs and mips in memory, but you stream in higher LODs and swap. People might not notice if the swap happens in one or two frames, unless it's ... 30 fps lol.

It's not just RAM bandwidth compared to the SSD but also the latency, which seems to get lost in the conversation.
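Rough ballpark, where the ~100 ns DRAM and ~100 µs NVMe figures are generic assumptions of mine, not the consoles' actual specs:

```python
FRAME_60FPS = 1 / 60      # ~16.7 ms per frame
DRAM_LATENCY = 100e-9     # ~100 ns for a DRAM access (assumed)
SSD_LATENCY = 100e-6      # ~100 us for an end-to-end NVMe read (assumed)

print(f"SSD latency is ~{SSD_LATENCY / DRAM_LATENCY:,.0f}x DRAM latency")  # ~1,000x
print(f"but only ~{SSD_LATENCY / FRAME_60FPS:.1%} of a 60 fps frame")      # ~0.6%
```

Which is why it can work for streaming something in over a frame or two, but not as a stand-in for RAM on anything the GPU needs within the current frame.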
 
Sorry, I must have miscalculated; it's probably around 30% if we assume about 60GB/s going to the CPU for both consoles.
(448 - 60) / (560 - 60) = 0.776, so about a 22% reduction in bandwidth. Is that 60 GB/s based on any reference, or an arbitrary figure?

Incidentally, "XBox Series X" is crap for internet searches! Search PS5 and you get PlayStation 5, but looking for the Xbox is unnecessarily awkward. "xb series x" seems the shortest that actually works?

Edit: Also, if the CPU accesses the RAM at 336 GB/s, is that 60/336 == 17.9% of the RAM bandwidth, leaving 82.1%? So then the GPU accessing 82.1% of 560 would be ~460 GB/s available to the XSX GPU, versus 388 for PS5, making only a ~16% difference in GPU BW. If we think in terms of time occupying the bus, this must be how it works. :???:
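For what it's worth, both estimates side by side; the 60 GB/s of CPU traffic is the assumed figure from the posts above, not an official number:

```python
CPU_BW = 60.0   # assumed CPU traffic, GB/s

# Estimate 1: subtract the CPU share from each console's peak.
ps5_gpu = 448.0 - CPU_BW             # 388 GB/s
xsx_gpu = 560.0 - CPU_BW             # 500 GB/s
print(f"simple subtraction: {1 - ps5_gpu / xsx_gpu:.1%} gap")      # ~22.4%

# Estimate 2: bus-time occupancy. The CPU traffic is served from the 336 GB/s
# region, so it occupies 60/336 of the bus time; the GPU gets the remaining
# fraction of the full 560 GB/s. (For PS5's uniform pool both models give 388.)
cpu_share = CPU_BW / 336.0                    # ~17.9% of bus time
xsx_gpu_occ = (1.0 - cpu_share) * 560.0       # ~460 GB/s
print(f"occupancy model:    {1 - ps5_gpu / xsx_gpu_occ:.1%} gap")  # ~15.7%
```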
 
Incidentally, "XBox Series X" is crap for internet searches! Search PS5 and you get PlayStation 5, but looking for the Xbox is unnecessarily awkward. "xb series x" seems the shortest that actually works?

The naming convention is PS5's biggest advantage. Lol.
 
Not sure what happened to the post that misquoted who said what, and I couldn't find the original person who said what was quoted, so it had to be removed.
 
Incidentally, "XBox Series X" is crap for internet searches! Search PS5 and you get PlayStation 5, but looking for the Xbox is unnecessarily awkward. "xb series x" seems the shortest that actually works?

Compounded by people also calling it SeX!
All because MS didn’t want to appear behind Sony with Xbox 2 (aka 360). Sad really, like you say, it makes finding anything Xbox related a PITA
 
Compounded by people also calling it SeX!
All because MS didn’t want to appear behind Sony with Xbox 2 (aka 360). Sad really, like you say, it makes finding anything Xbox related a PITA

They really should just take a page out of their own playbook with the Windows 10 thing. Call it Xbox 5 and be done with it.
 
I think with textures, there's massive room for improvement over current gen. I'm hoping AA solutions get better, and not worse, and texture resolution can start to mean more again. People do seem to love 30fps depth-of-field motion-blurred bullshit, so I could be wrong.

but they can make huge gains just by being able to increase the density of the "chunks" in a game engine that's designed around streaming. The behind-closed-doors Spiderman demo is really a good use case: traverse faster, more cars on the street, more people on the sidewalks, remove load segments, etc. If they just do that, it's a huge improvement. I think you could probably procedurally generate cars for a world like that and store them on the SSD. Or you could animate the world like Doom Eternal, storing batches of pre-recorded animation on disc to make things look alive when they don't have to be done in real time.
Right, I get this entirely, but in my mind it comes with graphical limitations. You can't render more than your GPU can support either. Rendering a denser world still has costs on the CPU, GPU and bandwidth. Rendering higher quality textures requires higher resolutions and higher quality meshes. Rendering detailed vistas with insanely far draw distances still has major costs on system performance. If it weren't so, we couldn't drop our graphics details to nil and gain more FPS.

When I think about the use cases in which the SSD speed of PS5 brings substantial gains over existing SSD solutions, I am trying to look at cases where it isn't tied in with other items. Or put another way: it's like the argument that 4Pro and PS5 have better fillrate and rasterization than X1X and XSX because of clockspeed, yet in reality they don't, because fillrate is tied to bandwidth; the consoles are more bandwidth bound than ROP bound. So when I think about the SSD speeds in general, I'm looking at the system and asking at what particular I/O speed they become I/O limited rather than system performance limited. If it's I/O limited, PS5 should produce better graphics than XSX. That would mean all this time graphics have been held up by I/O.

If it's performance limited, XSX will always produce better graphics than PS5.
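To put rough numbers on the fillrate-vs-bandwidth point: 64 ROPs per console and a plain uncompressed 4-byte colour write are my assumptions here (real workloads blend, compress and rarely sit at peak fill), so treat this as a sketch, not a measurement:

```python
def peak_fill(rops: int, clock_ghz: float, bytes_per_pixel: int = 4):
    gpixels = rops * clock_ghz                  # peak fillrate in GPixels/s
    return gpixels, gpixels * bytes_per_pixel   # GB/s needed just for colour writes

for name, clock, mem_bw in (("PS5", 2.23, 448), ("XSX", 1.825, 560)):
    gpix, needed = peak_fill(64, clock)
    print(f"{name}: {gpix:.0f} GPix/s wants ~{needed:.0f} GB/s of writes "
          f"vs {mem_bw} GB/s total")
# PS5: 143 GPix/s wants ~571 GB/s vs 448 GB/s total
# XSX: 117 GPix/s wants ~467 GB/s vs 560 GB/s total
```

The higher clock buys paper fillrate that its own memory system can't feed, which is the sense in which these parts end up more bandwidth bound than ROP bound.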
 
It might or it might not be. They're Zen 2 cores, but neither MS nor Sony has said anything about how much L3 cache they have, which is the one thing that separates the Zen 2 cores used in APUs from those used in desktop CPUs.

Here it is: 8MB of L3. There is a big chance this is the CPU of the two next-generation consoles.
 
Here it is: 8MB of L3. There is a big chance this is the CPU of the two next-generation consoles.
They don't have to adhere to any specific model (and in fact the clocks already confirm they don't on that part); it's not like they could cut and paste it from Renoir into the console APUs.
 
They don't have to adhere to any specific model (and in fact the clocks already confirm they don't on that part); it's not like they could cut and paste it from Renoir into the console APUs.

I know, but aside from the clocks I wouldn't be surprised if it's the exact same cache configuration; the TDP, for example, seems okay for a console.

In the end, the PS4 and XB1 used a specific configuration of an existing Jaguar CPU not found anywhere else.

EDIT: Or, if you prefer, the base used for the PS5 and XSX CPUs...
 
I know, but aside from the clocks I wouldn't be surprised if it's the exact same cache configuration; the TDP, for example, seems okay for a console.

In the end, the PS4 and XB1 used a specific configuration of an existing Jaguar CPU not found anywhere else.

EDIT: Or, if you prefer, the base used for the PS5 and XSX CPUs...
That's like looking at a PC GPU and drawing conclusions about the performance of a console. These are custom parts and don't have to adhere to any particular configuration, other than that they are based on the same architecture. The DF article states that there is 76MB of SRAM in total on the XSX SoC. It could very well have 16MB of L3 cache.
 
Have you tried searching for "fastest console" or "most powerful console"?
:p

You can find it alright, but it's just a silly trudge. "Switch specs" are enough to pull up Nintendo Switch. 'XB360 specs' works for last gen; nice. "xb one" pulls up this gen's. Okay, "xb x specs" works for series X.

Hey, look what your suggestions throw up! :LOL:

[Attached screenshots of the search results]
 