Current Generation Hardware Speculation with a Technical Spin [post GDC 2020] [XBSX, PS5]

They'd still have to map it and expose it via their API(s).
Yea, there's that as well.

I'm not even sure if that is a hardware component to be honest. So perhaps it's not?
I wasn't really sure when I wrote that; in some ways it does sound like an application-layer type feature.
 
nvm.. I see.
 
So I posted this on ResetERA a few times, but not here. The whole idea, as Cerny described in a kind of roundabout way, is that devs are going to choose a mode, like GPU mode or CPU mode. In the former, for example, the GPU has power priority and the CPU sits below its maximum clock of 3.5 GHz, which ensures a more stable GPU clock. In CPU mode it is the other way around.
The extent to which the clocks fall for the different priorities remains to be clarified.
This is what has been heard from people who have the machine.
I think a lot of cross-gen games coming from Jaguar will under-utilise the Zen CPU anyway, so GPU mode makes sense for them. The same goes for all those games that use unfixed resolutions with fine-grained dynamic resolution. But I am not sure what happens if you have a target that is both GPU and CPU intensive.
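To make the idea concrete, here is a minimal sketch of how a fixed SoC power budget might be split by priority. Everything in it is my assumption for illustration: the budget, the demand figures and the linear clock scaling are invented, and Cerny has said the real curve is much gentler (a couple of percent of clock buys back roughly 10% of power).

#include <algorithm>
#include <cstdio>

// Hypothetical illustration only: a fixed SoC power budget split by priority.
// Sony has confirmed the 3.5 GHz / 2.23 GHz caps, nothing else shown here.
struct ClockPair { double cpuGHz; double gpuGHz; };

ClockPair arbitrate(double cpuDemandW, double gpuDemandW, bool gpuPriority)
{
    const double budgetW = 200.0;               // assumed total SoC budget
    double cpuW, gpuW;
    if (gpuPriority) {                          // "GPU mode": GPU is fed first
        gpuW = std::min(gpuDemandW, budgetW);
        cpuW = std::min(cpuDemandW, budgetW - gpuW);
    } else {                                    // "CPU mode": CPU is fed first
        cpuW = std::min(cpuDemandW, budgetW);
        gpuW = std::min(gpuDemandW, budgetW - cpuW);
    }
    // Pretend clocks scale linearly with granted power; real silicon does not.
    return { 3.5 * cpuW / cpuDemandW, 2.23 * gpuW / gpuDemandW };
}

int main()
{
    ClockPair c = arbitrate(40.0, 180.0, /*gpuPriority=*/true);
    std::printf("CPU %.2f GHz, GPU %.2f GHz\n", c.cpuGHz, c.gpuGHz);
}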

I expect bandwidth to be very important in RT Performance, just like it is on PC.

Thank you for explaining it on this forum too! :) It's what I and some others have been trying to say, but you do it so much better.

I think PC will always be more expensive, but it's also a computer, not just a gaming device. And by late 2020/early 2021 we are talking about Zen 3 and Ampere; a $1600 build will blow away the XSX and PS5 by then.

Of course prices will drop, and I don't think you will need to spend $1600, but PC gaming is going to be more expensive. You could wait a year or so, like late 2021, and things will look much better.

From a purely cynical perspective....Sony is able to market max potential and it's not like you are ever going to know what it is actually running at...unless someone hacks it...

This. I guess it will come to light anyway.....

I still find this off the mark. No one who's weighing PC gaming against console gaming should ever be paying full retail price for the online aspects of PS+ or Xbox Live; the real price is far closer to $40-$45. You also get free games for that cost, anywhere from 2 (Plus) to 4 (Gold) a month.

Game Pass is a very good deal, especially now that MS is starting to pump out more AAA games, for both PC and Xbox at the same time.

But as I said, by late 2020 the XSX will be mid-range spec, so nobody has to spend $1600 to get the same performance anyway.

But realistically you could probably expect something like 2080S/XSX performance from the 2070S, or if you're lucky the 2070, price bracket.

No idea what the XSX is going to cost, but that thing will be unbeatable on price/performance ratio. I'm a PC gamer, but that's something I have to admit. A 12TF+ GPU with ray tracing at that level, a 3.8 GHz 8-core CPU, a 1TB insanely fast SSD (BCPack, 6 GB/s), all of that in a small form factor that's going to run very quietly. Full BC all the way back to 2001, and PC/Xbox cross-platform games. As a PC gamer that sounds very tempting to me; I already have a PC powerful enough to match the XSX on most fronts aside from the SSD, but for the living room I want an XSX. With Game Pass I get to play on both, so why not.
As most know, I'm a PC gamer, but that XSX gets my attention. As a PC gamer you want the most powerful hardware and sustained high fps, and I think that is what MS seems to be aiming for.
 
Yea, that makes sense. Introducing "free form" variable frequency instead of GPU/CPU mode would likely introduce further issues next gen with BC.
 
The extent to which the clocks fall for the different priorities remains to be clarified.
Oh Alex, you tease.

I mean, just thinking about the physics of it. The clock/power draw can't possibly be symmetrical between GPU and CPU.
GPU by surface area is going to draw significantly more power than the CPU.
So Cerny is probably quite correct in revealing that if the CPU needed more power, the GPU would lose some clock rate to serve the CPU's power needs.
He didn't say how much clock rate the CPU must lose for the GPU to hit its maximum clock, because in my mind, going the other direction, the CPU must give up a lot of power to push the clock rate up on the GPU side of things.

The only numbers you guys provided were the maximum clock rates for both the CPU and GPU.
Which is truthful, but not the full story.
The numbers provided should have been: when the CPU is at X, the GPU is at 2230 MHz; or when the CPU is at 3500 MHz, the GPU is at Y.

Thinking out loud, there's no way that MS could have adopted this technology. 52 CUs should flat out kill the power allotment for the CPU.
 
It was meant to represent the fact that prices aren't going to go into freefall or anything like that. The 3700X will get cheaper, but its price won't come barrel-rolling down either. For reference, looking at PCPartPicker for example, the Ryzen 7 2700X's price didn't drop beyond the initial ~10% until around 3 months after the 3700X was out, reaching about a 33% total drop. https://pcpartpicker.com/product/bd...core-processor-yd270xbgafbox?history_days=365
And why is it suddenly 2021? The XSX (and PS5) are launching in time for holiday 2020.

A 33% price cut is fairly significant. And Zen 3 and Ampere will be out for holiday 2020. Whether the XSX, PS5 or new PC components will be widely available on shelves everywhere for holiday 2020 is another story; hence I included 2021, to be fair.
 
The numbers show that BCPack + zlib is better than Kraken for compression ratio: 4.8/2.4 = 2.0, versus 9/5.5 = 1.63 and 8/5.5 = 1.45. But in the end the PS5 SSD is still faster...
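Working those figures through with the quoted raw link speeds (2.4 GB/s for the XSX, 5.5 GB/s for the PS5), a quick sketch of the arithmetic; these are the publicly quoted typical figures, not guarantees:

#include <cstdio>

int main()
{
    // Publicly quoted raw and typical compressed throughput, in GB/s.
    const double xsxRaw = 2.4, xsxCompressed = 4.8;   // BCPack + zlib
    const double ps5Raw = 5.5, ps5CompressedLo = 8.0; // Kraken, typical range
    const double ps5CompressedHi = 9.0;

    std::printf("XSX ratio: %.2f:1\n", xsxCompressed / xsxRaw);        // 2.00:1
    std::printf("PS5 ratio: %.2f:1 to %.2f:1\n",
                ps5CompressedLo / ps5Raw, ps5CompressedHi / ps5Raw);   // 1.45:1 to 1.64:1
    // Better ratio on the XSX, but the PS5's faster raw link still wins on absolute GB/s.
}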

A 2011 presentation by Iron Galaxy about tailoring a game around an SSD much slower than the ones in the PS5 and Xbox Series X, but interesting nonetheless:

https://www.gdcvault.com/play/1014648/Delivering-Demand-Based-Worlds-with
How does the PS5 SSD achieve 22 GB/s? That needs a 4:1 compression ratio. Which algorithm will be used?
 
Does anybody know what the cache scrubbers are for? Will SSD data go directly to the GPU caches? If so, are they only useful for data coming from the SSD?
 
Variable frequency is a great approach. Some people think Sony just wants higher specs on paper, so they chose an extremely high clock that will vary in real games.

But the idea of variable frequency is to fully utilize the capability of the cooling system. Just think of the XSX: if a cross-gen game only uses 50% of the CPU, why not give more power to the GPU and raise it to 12.6 TF or even 13 TF, if the cooler can handle it?
 
How does the PS5 SSD achieve 22 GB/s? That needs a 4:1 compression ratio. Which algorithm will be used?
They use Kraken (better than zlib, which is also available). Usually the compression will be about 2:1 or so, but depending on the data compressed it can go up to 4:1. It can decompress that much (and reach 22 GB/s) because of the dedicated hardware. Cerny said that 9 Zen 2 cores would have been needed to unpack the same data with that compression.

Interestingly, those numbers were already in the SSD patent, which talked about a typical 10 GB/s, up to 20 GB/s in some conditions.
 
How does the PS5 SSD achieve 22 GB/s? That needs a 4:1 compression ratio. Which algorithm will be used?
Only some types of data would. Others would be more like 1.01:1. In practice it's like any compression, where some files compress by a large factor and others stay almost the same size. They did say it's 10% better than LZ, though (the hardware compression in the PS4 and XB1).

The interesting part is that the uncompressed data would be structured for immediate use by the GPU, so things like masks, maps and alphas would compress like crazy, while most albedo would compress at a low ratio, I suppose.

I don't fully understand the coherency module and cache scrubbers; I assume it's another RDNA2 thing...
 
Yes, but that drop took 3 months to happen. Zen 3 by the looks of it is going to be released later in the year than Zen 2 was, so maybe end of Q3 with sales starting immediately. If the 3700X follows the same pattern as the 2700X, it would still be only 10% down from its current price when the consoles hit the market.
 
It's in the settings from their open source graphics driver. Here are the relevant lines of code:

pSettings->nggVertsPerSubgroup = 254;   // cap on vertices packed into one NGG subgroup
pSettings->nggPrimsPerSubgroup = 128;   // cap on primitives packed into one NGG subgroup
Those values are set when the game is detected to be Wolfenstein II or Doom, and only if the GPU is RDNA or later.
These are choices made by driver developers with regards to bug workarounds or performance tweaks for popular-enough games.

The point of mesh shaders is to give game developers control over the input format and topology of their primitives. The driver changes appear to adjust what the internal compiler does when chunking the standard stream of triangles, which it must be doing since the game developers wrote their shaders within the context of the standard pipeline.

It looks to me like there are at least two significant distinctions to be made, who is making these decisions, and where and how the shader is sourcing its inputs.

A possible reason why Vega/RDNA don't expose mesh shaders in D3D is because of these limits. D3D specifies a minimum of 256 verts/256 prims per meshlet, so RDNA2 might have raised these limits.
The values in this case only apply to RDNA, which may be another bit of evidence that there have been significant elements of the original concept for primitive shaders that have been changed or dropped.
It's not clear what the motivations are for these titles, though it could be to resolve performance regressions, per-GPU issues, instability, or to allow API-illegal behavior going by the comments of nearby code entries.
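For what it's worth, on the D3D12 side an application can only see whether a given driver/GPU combination exposes any of this through the feature tier query, something like the sketch below (standard D3D12 API; the function name is mine, and it assumes an already created device):

#include <windows.h>
#include <d3d12.h>

// Ask the runtime whether mesh (and amplification) shaders are exposed.
// At the time of this thread, Vega/RDNA1 drivers reported no support here.
bool SupportsMeshShaders(ID3D12Device* device)
{
    D3D12_FEATURE_DATA_D3D12_OPTIONS7 options7 = {};
    if (FAILED(device->CheckFeatureSupport(D3D12_FEATURE_D3D12_OPTIONS7,
                                           &options7, sizeof(options7))))
        return false;
    return options7.MeshShaderTier >= D3D12_MESH_SHADER_TIER_1;
}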


Well we don't know for sure if Microsoft wants to entirely forgo the traditional geometry pipeline.
Microsoft's position is that it intends to reinvent the front end of the pipeline:
Mesh Shaders are not putting a band-aid onto a system that’s struggling to keep up. Instead, they are reinventing the pipeline.
https://devblogs.microsoft.com/dire...on-shaders-reinventing-the-geometry-pipeline/

Primitive shaders slot into a place within the existing pipeline. Changing that makes the link to AMD's original concept more tenuous.

For the hardware tessellator stage, the stated intent is to replace it.
The intent for the Amplification shader is to eventually replace hardware tessellators.
https://microsoft.github.io/DirectX-Specs/d3d/MeshShader.html




In theory, the XSX should be able to do 44% more ray intersections in parallel than the PS5. The PS5 will need to catch up using clock speed, but it will be burdened by its lower bandwidth.
I wouldn't necessarily call that a 15% difference. You're looking at TFs instead of intersections. The compute difference is 15-18%.
There's some theorizing that it's not strictly bandwidth, but also the memory transaction rate that the RT hardware can achieve. Since traversing the BVH means analyzing nodes and then following pointers to successive entries, there's more round-trip traffic rather than raw read bandwidth. Some of the research for ray tracing involves finding ways of gathering rays that hit the same nodes of the tree, which along with saving bandwidth saves on the number of round trips the hardware may need to make.
Depending on how well the BVH hardware caches data, or how often BVH data can be accessed again while still in a cache, there can be a benefit to higher clock speeds. Cerny did note that the primary downside is that memory is effectively further away in terms of latency, which means more divergent rays or less effective cache situations would make trips to memory or contention for memory more expensive for higher-clocked units.
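A rough sketch of why traversal behaves like that: each step reads a small node and then follows a pointer to the next one, so the unit spends most of its time waiting on dependent round trips rather than streaming data. The node layout and names below are invented for illustration and are not AMD's actual BVH format:

#include <cstdint>
#include <vector>

// Invented, simplified BVH node; real hardware formats differ.
struct Node {
    float   bounds[6];     // AABB min/max
    int32_t left, right;   // child indices (unused at leaves)
    bool    isLeaf;
};

// Stack-based traversal: every pop issues another small, dependent read.
// The latency of each round trip, not raw GB/s, sets the pace.
int countVisitedNodes(const std::vector<Node>& bvh, int root,
                      bool (*rayHitsBox)(const float bounds[6]))
{
    int visited = 0;
    std::vector<int> stack{root};
    while (!stack.empty()) {
        int idx = stack.back();
        stack.pop_back();
        const Node& n = bvh[idx];          // dependent memory access
        ++visited;
        if (!rayHitsBox(n.bounds) || n.isLeaf)
            continue;
        stack.push_back(n.left);           // the next reads depend on this one
        stack.push_back(n.right);
    }
    return visited;
}

Gathering rays that visit the same nodes, as the research mentioned above does, amortises those round trips across many rays.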

PS5 has a chance to mitigate some of the BW disadvantages with their cache scrubbers.
This may be possible if they avoid significant writes to memory on cache flushes, or if they can avoid some flushes entirely. That may also change how much impact certain synchronization events have, where GPU cache flushes take time and the whole GPU needs to stall while they happen.
What's not clear yet is how much time this saves, or if it can avoid some of those events entirely.
An improvement there can help reduce the number of times where the CUs and fixed function units are idled, or how quickly they can ramp up afterwards due to caches being indiscriminately cleared.
Less clear is how much of this is unique to the PS5, or whether there are different approaches that can get approximately the same result. Perhaps something like using fallback textures or last-frame data when the game is unsure about possibly stale data can be close enough, versus a GPU that spends a lot of effort scrubbing caches.

You need bandwidth to access the structure; RT BVH structures are as large as 1.5 GB IIRC, so they aren't being held in cache.
The only thing remaining is to have hardware traverse the BVH structure for intersection.
If you look at the Quake 2 RTX benchmarks you'll see a benchmark that is specific to RT performance.
A BVH is an acceleration structure, so the hopeful idea is that much of it is not being accessed heavily, or that portions are accessed in a bursty manner. It's likely part of the reason why AMD and Nvidia use RT hardware that works with a stack-based traversal method, and it's an argument IMG is making for its ray-tracing coherence engine.

How does the PS5 SSD achieve 22 GB/s? That needs a 4:1 compression ratio. Which algorithm will be used?
The PS5 presentation mentioned zlib and Kraken compression. The 22 GB/s was given more as a peak value for some very lucky compression case, perhaps a lot of trivial or uniform values. That may point to a corner case in the existing decompression hardware that BCPack does not have. It's possible that if BCPack is more complex, they had other priorities to optimize for than a trivial decompression case that games wouldn't use. It may also point to a different choice in how the blocks are hooked up to the on-die fabric. Perhaps Sony opted for a wider link, or it has more links due to its internal arrangement, and that just happens to work out in some easier decode instances.

Does anybody know what the cache scrubbers are for? Will SSD data go directly to the GPU caches? If so, are they only useful for data coming from the SSD?
There are coherence engines that seem to be monitoring when data loaded from the SSD is placed into memory, and they signal the scrubbers to clear any lines in the GPU caches that might have data in the address ranges that were overwritten.
How often this can happen or what overheads there might be wasn't discussed. It was mentioned only in the context of the storage subsystem, although I wonder if anything else like the CPU or the audio co-processor could do something similar. The CPU-GPU path usually has a different and more direct way to invalidate cache entries, so I'm not sure if that would benefit unless such events can be heavier in terms of synchronization.
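Conceptually I picture the scrubbers doing something like the loop below: invalidate only the lines whose addresses fall inside the range the coherency engines report as freshly overwritten by an SSD transfer, instead of flushing whole caches. This is only a mental model; the names and structure are invented, and the real hardware presumably walks its tag arrays in parallel rather than iterating like this:

#include <cstdint>
#include <vector>

// Invented model of a cache line tag; real GPU cache organisation differs.
struct CacheLine {
    uint64_t physAddr;   // base address of the cached 64-byte line
    bool     valid;
};

// Scrub only the lines that alias the overwritten range, with no write-back
// needed because the cached copies are simply stale now.
void scrubRange(std::vector<CacheLine>& cache, uint64_t start, uint64_t size)
{
    const uint64_t lineSize = 64;
    for (CacheLine& line : cache) {
        if (line.valid &&
            line.physAddr + lineSize > start &&
            line.physAddr < start + size)
        {
            line.valid = false;   // drop the stale line
        }
    }
}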
 
But you won't be able to run them both at max all the time. I dunno, I would rather they hit 2150 MHz and 3.5 GHz locked than variable. A few percent up and down, when both won't run at max, seems like a meh solution...
 