Current Generation Hardware Speculation with a Technical Spin [post GDC 2020] [XBSX, PS5]

So I thought I'd look into how well RDNA scales with clocks, and there's actually an interesting benchmark on YT comparing the 5700 (36 CU) and the 5700 XT (40 CU).

In this case, the 5700 is clocked at 2150MHz (9.9TF) and the 5700 XT at 1750MHz (8.9TF).

The results are surprising, because the XT pulls ahead most of the time, and sometimes not even by a little, even though it has a 400MHz and 1TF deficit. Not sure what to make of this really...
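
For reference, both TF figures fall straight out of the usual back-of-envelope formula (CUs x 64 ALUs x 2 FLOPs per clock for FMA x clock), e.g.:

```python
# Back-of-envelope FP32 throughput: CUs x 64 ALUs x 2 FLOPs/clock (FMA) x clock.
def tflops(cus: int, clock_mhz: int) -> float:
    return cus * 64 * 2 * clock_mhz / 1_000_000

print(f"5700    (36 CU) @ 2150 MHz: {tflops(36, 2150):.2f} TF")  # ~9.91 TF
print(f"5700 XT (40 CU) @ 1750 MHz: {tflops(40, 1750):.2f} TF")  # ~8.96 TF
```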

This doesn't make any sense at all. Is this a reputable benchmark?
 
So I thought I'd look into how well RDNA scales with clocks, and there's actually an interesting benchmark on YT comparing the 5700 (36 CU) and the 5700 XT (40 CU).

In this case, the 5700 is clocked at 2150MHz (9.9TF) and the 5700 XT at 1750MHz (8.9TF).
The results are surprising, because the XT pulls ahead most of the time, and sometimes not even by a little, even though it has a 400MHz and 1TF deficit. Not sure what to make of this really...
Can't tell what clocks the cards are actually running at during the tests, my guess is throttling.
 
There's no indication at this point the PS5 can run with SMT disabled.

Likewise there isn't any indication that it can't be disabled or that it is even enabled.

Dictator is talking out his ass. This is explicitly the opposite of what Cerny says happens. DF are not an official source when they are going on about their pet theories.

If you think that developers who have the PS5 devkit are "talking out of their ass" when they tell Dictator that this is the guidance Sony is giving them, then sure.

Regards,
SB
 
Those values are set when the game is detected to be Wolfenstein II or Doom, and only if the GPU is RDNA or later.
These are choices made by driver developers with regards to bug workarounds or performance tweaks for popular-enough games.

The point of Mesh shaders is to give game developers control over the input format and topology of their primitives. The driver changes appear to adjust what the internal compiler will do for chunking the standard stream of triangles, which must be the case since the game developers wrote their shaders within the context of the standard pipeline.

Mesh shaders are more than just having control over the input format or the topology. In fact having control over the input format isn't all that useful since it's very hard to beat the hardware's input assembler. As for applications of a programmer-defined primitive topology, it might be useful for CAD applications where models represented by quads are triangulated by the mesh shaders, but there are still limitations there, since the output primitive topology still has to be defined in either lines or triangles for the hardware to generate the edge equations for the rasterizer.

It looks to me like there are at least two significant distinctions to be made, who is making these decisions, and where and how the shader is sourcing its inputs.


The values in this case only apply to RDNA, which may be another bit of evidence that there have been significant elements of the original concept for primitive shaders that have been changed or dropped.
It's not clear what the motivations are for these titles, though going by the comments on nearby code entries it could be to resolve performance regressions, per-GPU issues, or instability, or to allow API-illegal behavior.

It's probably because of hardware bugs rather than anything architecturally specific ...

Microsoft's position is that it intends to reinvent the front end of the pipeline:

https://devblogs.microsoft.com/dire...on-shaders-reinventing-the-geometry-pipeline/

Primitive shaders slot into a place within the existing pipeline. Changing that makes the link to AMD's original concept more tenuous.

For the hardware tessellator stage, the stated intent is to replace it.

https://microsoft.github.io/DirectX-Specs/d3d/MeshShader.html

Amplification shaders, or 'task' shaders as some IHVs would have it, can never truly replace the traditional geometry pipeline or the tessellation stages because of a major failure case with streamout/transform feedback. Geometry shaders or the tessellation stage can potentially output an 'unknown' number of primitives and this is highly problematic with transform feedbacks since the hardware would have no way of corresponding the primitive output order with the primitive input order. Heck, primitive restart can cause the same ordering issues as well, so there's no chance of amplification/mesh shaders being able to emulate something as simple as a vertex shader.

Primitive shaders with hardware support for global ordered append are currently the viable path that allows mesh shaders to be compatible with the traditional geometry pipeline.
 
https://www.resetera.com/threads/nx...eneration-is-born.176121/page-4#post-30070900

Devs will choose whether they want full power to the GPU or full power to the CPU, where one or the other underclocks below the listed spec. So it's on a game-to-game basis. I imagine most cross-gen games will choose to prefer the higher-clocked GPU mode, as they will be GPU-bound even if the Zen cores are underclocked. Zen runs rings around the Jaguar, so most cross-gen games are not going to worry about CPU time, especially 30 fps games.

So unless the devs push both sides to the max at the same time it should be fine.

I was just pondering this in the other thread. So power profile choices for the devs it is then. Makes sense. I don't, however, believe it's strictly a choice around CPU/GPU power balance, since SmartShift in and of itself can't provide enough power to compensate for a super heavy GPU load. The GPU can easily pull 4x more power than the CPU. We'll know a lot more once the power profiles are public.
 
Mesh shaders are more than just having control over the input format or the topology.
I agree that it's only one part of what they can do. That doesn't conflict with the observation that it is something that primitive shaders do not match. I'm stating my interpretation that it seems that what AMD uses as primitive shaders appear to be distinct in many ways from what mesh shaders are defined as. The vendors took a general premise of using more compute-like shading and decided to optimize or change different elements of the setup process, and they are compatible or incompatible with different things as a result.
From the differences in input, usage, and compatibility, I get the impression that they aren't the same thing.

In fact having control over the input format isn't all that useful since it's very hard to beat the hardware's input assembler.
I think this goes to whether you want to take the stated reasons for mesh shaders at their word, since it's how the index processing portion of input assembly is being replaced.
I agree that replacing the hardware has proven difficult, since even improving it was unsuccessful with primitive shaders in Vega. However, that portion of the pipeline is where the serial bottleneck and flexibility concerns are located that the mesh shader proposal cites as its reason for being.
If primitive shaders do not share that reason for existing, that's another difference.

It's probably because of hardware bugs rather than anything architecturally specific ...
There are a lot of individual reasons for the profile changes, some based on games that do not use API commands properly, or architectural quirks that make certain features a performance negative.
They are too low-level and product-specific in many cases to be appropriate as API-level considerations, which is what mesh shaders are.

Geometry shaders or the tessellation stage can potentially output an 'unknown' number of primitives and this is highly problematic with transform feedbacks since the hardware would have no way of corresponding the primitive output order with the primitive input order.
Nvidia and Microsoft would agree with this, and their response was to dispense with those stages. Nvidia still gives tessellation an ongoing role for being more efficient for some amplification patterns, while Microsoft in the passage I quoted has stated what they plan for the tessellator.
Ordering behavior is an element many proponents of a completely programmable paradigm have neglected, but it still leaves a distinct difference where primitive shaders can have stronger ordering because they feed from that serialized source, and mesh shaders do not.

Primitive shaders with hardware support for global ordered append are currently the viable path that allows mesh shaders to be compatible with the traditional geometry pipeline.
Perhaps that will be put forward in primitive shaders 2.0 and mesh shaders 1.x, as I haven't seen the various parties involved committing to this publicly.
 
I agree that it's only one part of what they can do. That doesn't conflict with the observation that it is something that primitive shaders do not match. I'm stating my interpretation that it seems that what AMD uses as primitive shaders appear to be distinct in many ways from what mesh shaders are defined as. The vendors took a general premise of using more compute-like shading and decided to optimize or change different elements of the setup process, and they are compatible or incompatible with different things as a result.
From the differences in input, usage, and compatibility, I get the impression that they aren't the same thing.

Primitive shaders pretty much do the same thing mesh shaders would, which is to chunk geometry into smaller pieces based off of the input vertex/primitive data. AMD hardware since GCN has always been flexible with the input mesh data, since they do programmable vertex pulling and so don't have a "hardware input assembler" to speak of. It's that GCN didn't have the flexibility in how it could group these chunks of mesh data, and that is what primitive shaders mostly aim to do.

There's really nothing mythical about mesh shaders on AMD hardware. GCN was always capable of programmer defined input mesh data and they just recently added the ability for their hardware to create meshlets.

The only truly major difference between mesh shaders as defined in APIs compared to AMD's hardware implementation is that task shaders don't exist on their hardware.
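
As a concrete illustration of the "chunking into meshlets" both posts describe, here is a minimal sketch of the kind of offline meshlet builder a tool or driver might run over an ordinary indexed triangle list. The 64-vertex / 126-triangle limits are the commonly recommended sizes for DirectX mesh shaders, and the greedy grouping is the simplest possible strategy, not what any particular driver actually does:

```python
# Greedy meshlet builder: walk an indexed triangle list and start a new meshlet
# whenever adding the next triangle would exceed the vertex or primitive limits.
# 64 verts / 126 tris are the commonly recommended mesh shader meshlet sizes.
MAX_VERTS = 64
MAX_PRIMS = 126

def build_meshlets(indices: list[int]) -> list[dict]:
    meshlets = []
    current = {"unique_verts": [], "local_tris": []}
    vert_map = {}  # global vertex index -> local index within the current meshlet

    for i in range(0, len(indices), 3):
        tri = indices[i:i + 3]
        new_verts = [v for v in dict.fromkeys(tri) if v not in vert_map]

        # Flush the current meshlet if this triangle would overflow either limit.
        if (len(vert_map) + len(new_verts) > MAX_VERTS
                or len(current["local_tris"]) + 1 > MAX_PRIMS):
            meshlets.append(current)
            current = {"unique_verts": [], "local_tris": []}
            vert_map = {}
            new_verts = list(dict.fromkeys(tri))

        for v in new_verts:
            vert_map[v] = len(current["unique_verts"])
            current["unique_verts"].append(v)
        current["local_tris"].append(tuple(vert_map[v] for v in tri))

    if current["local_tris"]:
        meshlets.append(current)
    return meshlets
```

A real builder would also reorder triangles for vertex locality and emit per-meshlet culling data (bounding spheres, normal cones), which is where much of the practical benefit comes from.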
 
"majority of the time" <> 'almost always'

Vast majority of the time = almost always.

Likewise there isn't any indication that it can't be disabled or that it is even enabled.

The official spec sheet says 8 cores/16 threads and 3.5GHz. So yes, it's enabled.

If you think that developers who have the PS5 devkit are "talking out of their ass" when they tell Dictator that this is the guidance Sony is giving them, then sure.

I haven't seen him claim to have dev sources on this at all. It's purely his speculation as far as I can tell.
 
Vast majority of the time = almost always.

The official spec sheet says 8 cores/16 threads and 3.5GHz. So yes, it's enabled.

I haven't seen him claim to have dev sources on this at all. It's purely his speculation as far as I can tell.

No, he said it was from sources working on PS5.
 
Oh, OK. Dictator didn't say it, Dark1x did. That must be one of the threads I'm not following as well. That said, I wouldn't take their word as sacrosanct. DF has been wrong in the past and do not always accurately interpret the information shared with them.

And I will reiterate that the behavior they are describing is the opposite of what Cerny talked about in the presentation: that certain GPU loads cause the GPU to throttle, and certain CPU loads will cause the CPU to throttle, not that CPU loads will throttle the GPU, or the other way around. Of course, even if that's true, it's not clear how much it matters if we assume the GPU load that causes the CPU to throttle means you're GPU-limited anyway, and vice versa.
 
There seems to be some general confusion on here about distinguishing between clock speed and current. Not saying it's you, just speaking in general.

The load you put in at a given frequency will determine the voltage needed and thus the power draw.

Let's say a CPU running at 3GHz and pulling 20 amps at 1V is generating 20W of power draw. 20W is its max budget. We won't go into VRM complications here.

Let's drop that down to 10 amps. There's no reason for the frequency to drop, as it's below the max power budget. So at a 10 amp load the CPU is happily singing along at 3GHz (max set) drawing 10W.

Now a 25 amp load comes in. Because you're power throttling, your CPU frequency will drop as needed until "voltage x amps = max power draw". That's the "boost" part.

In this situation it's actually a reverse boost, meaning it's controlling the frequency drop from the best-case scenario based on the load coming in and the power budget allocated.
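
A minimal sketch of that logic, using the made-up numbers above (3GHz at 1V with a 20W cap) and assuming voltage scales linearly with frequency, which is a simplification for illustration rather than a real DVFS curve:

```python
# Toy model of the "reverse boost" described above: the chip runs at the highest
# clock whose voltage x current stays inside a fixed power budget.
MAX_FREQ_GHZ = 3.0
MAX_VOLT = 1.0
POWER_BUDGET_W = 20.0

def clock_for_load(load_amps: float) -> float:
    """Highest clock (GHz) at which voltage x load fits the power budget."""
    # Assume V(f) = MAX_VOLT * f / MAX_FREQ_GHZ, then solve V(f) * I <= budget
    # for f and clamp to the maximum clock.
    f_limit = MAX_FREQ_GHZ * POWER_BUDGET_W / (MAX_VOLT * load_amps)
    return min(MAX_FREQ_GHZ, f_limit)

for load in (10.0, 20.0, 25.0):
    f = clock_for_load(load)
    watts = MAX_VOLT * f / MAX_FREQ_GHZ * load
    print(f"{load:>4.0f} A load -> {f:.2f} GHz, {watts:.1f} W")
# 10 A and 20 A stay at 3.00 GHz; the 25 A load drops to 2.40 GHz to hold 20 W.
```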

Thanks for clarification. Maybe it was me who started all the noise, initially assuming it's a trade between CPU and GPU (oops).
 
So is it correct to say the following?

In this situation the CPU and GPU are part of a single unit that has an upper power draw limit. So if the GPU portion is mostly maxed out, drawing the majority of the power, you have less power to allocate to the CPU.

That's exactly it. You have a set power budget. What will likely happen is that there will be HW profiles where, say, the CPU is hardset to 40W while the GPU has a budget of 200W (random numbers). This will be way more consistent and manageable for a developer to work in than a "use whatever you need when you need it" approach. We're not talking about one guy making a game. This is a large dev team with a lot of people working on it, and there needs to be consistency in the targeted performance envelope.
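
Purely to illustrate what such a profile table could look like (hypothetical structure, with the made-up 40W/200W figures from above rather than real console numbers):

```python
# Hypothetical per-title power profiles: each one fixes how a shared SoC budget
# is split up front, rather than shifting it dynamically frame to frame.
# All numbers are placeholders, not real console figures.
SOC_BUDGET_W = 240

PROFILES = {
    "gpu_heavy": {"cpu_w": 40, "gpu_w": 200},
    "balanced":  {"cpu_w": 70, "gpu_w": 170},
    "cpu_heavy": {"cpu_w": 90, "gpu_w": 150},
}

def validate(profiles: dict, budget: int) -> None:
    # A profile is only legal if its fixed split fits inside the shared cap.
    for name, p in profiles.items():
        total = p["cpu_w"] + p["gpu_w"]
        assert total <= budget, f"{name}: {total} W exceeds the {budget} W budget"

validate(PROFILES, SOC_BUDGET_W)
```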

It would be useful if there were a chart or appendix of empirical measurements of CPU and GPU instruction mixes and what impact they have on the timings. Since the frequencies will vary, the latencies of the instructions will vary too.

Maybe even a listing of instructions or instruction mixes and their power draw impacts.

Generally speaking, AVX instructions running in place (in cache) draw a lot of power and produce a lot of heat. This is generally the torture test of all torture tests for a CPU. AVX is also starting to get picked up in games. Off the top of my head, from the games I have, BFV, Assassin's Creed, Shadow of the Tomb Raider and a few others are known to be "CPU heavy" on PC due to their use of AVX. I'm sure that number will continue to grow.

Another thing to keep in mind is the RAM. The faster and more tightly tuned your RAM is, the more power the CPU will draw, because the RAM is able to keep feeding the CPU at a faster rate. You can test this in synthetics on a PC. If you run JEDEC spec vs XMP vs manually tuned RAM and then run a synthetic like IntelBurnTest or a rendering application like x265, the scores (GFLOPS for IBT, fps for x265) will clearly demonstrate the impact of RAM performance on CPU performance. You'll also see a notable difference in the heat and power draw of the CPU purely due to the RAM performance.

So while the CPU speeds alone are a valid comparison and easy to understand for comparative purposes, the impact of RAM performance between the two consoles, when paired with the CPU and the type of instructions being used, will determine the overall performance envelope.

Maybe I'll do some x265 runs with screenshots to show this.
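
If anyone wants to reproduce that kind of comparison, a bare-bones timing harness is enough. The sketch below assumes the x265 CLI is on the PATH and that a "clip.y4m" test file exists; change the RAM settings in the BIOS between runs and compare the times (the flags are just a plausible invocation, not a tuned benchmark):

```python
# Bare-bones wall-clock harness for comparing the same x265 encode across
# different RAM configurations (JEDEC vs XMP vs manual tune).
import subprocess
import time

# Assumed invocation: encode clip.y4m, discard the bitstream, medium preset.
CMD = ["x265", "clip.y4m", "-o", "/dev/null", "--preset", "medium"]

def timed_run(runs: int = 3) -> float:
    """Return the best-of-N wall-clock time in seconds."""
    best = float("inf")
    for _ in range(runs):
        start = time.perf_counter()
        subprocess.run(CMD, check=True, capture_output=True)
        best = min(best, time.perf_counter() - start)
    return best

if __name__ == "__main__":
    print(f"Best of 3 runs: {timed_run():.1f} s")
```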
 
Vast majority of the time = almost always.

The official spec sheet says 8 cores/16 threads and 3.5GHz. So yes, it's enabled.

I haven't seen him claim to have dev sources on this at all. It's purely his speculation as far as I can tell.
Dev sources about the priority mode thing. I think it makes sense to be sceptical of my typing that, because it sounds weird to me that variable clocks would even need to be mentioned if they are indeed actually fixed at their max for both parts all the time in practice.
 