Next Generation Hardware Speculation with a Technical Spin [post GDC 2020] [XBSX, PS5]

Discussion in 'Console Technology' started by Proelite, Mar 16, 2020.

  1. Scott_Arm

    Legend

    Joined:
    Jun 16, 2004
    Messages:
    14,209
    Likes Received:
    5,634
    This doesn't make any sense at all. Is this a reputable benchmark?
     
  2. Esrever

    Regular Newcomer

    Joined:
    Feb 6, 2013
    Messages:
    768
    Likes Received:
    532
Can't tell what clocks the cards are actually running at during the tests; my guess is throttling.
     
    disco_ and BRiT like this.
  3. Scott_Arm

    Legend

    Joined:
    Jun 16, 2004
    Messages:
    14,209
    Likes Received:
    5,634
Yah, most likely hitting a power or thermal limit, and then the clocks drop. I'd want to see this with a graph of the clock speed.
     
    BRiT likes this.
  4. Proelite

    Veteran Regular Subscriber

    Joined:
    Jul 3, 2006
    Messages:
    1,458
    Likes Received:
    817
    Location:
    Redmond
My guess is a bandwidth bottleneck. Clock doesn't scale as well as the number of CUs given the same BW.
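
A quick roofline-style sketch (completely made-up numbers, nothing to do with either console's actual specs) shows why extra peak TFLOPS stop mattering once a kernel is bandwidth-bound:

```cpp
// Roofline-style illustration with invented numbers; not a model of any real GPU.
#include <algorithm>
#include <cstdio>

// Achievable GFLOPS for a kernel of a given arithmetic intensity (FLOPs per byte).
double achievable(double peak_gflops, double bw_gbs, double flops_per_byte) {
    return std::min(peak_gflops, bw_gbs * flops_per_byte);
}

int main() {
    const double bw = 448.0;              // GB/s, held constant across all cases
    const double kernel_intensity = 8.0;  // FLOPs per byte: a bandwidth-hungry kernel
    for (double peak : {9000.0, 10000.0, 12000.0}) {  // raising peak via clock or CUs
        printf("peak %.0f GFLOPS -> achievable %.0f GFLOPS\n",
               peak, achievable(peak, bw, kernel_intensity));
    }
    return 0;
}
```

With the bandwidth held fixed, all three "peaks" land on the same achievable number for this kind of kernel, which is what a bandwidth-limited benchmark result would look like.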
     
    PSman1700 likes this.
  5. Silent_Buddha

    Legend

    Joined:
    Mar 13, 2007
    Messages:
    17,082
    Likes Received:
    6,418
Likewise, there isn't any indication that it can't be disabled or that it is even enabled.

If you think that developers who have the PS5 devkit are "talking out of their ass" when they tell Dictator that this is the guidance Sony is giving them, then sure.

    Regards,
    SB
     
    blakjedi, PSman1700 and VitaminB6 like this.
  6. Lurkmass

    Newcomer

    Joined:
    Mar 3, 2020
    Messages:
    136
    Likes Received:
    122
Mesh shaders are about more than just having control over the input format or the topology. In fact, control over the input format isn't all that useful, since it's very hard to beat the hardware's input assembler. As for applications of a programmer-defined primitive topology, it might be useful for CAD applications where models represented by quads are triangulated by the mesh shaders, but there are still limitations: the output primitive topology still has to be defined as lines or triangles so the hardware can generate the edge equations for the rasterizer.

    It's probably because of hardware bugs rather than anything architecturally specific ...

Amplification shaders, or "task" shaders as some IHVs would have it, can never truly replace the traditional geometry pipeline or the tessellation stages because of a major failure case: streamout/transform feedback. Geometry shaders or the tessellation stage can potentially output an "unknown" number of primitives, which is highly problematic for transform feedback since the hardware would have no way of matching the primitive output order to the primitive input order. Heck, primitive restart can cause the same ordering issues as well, so there's no chance of amplification/mesh shaders being able to emulate something as simple as a vertex shader.

Primitive shaders with hardware support for global ordered append are currently the viable path for making mesh shaders compatible with the traditional geometry pipeline.
     
    anexanhume likes this.
  7. Rockster

    Regular

    Joined:
    Nov 5, 2003
    Messages:
    973
    Likes Received:
    129
    Location:
    On my rock
I was just pondering this in the other thread. So power profile choices for the devs it is then. Makes sense. I don't, however, believe it's strictly a choice around CPU/GPU power balance, since SmartShift in and of itself can't provide enough power to compensate for a super heavy GPU load. The GPU can easily pull 4x more power than the CPU. We'll know a lot more once the power profiles are public.
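
To put some purely hypothetical numbers on that (nothing official, just arithmetic to show the scale of the problem):

```cpp
// Hypothetical power-shift arithmetic; every wattage here is invented for illustration.
#include <cstdio>

int main() {
    const double cpu_budget_w = 50.0;    // assumed CPU allocation
    const double gpu_budget_w = 200.0;   // assumed GPU allocation (~4x the CPU's)
    const double shiftable_w  = 30.0;    // assumed idle CPU power that could be shifted over

    printf("GPU headroom gained: %.0f W (%.1f%% over its own budget)\n",
           shiftable_w, 100.0 * shiftable_w / gpu_budget_w);
    printf("GPU budget with shift: %.0f W of a %.0f W total board budget\n",
           gpu_budget_w + shiftable_w, cpu_budget_w + gpu_budget_w);
    return 0;
}
```

Even shifting most of the CPU allocation across is only a ~15% bump on a GPU budget that size, which is why shifting alone can't rescue a super heavy GPU load.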
     
    PSman1700 likes this.
  8. 3dilettante

    Legend Alpha

    Joined:
    Sep 15, 2003
    Messages:
    8,365
    Likes Received:
    3,955
    Location:
    Well within 3d
I agree that it's only one part of what they can do. That doesn't conflict with the observation that it is something primitive shaders do not match. My interpretation is that what AMD uses as primitive shaders appears to be distinct in many ways from what mesh shaders are defined as. The vendors took a general premise of using more compute-like shading and decided to optimize or change different elements of the setup process, and they are compatible or incompatible with different things as a result.
From the differences in input, usage, and compatibility, I get the impression that they aren't the same thing.

    I think this goes to whether you want to take the stated reasons for mesh shaders at their word, since it's how the index processing portion of input assembly is being replaced.
I agree that replacing the hardware has proven difficult, since even improving it was unsuccessful with primitive shaders in Vega. However, that portion of the pipeline is where the serial bottleneck and flexibility concerns that the mesh shader proposal cites as its reason for being are located.
    If primitive shaders do not share that reason for existing, that's another difference.

    There are a lot of individual reasons for the profile changes, some based on games that do not use API commands properly, or architectural quirks that make certain features a performance negative.
They are too low-level and product-specific in many cases to be appropriate as API-level considerations, which is what mesh shaders are.

    Nvidia and Microsoft would agree with this, and their response was to dispense with those stages. Nvidia still gives tessellation an ongoing role for being more efficient for some amplification patterns, while Microsoft in the passage I quoted has stated what they plan for the tessellator.
    Ordering behavior is an element many proponents of a completely programmable paradigm have neglected, but it still leaves a distinct difference where primitive shaders can have stronger ordering because they feed from that serialized source, and mesh shaders do not.

Perhaps that will be put forward in primitive shaders 2.0 and mesh shaders 1.x, though I haven't seen the various parties involved commit to this publicly.
     
  9. Lurkmass

    Newcomer

    Joined:
    Mar 3, 2020
    Messages:
    136
    Likes Received:
    122
Primitive shaders pretty much do the same thing mesh shaders would, which is to chunk geometry into smaller pieces based on the input vertex/primitive data. AMD hardware since GCN has always been flexible with the mesh input data because it does programmable vertex pulling, so there's no "hardware input assembler" to speak of. It's just that GCN didn't have the flexibility in how it could group these chunks of mesh data, and that is what primitive shaders mostly aim to provide.

There's really nothing mythical about mesh shaders on AMD hardware. GCN was always capable of taking programmer-defined input mesh data, and they just recently added the ability for the hardware to create meshlets.

The only truly major difference between mesh shaders as defined in the APIs and AMD's hardware implementation is that task shaders don't exist on their hardware.
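
For anyone wondering what "chunking" geometry into meshlets actually means, here's a rough CPU-side sketch. Real tooling (meshoptimizer, for example) is much smarter about vertex locality and cluster shape; this only shows the basic idea of capping vertex/primitive counts and remapping indices to be local to the chunk:

```cpp
// Naive meshlet builder sketch: walks the index buffer in order and starts a new
// meshlet whenever a vertex or primitive limit would be exceeded. Illustration only.
#include <cstdint>
#include <cstdio>
#include <unordered_map>
#include <vector>

struct Meshlet {
    std::vector<uint32_t> vertices;   // indices into the original vertex buffer
    std::vector<uint8_t>  primitives; // local vertex indices, 3 per triangle
};

std::vector<Meshlet> build_meshlets(const std::vector<uint32_t>& indices,
                                    size_t max_verts = 64, size_t max_prims = 126) {
    std::vector<Meshlet> meshlets;
    Meshlet current;
    std::unordered_map<uint32_t, uint8_t> local; // global index -> local slot

    for (size_t i = 0; i + 2 < indices.size(); i += 3) {
        // Count how many vertices this triangle would add to the current meshlet.
        size_t new_verts = 0;
        for (size_t k = 0; k < 3; ++k)
            if (local.count(indices[i + k]) == 0) ++new_verts;

        // Flush the current meshlet if this triangle wouldn't fit.
        if (current.vertices.size() + new_verts > max_verts ||
            current.primitives.size() / 3 + 1 > max_prims) {
            meshlets.push_back(std::move(current));
            current = Meshlet{};
            local.clear();
        }

        for (size_t k = 0; k < 3; ++k) {
            const uint32_t v = indices[i + k];
            auto it = local.find(v);
            if (it == local.end()) {
                it = local.emplace(v, static_cast<uint8_t>(current.vertices.size())).first;
                current.vertices.push_back(v);
            }
            current.primitives.push_back(it->second);
        }
    }
    if (!current.primitives.empty()) meshlets.push_back(std::move(current));
    return meshlets;
}

int main() {
    // Two quads as four triangles sharing vertices.
    std::vector<uint32_t> indices = {0,1,2, 2,1,3, 4,5,6, 6,5,7};
    auto meshlets = build_meshlets(indices, /*max_verts=*/4, /*max_prims=*/2);
    printf("meshlets: %zu\n", meshlets.size());  // expect 2 with these tiny limits
    return 0;
}
```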
     
    Silenti and Proelite like this.
  10. bgroovy

    Regular Newcomer

    Joined:
    Oct 15, 2014
    Messages:
    799
    Likes Received:
    626
    Vast majority of the time = almost always.

The official spec sheet says 8 cores/16 threads at 3.5 GHz. So yes, it's enabled.

    I haven't seen him claim to have devs sources on this at all. It's purely his speculation as far as I can tell.
     
  11. Scott_Arm

    Legend

    Joined:
    Jun 16, 2004
    Messages:
    14,209
    Likes Received:
    5,634
No, he said it was from sources working on PS5.
     
    PSman1700 likes this.
  12. bgroovy

    Regular Newcomer

    Joined:
    Oct 15, 2014
    Messages:
    799
    Likes Received:
    626
    When? Where? Last I saw it was just his hunch.
     
  13. Remij

    Newcomer

    Joined:
    May 3, 2008
    Messages:
    114
    Likes Received:
    138
    Every single time I've seen him mention it, he's stated that he knows people working with the actual hardware that are telling him this directly.
     
    PSman1700 and Proelite like this.
  14. bgroovy

    Regular Newcomer

    Joined:
    Oct 15, 2014
    Messages:
    799
    Likes Received:
    626
    Then it should be easy for you to link this.
     
  15. Scott_Arm

    Legend

    Joined:
    Jun 16, 2004
    Messages:
    14,209
    Likes Received:
    5,634
  16. Remij

    Newcomer

    Joined:
    May 3, 2008
    Messages:
    114
    Likes Received:
    138
    Edited to remove images as Scott above posted the links :happy2:
     
    PSman1700 likes this.
  17. bgroovy

    Regular Newcomer

    Joined:
    Oct 15, 2014
    Messages:
    799
    Likes Received:
    626
Oh, OK. Dictator didn't say it, Dark1x did. That must be one of the threads I'm not following as well. That said, I wouldn't take their word as sacrosanct. DF has been wrong in the past and doesn't always accurately interpret the information shared with them.

And I will reiterate that the behavior they are describing is the opposite of what Cerny talked about in the presentation: that certain GPU loads cause the GPU to throttle and certain CPU loads cause the CPU to throttle, not that CPU loads will throttle the GPU or the other way around. Of course, even if that's true, it's not clear how much it matters if we assume a GPU load heavy enough to make the CPU throttle means you're GPU limited anyway, and vice versa.
     
  18. JoeJ

    Regular Newcomer

    Joined:
    Apr 1, 2018
    Messages:
    981
    Likes Received:
    1,108
Thanks for the clarification. Maybe it was me who started all the noise by initially assuming it's a trade between CPU and GPU (oops).
     
  19. RobertR1

    RobertR1 Pro
    Legend

    Joined:
    Nov 2, 2005
    Messages:
    5,740
    Likes Received:
    937
That's exactly it. You have a set power budget. What will likely happen is that there will be hardware profiles where, say, the CPU is hard-set to 40 W while the GPU has a budget of 200 W (random numbers). This will be way more consistent and manageable for a developer to work in than a "use whatever you need when you need it" approach. We're not talking one guy making a game. This is a large dev team with a lot of people working on it, and there needs to be consistency in the targeted performance envelope.

Generally speaking, AVX instructions running in place (in cache) draw a lot of power and produce a lot of heat. This is generally the torture test of all torture tests for a CPU. AVX is starting to get picked up in games as well. Off the top of my head, from the games I have, BFV, Assassin's Creed, Shadow of the Tomb Raider and a few others are known to be "CPU heavy" on the PC due to their use of AVX. I'm sure that number will continue to grow.
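
If you want to see that effect yourself, a toy loop like the one below (not a proper stress tool like Prime95 or IntelBurnTest, just an illustration) keeps the 256-bit FMA units busy out of registers and will push package power and temperature well above an equivalent scalar loop:

```cpp
// Toy AVX FMA burn loop (compile with e.g. g++ -O2 -mavx2 -mfma).
// Several independent accumulators keep the 256-bit FMA units saturated,
// which is what makes AVX-heavy code draw noticeably more power than scalar code.
#include <immintrin.h>
#include <cstdio>

int main() {
    const __m256 a = _mm256_set1_ps(0.9999f);
    const __m256 b = _mm256_set1_ps(0.0001f);
    __m256 c0 = _mm256_set1_ps(0.1f), c1 = _mm256_set1_ps(0.2f);
    __m256 c2 = _mm256_set1_ps(0.3f), c3 = _mm256_set1_ps(0.4f);

    for (long long i = 0; i < 2000000000LL; ++i) {
        c0 = _mm256_fmadd_ps(a, c0, b);  // values converge toward 1.0, no denormals
        c1 = _mm256_fmadd_ps(a, c1, b);
        c2 = _mm256_fmadd_ps(a, c2, b);
        c3 = _mm256_fmadd_ps(a, c3, b);
    }

    // Combine and print so the work isn't optimized away.
    float out[8];
    _mm256_storeu_ps(out, _mm256_add_ps(_mm256_add_ps(c0, c1), _mm256_add_ps(c2, c3)));
    printf("%f\n", out[0]);
    return 0;
}
```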

Another thing to keep in mind is the RAM. The faster and more tightly tuned your RAM is, the more power the CPU will draw, because the RAM is able to keep feeding the CPU at a faster rate. You can test this in synthetics on a PC. If you run JEDEC spec vs. XMP vs. manually tuned RAM and then run a synthetic like IntelBurnTest or a rendering application like x265 and compare the scores (GFLOPS for IBT, fps for x265), they will clearly demonstrate the impact of RAM performance on CPU performance. You'll also see a notable difference in the heat and power draw of the CPU purely due to the RAM performance.

So while the CPU speeds alone are a valid comparison and easy to understand for comparative purposes, the impact of RAM performance between the two consoles, when paired with the CPU and the type of instructions being used, will determine the overall performance envelope.

    Maybe I'll do some x265 runs with screenshots to show this.
     
    Silent_Buddha, London-boy and BRiT like this.
  20. Dictator

    Newcomer

    Joined:
    Feb 11, 2011
    Messages:
    245
    Likes Received:
    926
Dev sources, about the priority mode thing. I think it makes sense to be sceptical of my typing that; that said, it sounds weird to me that variable clocks would need to be mentioned at all if they are indeed actually fixed at their max all the time in practice for both parts.
     
    blakjedi, xpea, Silent_Buddha and 5 others like this.