Xbox Series X [XBSX] [Release November 10 2020]

Discussion in 'Console Industry' started by Megadrive1988, Dec 13, 2019.

  1. iroboto

    iroboto Daft Funk
    Legend Subscriber

    Joined:
    Mar 6, 2014
    Messages:
    14,834
    Likes Received:
    18,634
    Location:
    The North
    If they are using the GPU for ML for Auto HDR, there should be an impact.
    The simplest explanation is that there is so much margin when running these BC titles that the impact from Auto HDR is a non-factor.
    Another explanation: they made a point of mentioning that running BC titles means using the GCN instruction set, which completes a wavefront every 4 cycles and therefore only launches a new one then, whereas RDNA can launch a new wavefront every cycle. I do wonder if it's possible for MS to slot Auto HDR into those free cycles.
     
  2. mawver

    Newcomer

    Joined:
    Aug 8, 2020
    Messages:
    24
    Likes Received:
    24
    I understand the arguments, but Auto HDR is just an example. My original point was about what impact the dual pipe would have on the XS hardware, since Navi 21 apparently doesn't use it.
     
  3. BRiT

    BRiT (>• •)>⌐■-■ (⌐■-■)
    Moderator Legend Alpha

    Joined:
    Feb 7, 2002
    Messages:
    20,516
    Likes Received:
    24,424
    Wasn't there some portion that was done as part of the extended instruction slots, where it's a use-it-or-get-no-benefit kind of thing? I need to look at the Hot Chips and AMD RDNA documentation again to see how the TOPS fit in with the FLOPS.
     
  4. Jay

    Jay
    Veteran

    Joined:
    Aug 3, 2013
    Messages:
    4,033
    Likes Received:
    3,428
    You get better utilization if you only need lower precision.

    But it's still using the same INT pipeline. Otherwise I'd see it as a lot more than just adding lower precision, something more akin to tensor cores or a parallel lower-precision pipeline, which would be a lot more work than it sounded like they did.
    That was my impression though, so it would be interesting to get further input.
     
    thicc_gaf and BRiT like this.
  5. Jay

    Jay
    Veteran

    Joined:
    Aug 3, 2013
    Messages:
    4,033
    Likes Received:
    3,428
    Playable PUBG on console
     
    Silent_Buddha and Kugai Calo like this.
  6. Strange

    Veteran

    Joined:
    May 16, 2007
    Messages:
    1,698
    Likes Received:
    428
    Location:
    Somewhere out there
    Really shitty analysis UI. You don't move the scales around all the time.
     
  7. Jay

    Jay
    Veteran

    Joined:
    Aug 3, 2013
    Messages:
    4,033
    Likes Received:
    3,428
    My takeaway was a solid 60fps with no frame tearing.
    It looked like it was bouncing around 61fps though, which may not be good.
    Have to admit I didn't listen much to what was said; I'll wait for DF for a more nuanced breakdown.
     
    Kugai Calo likes this.
  8. Kaotik

    Kaotik Drunk Member
    Legend

    Joined:
    Apr 16, 2003
    Messages:
    10,245
    Likes Received:
    4,465
    Location:
    Finland
    The problem with that is that it's the same CUs whether you're doing INT8/INT4 or FP32, so it could affect performance too. It's of course possible that it's just so fast to do that it won't affect performance in any noticeable way.
    edit: or that backwards compatible titles will just always have free CUs.
     
    thicc_gaf, iroboto and BRiT like this.
  9. Unknown Soldier

    Veteran

    Joined:
    Jul 28, 2002
    Messages:
    4,047
    Likes Received:
    1,670
  10. mawver

    Newcomer

    Joined:
    Aug 8, 2020
    Messages:
    24
    Likes Received:
    24
    I think...

    Calculating INT4 or FP32 costs the same: one cycle.

    That would go against what they said: "absolutely no performance cost to the CPU, GPU or memory and there is no additional latency"
     
  11. iroboto

    iroboto Daft Funk
    Legend Subscriber

    Joined:
    Mar 6, 2014
    Messages:
    14,834
    Likes Received:
    18,634
    Location:
    The North
    You can calculate 8x INT4 or 4x INT8 in a single cycle vs. 1x FP32.
    That's just rapid packed math.
    With the inclusion of the ML features in the CUs, they can also perform mixed dot products between INT4 and INT8 in a single cycle if required.
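
    To make the packing idea concrete, here is a minimal Python sketch of how eight INT4 or four INT8 values fit in one 32-bit register and feed a dot product with accumulate. It only emulates the arithmetic; the function names (`pack`, `unpack`, `packed_dot`) are made up for illustration and say nothing about how the Series X hardware or its ISA actually exposes this. Mixed INT4/INT8 operands are not modelled.

```python
WORD_BITS = 32  # one 32-bit register lane, the same width as a single FP32 value


def pack(values, bits):
    """Pack signed `bits`-wide integers into one 32-bit word, lane 0 in the low bits."""
    lanes = WORD_BITS // bits
    if len(values) != lanes:
        raise ValueError(f"expected {lanes} values for {bits}-bit lanes")
    lo, hi = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    mask = (1 << bits) - 1
    word = 0
    for lane, value in enumerate(values):
        if not lo <= value <= hi:
            raise ValueError(f"{value} does not fit in a signed {bits}-bit lane")
        word |= (value & mask) << (bits * lane)
    return word


def unpack(word, bits):
    """Recover the signed lane values from a packed 32-bit word."""
    mask = (1 << bits) - 1
    sign_bit = 1 << (bits - 1)
    values = []
    for lane in range(WORD_BITS // bits):
        raw = (word >> (bits * lane)) & mask
        values.append(raw - (1 << bits) if raw & sign_bit else raw)
    return values


def packed_dot(a_word, b_word, bits, acc=0):
    """Emulate a packed dot product: multiply matching lanes, sum, add to accumulator."""
    a = unpack(a_word, bits)
    b = unpack(b_word, bits)
    return acc + sum(x * y for x, y in zip(a, b))
```

    With `bits=8` one word carries 4 multiply-adds, and with `bits=4` it carries 8, which is where the 4x and 8x figures relative to one FP32 operation come from.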

    Not necessarily. BC titles can only submit work once every 4 cycles, so there is idle time that can still be taken advantage of.
     
    thicc_gaf and BRiT like this.
  12. mawver

    Newcomer

    Joined:
    Aug 8, 2020
    Messages:
    24
    Likes Received:
    24
    I know...

    It may be my English, but I believe the other poster was suggesting that because it is a simpler calculation (INT) it would be done "faster", so I pointed out that whether it is INT or FP, the cost is the same.


    OK, that would make sense for BC emulating GCN, but it doesn't explain the offline BVH story.

    Says Andrew Goossen. "For the Series X, this work is offloaded onto dedicated hardware and the shader can continue to run in parallel with full performance. In other words, Series X can effectively tap the equivalent of well over 25 TFLOPs of performance while ray tracing."
     
    thicc_gaf likes this.
  13. AzBat

    AzBat Agent of the Bat
    Legend

    Joined:
    Apr 1, 2002
    Messages:
    7,749
    Likes Received:
    4,847
    Location:
    Alma, AR
  14. iroboto

    iroboto Daft Funk
    Legend Subscriber

    Joined:
    Mar 6, 2014
    Messages:
    14,834
    Likes Received:
    18,634
    Location:
    The North
    I guess doing 8x INT4 or 4x INT8 would be much faster than FP32 if those are the precisions being used. Machine learning tends to be highly parallel, doing the same calculation over and over again. Some algorithms are serial, though, and I'm not sure what MS is doing here.
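
    As a rough sanity check on how those packed rates scale peak throughput, here is a small Python calculation. The hardware figures (52 CUs, 64 lanes per CU, 1.825 GHz) are the publicly quoted Series X specs, not something stated in this thread, and the helper name `peak_tera_ops` is made up for illustration. It gives theoretical peaks only and says nothing about what Auto HDR actually uses.

```python
# Assumed, publicly quoted Series X GPU figures (not taken from this thread).
CUS = 52            # active compute units
LANES_PER_CU = 64   # FP32 lanes per CU
CLOCK_GHZ = 1.825   # GPU clock
OPS_PER_FMA = 2     # a fused multiply-add counts as two operations


def peak_tera_ops(values_per_lane):
    """Theoretical peak in tera-ops/s when each 32-bit lane holds `values_per_lane` values."""
    giga_ops = CUS * LANES_PER_CU * OPS_PER_FMA * values_per_lane * CLOCK_GHZ
    return giga_ops / 1000.0


fp32_tflops = peak_tera_ops(1)  # one FP32 value per lane
int8_tops = peak_tera_ops(4)    # four INT8 values per lane
int4_tops = peak_tera_ops(8)    # eight INT4 values per lane
print(f"FP32 {fp32_tflops:.2f} TFLOPS, INT8 {int8_tops:.2f} TOPS, INT4 {int4_tops:.2f} TOPS")
```

    Under those assumptions the INT8 and INT4 peaks are simply 4x and 8x the FP32 figure, which lines up with the roughly 49 and 97 TOPS that Microsoft has quoted.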

    What is the offline BVH story? This is the first time I've heard of it. I think our original understanding of MS's statements here is that the ray tracing hardware can perform all the intersection tests while the shader runs at full performance. But shaders are still required to traverse the BVH tree, IIRC.
     
    thicc_gaf likes this.
  15. thicc_gaf

    Regular

    Joined:
    Oct 9, 2020
    Messages:
    335
    Likes Received:
    259
    ExecuteIndirect deals with draw call stalls IIRC; from the (limited) bits I've read on it, it's supposed to help reduce stalls on draw calls for GPU instructions.
     
  16. chris1515

    Legend

    Joined:
    Jul 24, 2005
    Messages:
    7,157
    Likes Received:
    7,966
    Location:
    Barcelona Spain




    Very strange if true; maybe it comes in a later update.
     
    PSman1700 likes this.
  17. iceberg187

    Regular

    Joined:
    Jul 31, 2006
    Messages:
    720
    Likes Received:
    235
    Location:
    Hobart, Indiana
    Two more weeks, excluding today. For the longest time, these didn't feel totally real. That new hardware feeling is starting to settle in.
     
  18. mawver

    Newcomer

    Joined:
    Aug 8, 2020
    Messages:
    24
    Likes Received:
    24
    It was also talked about at Hot Chips...
    "shaders can run in parallel for BVH traversal, material shading, etc"

    Based on Andrew Goossen's comments, I think it's reasonable to assume that the Xbox sends a set of instructions to WGP0 and WGP1, so that when a job on one goes on hold (I don't know, maybe while requesting data from memory), the other keeps executing, improving occupancy.

    Riiiiiight?
     
  19. 3dilettante

    Legend Alpha

    Joined:
    Sep 15, 2003
    Messages:
    8,579
    Likes Received:
    4,799
    Location:
    Well within 3d
    Auto HDR is added for backwards compatibility mode. Typically, such modes are relegated to a subset of the CU and memory resources available to the full console. If everything is relative to the performance of the original title, there is no loss since the full console may not be used for the game.

    I'm not sure there's a gap in submitting work like that. The only 4 cycles I'm aware of is the 4-cycle issue cadence for wavefronts in GCN, but the SIMDs still perform work and write back results every cycle, since each instruction is executed over 4 cycles.
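
    A toy Python model of that cadence, for illustration only: it assumes the usual GCN layout of four SIMD16 units per CU, a round-robin scheduler that issues to one SIMD per cycle, and 64-wide wavefronts that therefore occupy a SIMD for 4 cycles per vector instruction. It also assumes there is always a wavefront ready to issue. The names are made up and this is not a model of the Series X BC mode.

```python
NUM_SIMDS = 4      # SIMD16 units per GCN compute unit
SIMD_WIDTH = 16    # lanes processed per SIMD per cycle
WAVE_SIZE = 64     # work-items per GCN wavefront
CYCLES_PER_INSTR = WAVE_SIZE // SIMD_WIDTH  # 4 cycles to run one vector instruction


def simulate(cycles):
    """Return the number of active ALU lanes in the CU for each simulated cycle."""
    busy_until = [0] * NUM_SIMDS  # first cycle at which each SIMD is free again
    lanes_per_cycle = []
    for cycle in range(cycles):
        simd = cycle % NUM_SIMDS  # round-robin: one SIMD gets an issue slot per cycle
        if busy_until[simd] <= cycle:
            # Issue the next instruction; it keeps this SIMD busy for 4 cycles.
            busy_until[simd] = cycle + CYCLES_PER_INSTR
        active_simds = sum(1 for free_at in busy_until if free_at > cycle)
        lanes_per_cycle.append(active_simds * SIMD_WIDTH)
    return lanes_per_cycle


print(simulate(8))
```

    In this model each SIMD only receives a new instruction every 4th cycle, yet after the 3-cycle ramp-up all 64 lanes are busy on every cycle, so the issue cadence by itself does not leave idle ALU cycles.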

    If you are referencing an article from Digital Foundry: https://www.eurogamer.net/articles/digitalfoundry-2020-inside-xbox-series-x-full-specs
    I note that the first paragraph is not from Goossen, but is a statement by the author of the article. I think that paragraph has a good chance of being mistaken.
    Goossen's statement could readily map to the fixed-function intersection and node evaluation hardware in the RDNA2 RT block. It might make more sense interpreted this way, since his scenario has a shader calling on the RT functionality while continuing to run in parallel. BVH construction would precede any shader that might depend on it; a shader wouldn't be trying to build a BVH and do something else in the meantime.
     
    iroboto and BRiT like this.
  20. iroboto

    iroboto Daft Funk
    Legend Subscriber

    Joined:
    Mar 6, 2014
    Messages:
    14,834
    Likes Received:
    18,634
    Location:
    The North
    No, not necessarily. ExecuteIndirect allows the GPU to call its own kernels without the assistance of the CPU.
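
    A conceptual Python sketch of that idea: a GPU pass writes draw arguments into a buffer, and a single indirect call then consumes whatever ended up there, so the CPU never has to inspect the results. This is a toy model and not the D3D12 API; `DrawArgs`, `gpu_cull_pass` and `execute_indirect` are hypothetical stand-ins invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class DrawArgs:
    """Hypothetical per-draw arguments, as a GPU pass might write them to a buffer."""
    index_count: int
    instance_count: int
    start_index: int


def gpu_cull_pass(objects, is_visible):
    """Stand-in for a compute shader that emits draw arguments for visible objects."""
    return [
        DrawArgs(obj["index_count"], 1, obj["start_index"])
        for obj in objects
        if is_visible(obj)
    ]


def execute_indirect(arg_buffer, max_commands):
    """Stand-in for the GPU front end consuming up to max_commands buffered draws."""
    return [
        f"draw(indices={args.index_count}, start={args.start_index})"
        for args in arg_buffer[:max_commands]
    ]


# The "CPU" records one call with an upper bound; the argument buffer decides the rest.
scene = [
    {"index_count": 36, "start_index": 0, "visible": True},
    {"index_count": 72, "start_index": 36, "visible": False},
    {"index_count": 12, "start_index": 108, "visible": True},
]
arg_buffer = gpu_cull_pass(scene, lambda obj: obj["visible"])
commands = execute_indirect(arg_buffer, max_commands=8)
```

    The point of the sketch is only the data flow: how many draws run, and with which arguments, is decided by data produced on the GPU side rather than by per-draw CPU calls.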
     