Xbox Series X [XBSX] [Release November 10 2020]

Discussion in 'Console Technology' started by Megadrive1988, Dec 13, 2019.

  1. function

    function None functional
    Legend Veteran

    Joined:
    Mar 27, 2003
    Messages:
    5,517
    Likes Received:
    3,260
    Location:
    Wrong thread
    This slide references a "multi core command processor". Can't find any mention of the command processor in RDNA 1 being multi core.

    In the past MS do seem to have liked their custom command processors, so perhaps this one is custom too...


    [​IMG]
     
    disco_ likes this.
  2. scently

    Veteran Regular

    Joined:
    Jun 12, 2008
    Messages:
    1,000
    Likes Received:
    213
    They customized the Command Processor on X1 and customized it further in the X1X so I would expect further customizations here too.
     
  3. Kugai Calo

    Newcomer

    Joined:
    Mar 6, 2020
    Messages:
    102
    Likes Received:
    97
    The "Geometry Engine" appearing here is also interesting...

    [​IMG]
    This is some interesting insight into RDNA 2 too, 7-issue superscalar?
     
    #1203 Kugai Calo, Aug 18, 2020
    Last edited by a moderator: Aug 18, 2020
  4. disco_

    Regular Newcomer

    Joined:
    Jan 4, 2020
    Messages:
    267
    Likes Received:
    212
    Not really. It's part of rdna1 so it makes sense it's in rdna2 as well.
     
  5. iroboto

    iroboto Daft Funk
    Legend Regular Subscriber

    Joined:
    Mar 6, 2014
    Messages:
    12,046
    Likes Received:
    13,421
    Location:
    The North
    I think RDNA 1 does this as well, but perhaps not as broken down as this.
    RDNA issues 2 Vector ALU and 2 scalar as per what the white paper says. The 1 Vector Data and 2 Control may belong to the CU. So that's still 7.
     
    PSman1700 and function like this.
  6. pTmdfx

    Regular Newcomer

    Joined:
    May 27, 2014
    Messages:
    341
    Likes Received:
    282
    This is a similar rate to GCN and RDNA. Basically it means the arbitrator selects up to N instructions from all non-blocked wavefronts, but only one instruction will ever be selected per wavefront. Otherwise, either the ISA needs to be compiler scheduled VLIW (definitely not the RDNA ISA) for co-issuing from the same wavefront, or the hardware itself needs to do scoreboarding which most GPU vendors deliberately avoid.

    So it is not superscalar by the book, and that is unlikely to change.
     
    PSman1700, function and iroboto like this.
  7. iroboto

    iroboto Daft Funk
    Legend Regular Subscriber

    Joined:
    Mar 6, 2014
    Messages:
    12,046
    Likes Received:
    13,421
    Location:
    The North
    A bit OT, but how is this normally defined? Doesn't superscalar just mean multiple instructions being executed in parallel on different subsystems?
     
    PSman1700 likes this.
  8. Kugai Calo

    Newcomer

    Joined:
    Mar 6, 2020
    Messages:
    102
    Likes Received:
    97
    Yeah, so still new insight, not necessarily RDNA2-specific.

    Is scoreboarding less costly to implement than Tomasulo's algorithm? Although it seems unnecessary, like you said, GPUs don't need to deal with name and data dependences...
    Also it doesn't have to be VLIW to be software scheduled, just look at Nvidia's [post-] Volta ISAs.
     
  9. Kugai Calo

    Newcomer

    Joined:
    Mar 6, 2020
    Messages:
    102
    Likes Received:
    97
    Traditionally I think it implies multiple instructions that belong to the same thread, therefore control and data hazards arise?
     
    Lalaland and function like this.
  10. iroboto

    iroboto Daft Funk
    Legend Regular Subscriber

    Joined:
    Mar 6, 2014
    Messages:
    12,046
    Likes Received:
    13,421
    Location:
    The North
    Yea, I think I understand where he's going with this.
    The dispatcher needs to look at all the commands in queue and figure out which ones to run in parallel together and which ones individually. So in effect there is some form of scoreboarding of instructions happening.
    I believe there is no real dispatcher for the CU, or a single instruction queue for it to look at, in RDNA 1; the instructions (vector, memory and scalar) are split up into separate memory pools on the CU, iirc.
     
  11. Kugai Calo

    Newcomer

    Joined:
    Mar 6, 2020
    Messages:
    102
    Likes Received:
    97
    As another example, an n-way SMT core launches <= n instructions per cycle, but it's not called superscalar.
    Yes, there will be an instruction dispatcher in the CU. You can't split the instruction stream into multiple streams by instruction class and put them under different address ranges; it just doesn't make sense, programs don't work like that. On the other hand, there's the VLIW approach, where you may put instructions of different classes into their corresponding 'slot' within an instruction 'packet'.
     
    Lalaland, PSman1700, function and 2 others like this.
  12. pTmdfx

    Regular Newcomer

    Joined:
    May 27, 2014
    Messages:
    341
    Likes Received:
    282
    It is not “superscalar by the books” because the motivation of superscalar is exploiting ILP in one instruction stream, through tracking data dependencies to discover opportunities for co-issuing. GCN and RDNA do no such thing — they are instead multiplexing up to 20 independent instruction streams into a multi-issue pipeline, while enforcing a 1 inst/clock issue limit on each stream. So from the perspective of each inst. stream, instructions are strictly sequentially issued one by one with no superscalar capability.
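    To make the issue model described above concrete, here is a minimal Python sketch of an arbiter that picks at most one instruction per non-blocked wavefront each cycle. The per-class slot counts are taken from the 2 vector ALU + 2 scalar + 1 vector data + 2 control breakdown mentioned earlier in the thread; the names (`Wavefront`, `issue_cycle`, `ISSUE_SLOTS`) and the fixed-priority selection order are assumptions for illustration, not a description of the real hardware.

```python
from collections import deque
from dataclasses import dataclass

# Assumed per-cycle issue slots per instruction class (7 in total), based on
# the breakdown quoted in the thread; not a confirmed hardware specification.
ISSUE_SLOTS = {"valu": 2, "salu": 2, "vmem": 1, "ctrl": 2}


@dataclass
class Wavefront:
    wid: int
    stream: deque          # instruction classes, in program order
    blocked: bool = False  # e.g. waiting on memory


def issue_cycle(wavefronts, slots=ISSUE_SLOTS):
    """Issue at most ONE instruction per non-blocked wavefront this cycle."""
    free = dict(slots)  # remaining slots per class for this cycle
    issued = []
    for wf in wavefronts:  # simple fixed-priority order (an assumption)
        if wf.blocked or not wf.stream:
            continue
        kind = wf.stream[0]  # only the head instruction is a candidate
        if free.get(kind, 0) > 0:
            free[kind] -= 1
            issued.append((wf.wid, wf.stream.popleft()))
    return issued


wavefronts = [
    Wavefront(0, deque(["valu", "valu"])),
    Wavefront(1, deque(["valu"])),
    Wavefront(2, deque(["valu"])),
    Wavefront(3, deque(["salu"]), blocked=True),
    Wavefront(4, deque(["vmem"])),
]
first = issue_cycle(wavefronts)
second = issue_cycle(wavefronts)
print(first)
print(second)
```

    Wavefront 0 has two independent vector instructions queued but only ever contributes one per cycle, which is the sense in which each stream is not superscalar even though the pipeline as a whole is multi-issue.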
     
    Silenti, Lalaland, PSman1700 and 3 others like this.
  13. pTmdfx

    Regular Newcomer

    Joined:
    May 27, 2014
    Messages:
    341
    Likes Received:
    282
    I stand corrected a bit on RDNA scheduling: according to the ISA documentation, RDNA actually does track data dependencies in hardware, and does not require manually inserted wait states, even though it now exposes the multi-cycle execution pipeline (unlike GCN).

    So while I can’t rule out that it can do superscalar issue “by the book”, my two cents is that it probably doesn't, since it can already issue from a huge pool of wavefronts even with each of them contributing just 1 per clock.
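    As a rough illustration of hardware dependency tracking with strictly in-order, one-per-clock issue for a single wavefront, here is a small Python sketch. The function name `issue_in_order`, the instruction tuple layout and the 4-cycle latency are all hypothetical and chosen only for the example; this is not how the RDNA interlock is actually implemented.

```python
def issue_in_order(program, latency):
    """Return the issue cycle of each instruction for ONE wavefront.

    program: list of (dest_reg, [src_regs], kind) in program order
    latency: dict mapping kind -> cycles until the result is available
    """
    ready_at = {}  # register -> cycle at which its value becomes available
    cycle = 0      # earliest cycle the next instruction may issue
    issue_cycles = []
    for dest, srcs, kind in program:
        # Interlock: stall until every source operand is ready.
        start = max([cycle] + [ready_at.get(r, 0) for r in srcs])
        issue_cycles.append(start)
        ready_at[dest] = start + latency[kind]
        cycle = start + 1  # at most one instruction per clock, in order
    return issue_cycles


program = [
    ("v0", [], "valu"),
    ("v1", ["v0"], "valu"),  # depends on v0, so it stalls
    ("v2", [], "valu"),      # independent, but still waits its turn
]
cycles = issue_in_order(program, {"valu": 4})  # 4-cycle latency is assumed
print(cycles)
```

    The independent third instruction still issues after the stalled second one, which is the difference from a superscalar or out-of-order design that would look past the stall within the same stream.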
     
    #1213 pTmdfx, Aug 18, 2020
    Last edited: Aug 18, 2020
  14. iroboto

    iroboto Daft Funk
    Legend Regular Subscriber

    Joined:
    Mar 6, 2014
    Messages:
    12,046
    Likes Received:
    13,421
    Location:
    The North
    conference live blog
    https://www.anandtech.com/show/1599...ft-xbox-series-x-system-architecture-600pm-pt

    09:39PM EDT - Q: With 20 channels GDDR6, is that really cheaper than 2 stacks HBM? A: We're not religious about which DRAM tech to use. We needed the GPU to have a ton of bandwidth. Lots of channels allows for low latency requests to be serviced. HBM did have an MLC model thought about, but people voted with their feet and JEDEC decided not to go with it.

    09:38PM EDT - Q: Says Zen 2 is server class, but you use L3 mobile class? A: Yeah our caches are different, but I won't say any more, that's more AMD.

    09:37PM EDT - Q: TSMC 7nm enhanced, is it N7P, N7+, or something else? A: It's not base 7nm, it's progressed over time. Lots of work between AMD and TSMC to hit our targets and what we needed


    some items we were discussing:
    • 09:22PM EDT - supports up to 2x2
    • 09:22PM EDT - VRS
    • 09:21PM EDT - CUs have 25% better perf/clock compared to last gen (my edit: I assume last gen is with respect to RDNA 1 -- so this is not the advertised 50% better perf/watt as AMD specifies)
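
    For the 20-channel GDDR6 question quoted above, the peak bandwidth arithmetic can be sketched as follows. The 16-bit channel width is the standard GDDR6 channel size and the 14 Gbps per-pin rate is Microsoft's published figure for the console; neither number appears in the live blog excerpt, so treat both as assumptions brought in for the calculation. The function name is hypothetical.

```python
def peak_bandwidth_gb_s(channels, bits_per_channel, gbps_per_pin):
    """Peak DRAM bandwidth in GB/s from channel count, width and pin rate."""
    bus_width_bits = channels * bits_per_channel  # total data pins
    return bus_width_bits * gbps_per_pin / 8      # bits -> bytes


# 20 channels x 16 bits = 320-bit bus; 14 Gbps per pin is assumed.
xsx_bandwidth = peak_bandwidth_gb_s(20, 16, 14)
print(xsx_bandwidth)
```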
     
    #1214 iroboto, Aug 18, 2020
    Last edited: Aug 18, 2020
  15. anexanhume

    Veteran Regular

    Joined:
    Dec 5, 2011
    Messages:
    2,045
    Likes Received:
    1,454
    I didn’t say you said it was hardware :)

    Software is inherently custom unless it’s OSS.

    Once AMD confirmed RDNA 2 was full Tier 2 VRS, I went and re-read the MS VRS patent and came away with the above interpretation you quoted.
     
  16. Kugai Calo

    Newcomer

    Joined:
    Mar 6, 2020
    Messages:
    102
    Likes Received:
    97
    On that you can play with shader disassembly here, just select Radeon GPU Analyzer as compiler.
     
  17. iroboto

    iroboto Daft Funk
    Legend Regular Subscriber

    Joined:
    Mar 6, 2014
    Messages:
    12,046
    Likes Received:
    13,421
    Location:
    The North
    But hardware is involved. So perhaps I'm not understanding how their patent is only software based
    • Tiny area cost for 10-30% performance gain
    [​IMG]
     
    Silent_Buddha, BRiT and PSman1700 like this.
  18. PSman1700

    Veteran Newcomer

    Joined:
    Mar 22, 2019
    Messages:
    3,484
    Likes Received:
    1,391
    Well that confirms their custom vrs is hardware.
     
  19. anexanhume

    Veteran Regular

    Joined:
    Dec 5, 2011
    Messages:
    2,045
    Likes Received:
    1,454
    Of course hardware is involved. RDNA 2 is VRS tier 2 compatible.

    The patent in question consistently refers to the methods described as “application-directed”, as in, the operations performed by the GPU are not completely self-determined in hardware.
     
    #1219 anexanhume, Aug 18, 2020
    Last edited: Aug 18, 2020
  20. Rangers

    Legend

    Joined:
    Aug 4, 2006
    Messages:
    12,659
    Likes Received:
    1,470

    Much of this appears to be marketing TBH. It's great stuff to dig into, and legitimate technical info of course, but they're not going to tell us about any potential hidden drawbacks of the hardware here.

    I don't think there will be many native 8K games, and putting that on the Scarlett chip etc. was definitely about marketing. 4K is this generation (if that).
     