AMD: Navi Speculation, Rumours and Discussion [2019]

Discussion in 'Architecture and Products' started by Kaotik, Jan 2, 2019.

  1. Bondrewd

    Regular Newcomer

    Joined:
    Sep 16, 2017
    Messages:
    520
    Likes Received:
    239
    What.
    Nothing called "Arcturus" ever appeared on any AMD roadmap, ever.
     
  2. Ike Turner

    Veteran Regular

    Joined:
    Jul 30, 2005
    Messages:
    1,884
    Likes Received:
    1,756
    Should we still wait? :rolleyes:
     
  3. Bondrewd

    Regular Newcomer

    Joined:
    Sep 16, 2017
    Messages:
    520
    Likes Received:
    239
    Yes!
    Now a ~year more.
     
  4. Ike Turner

    Veteran Regular

    Joined:
    Jul 30, 2005
    Messages:
    1,884
    Likes Received:
    1,756
  5. Digidi

    Newcomer

    Joined:
    Sep 1, 2015
    Messages:
    225
    Likes Received:
    97
    Does anybody know the hit latency of the L1 cache? Volta had 28 cycles, and it was 400% faster than Pascal.
     
  6. Bondrewd

    Regular Newcomer

    Joined:
    Sep 16, 2017
    Messages:
    520
    Likes Received:
    239
    You'll see when the fat one drops.
     
  7. Jawed

    Legend

    Joined:
    Oct 2, 2004
    Messages:
    10,873
    Likes Received:
    767
    Location:
    London
    [Looking at the slides from Computerbase.de]

    It seems the main benefit AMD is talking about is that heavy VGPR allocation threads will run with much less idle time, so shaders that allocate 64 VGPRs or more will no longer kill throughput if they are memory-intensive too. (GCN with only 2 hardware threads due to high VGPR or LDS allocation is quite happy as long as ALU:MEM ratio is fairly high, e.g. 20:1).
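
    To put rough numbers on the VGPR point, here is a minimal sketch of GCN occupancy per SIMD as limited by register allocation alone. It assumes the commonly cited GCN figures of 256 VGPRs per lane per SIMD and a cap of 10 wavefronts per SIMD. The function name and the allocation granularity of 4 registers are my own illustration, not something taken from the slides.

```python
# Rough GCN occupancy estimate per SIMD, limited by VGPR allocation only.
# Assumptions (not from the slides): 256 VGPRs per lane per SIMD,
# at most 10 wavefronts per SIMD, VGPRs allocated in blocks of 4.
VGPRS_PER_SIMD = 256
MAX_WAVES_PER_SIMD = 10
ALLOC_GRANULARITY = 4


def waves_per_simd(vgprs_per_thread: int) -> int:
    """Return how many wavefronts fit on one SIMD for a given VGPR count."""
    if vgprs_per_thread < 1:
        raise ValueError("a shader uses at least one VGPR")
    # Round the request up to the allocation granularity.
    allocated = -(-vgprs_per_thread // ALLOC_GRANULARITY) * ALLOC_GRANULARITY
    if allocated > VGPRS_PER_SIMD:
        raise ValueError("shader does not fit in the register file")
    return min(MAX_WAVES_PER_SIMD, VGPRS_PER_SIMD // allocated)


if __name__ == "__main__":
    for vgprs in (24, 64, 84, 128):
        print(vgprs, "VGPRs ->", waves_per_simd(vgprs), "waves per SIMD")
```

    Under these assumptions a 64-VGPR shader gets 4 waves per SIMD, and a shader at 128 VGPRs is down to the 2 hardware threads mentioned above.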

    The pain point with GCN when it was introduced was that it was more sensitive to ALU:MEM ratio than the old VLIW machines. So this is a big deal: AMD will spend less time advising developers to watch out for VGPR allocations. NVidia solved this, eventually, by giving developers more VGPRs.

    I'm not a fan of 32-wide hardware threads, even if there is a nice mapping to the bits of a DWORD, because a "square" (8x8) is often really good for breaking down work, and there's no square with a 32-wide thread. But 64 is still an option and a slide implies that in Workgroup Processor mode there are twice as many VGPRs available to each work group. So that's pretty spiffy!
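
    The square point is easy to check. A small sketch that lists the 2D tile shapes a hardware thread of a given width can cover with one work item per element; the helper names are just for illustration.

```python
import math


def tile_shapes(wave_size: int):
    """All (width, height) pairs with width * height == wave_size, width >= height."""
    shapes = []
    for height in range(1, math.isqrt(wave_size) + 1):
        if wave_size % height == 0:
            shapes.append((wave_size // height, height))
    return shapes


def has_square_tile(wave_size: int) -> bool:
    """True when the wave size is a perfect square, e.g. 64 = 8 x 8."""
    root = math.isqrt(wave_size)
    return root * root == wave_size


if __name__ == "__main__":
    for size in (32, 64):
        print(size, tile_shapes(size), "square:", has_square_tile(size))
```

    For 32 the closest shape to a square is 8x4, while 64 gives the 8x8 tile.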

    This sounds like a fun machine to program: translation, they've given us a new exciting coarse-grained switch to play with, making the combinatorial space for algorithm optimisation twice as big...

    Also, AMD talks vaguely about spin-up and spin-down timings being better with this new design. The wider SIMDs help with that, and the significantly denser scheduling hardware and caches, seemingly with no scheduler sharing even within the CU, let alone across CUs, should considerably cut the time lost to short-lived and/or high-VGPR-allocation threads.

    So yeah, this really is a new hardware architecture for the CUs, not far off the change from VLIW to scalar when GCN launched.
     
    mahtel, Lightman, Newguy and 6 others like this.
  8. Ike Turner

    Veteran Regular

    Joined:
    Jul 30, 2005
    Messages:
    1,884
    Likes Received:
    1,756
    That crystal ball needs some polish I guess...
     
  9. Malo

    Malo Yak Mechanicum
    Legend Veteran Subscriber

    Joined:
    Feb 9, 2002
    Messages:
    7,028
    Likes Received:
    3,101
    Location:
    Pennsylvania
    According to Anandtech's article, it does not have HDMI 2.1. It's DP 1.4 and HDMI 2.0b.
     
    Adonisds likes this.
  10. Bondrewd

    Regular Newcomer

    Joined:
    Sep 16, 2017
    Messages:
    520
    Likes Received:
    239
    Also whatever they did to FF is fun.
    Very polished one.
     
  11. digitalwanderer

    digitalwanderer Dangerously Mirthful
    Legend

    Joined:
    Feb 19, 2002
    Messages:
    17,261
    Likes Received:
    1,778
    Location:
    Winfield, IN USA
    Is it me or does Chris Hook just look a whole lot different lately? :|
     
  12. Frenetic Pony

    Regular Newcomer

    Joined:
    Nov 12, 2011
    Messages:
    331
    Likes Received:
    85
    Assuming the two SEs are true here. There was an update to Apple's MacBook Pro Vegas last year; they had 20 CUs and only one shader engine. Presumably this was some prototype for RDNA/5700, which now seems to have 20 CUs per shader engine.

    Yeah, my bad, that was just rumors throwing that name around.
     
  13. Bondrewd

    Regular Newcomer

    Joined:
    Sep 16, 2017
    Messages:
    520
    Likes Received:
    239
    Look at the block diagram.
    These are not SEs you're thinking of.
     
  14. 3dcgi

    Veteran Subscriber

    Joined:
    Feb 7, 2002
    Messages:
    2,435
    Likes Received:
    263
    What was the curious data for Vega64? It performs pretty close to 4 prims/clk.
     
  15. snarfbot

    Regular Newcomer

    Joined:
    Apr 23, 2007
    Messages:
    574
    Likes Received:
    188
    Seems pretty decent to me: a bit less money than the 2070 and a bit more performance on average, so I can't really complain about that.

    Also remember Navi was designed with scalability in mind, so a bigger, faster chip is going to be out sooner rather than later.

    At any rate it's finally coming out, so we can all move on to speculating about the next big thing, and that's the real fun, isn't it?
     
  16. Frenetic Pony

    Regular Newcomer

    Joined:
    Nov 12, 2011
    Messages:
    331
    Likes Received:
    85
    Big edit - Going over this again is slightly confusing. The block diagram clearly shows 2 shader engines. Each Shader Engine has 10 "Dual compute units".

    [​IMG]

    Now, are these dual compute units the mixed wavefront thing they're doing? So 1 "Dual Compute Unit" = 2 x 32-thread (stream processor) units or 1 x 64-SP unit. If so, then the 5700 would need two of the block diagrams shown to match the given numbers.

    Or these dual compute units are 2 x 64-SP units, and the block diagram represents a complete 5700. I'm honestly not sure which one it is; the terminology doesn't match up here.
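
    The two readings can be checked with simple arithmetic. A minimal sketch, taking the 2 shader engines and 10 dual compute units per engine from the diagram, and assuming the 2560 stream processors AMD announced for the RX 5700 XT as the target (that figure is not in the diagram itself).

```python
# Counts read off the block diagram.
SHADER_ENGINES = 2
DUAL_CUS_PER_ENGINE = 10

# Assumption: AMD's announced stream processor count for the RX 5700 XT.
ANNOUNCED_SPS = 2560


def total_sps(sps_per_dual_cu: int) -> int:
    """Stream processors in one block diagram for a given dual-CU size."""
    return SHADER_ENGINES * DUAL_CUS_PER_ENGINE * sps_per_dual_cu


def diagrams_needed(sps_per_dual_cu: int) -> float:
    """How many copies of the diagram are needed to reach the announced count."""
    return ANNOUNCED_SPS / total_sps(sps_per_dual_cu)


if __name__ == "__main__":
    # Reading A: one dual CU is 64 SPs in total.
    print("64 SPs per dual CU:", total_sps(64), "SPs,", diagrams_needed(64), "diagrams")
    # Reading B: one dual CU is 2 x 64 SPs.
    print("128 SPs per dual CU:", total_sps(128), "SPs,", diagrams_needed(128), "diagrams")
```

    Only the second reading lets a single diagram account for the announced count.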
     
    #816 Frenetic Pony, Jun 11, 2019
    Last edited: Jun 11, 2019
  17. silent_guy

    Veteran Subscriber

    Joined:
    Mar 7, 2006
    Messages:
    3,754
    Likes Received:
    1,379
    I'm confused about this slide about GCN:

    [​IMG]

    The idea about a 4-cycle issue was that you could use cycle N to read 64x operand 0, N+1 to read 64x operand 1, N+2 to read 64x operand 2, and N+3 to write the result.

    See also here.

    The benefit of this scheme is that you don't need a multi-port RAM for the register file, and that you never have issues with bank conflicts (AFAIK, that's the case for GCN.) Yet that's not what this slide shows: it shows 4 register banks, and fetching the same operands for different lanes. If you do that, you might as well not have the 4 cycle issue?

    Edit: OK, I'm a bit dumb... I'm right about the operands being fetched 64 at a time, but the SIMDs are being fed lane 0-15, then lane 16-31 etc. So it's as expected. The only thing still strange is that it shows 4 VGPRs as if there are 4 banks.
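
    The lane grouping can be sketched as below. This is only an illustration of the cadence described here (a 64-wide wavefront fed through a 16-lane SIMD in four passes); the function name is hypothetical and it does not model the register file at all.

```python
from typing import List


def issue_schedule(wave_size: int, simd_width: int) -> List[range]:
    """Return the lanes processed in each cycle for one instruction."""
    if simd_width < 1 or wave_size % simd_width != 0:
        raise ValueError("wave size must be a multiple of the SIMD width")
    return [range(start, start + simd_width)
            for start in range(0, wave_size, simd_width)]


if __name__ == "__main__":
    # GCN-style: 64-wide wavefront on a 16-lane SIMD.
    for cycle, lanes in enumerate(issue_schedule(64, 16)):
        print("cycle", cycle, "lanes", lanes.start, "to", lanes.stop - 1)
    # A 32-wide wavefront on a 32-lane SIMD needs a single pass.
    print(len(issue_schedule(32, 32)), "cycle for wave32 on SIMD32")
```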

    The RDNA slide makes sense to me:
    [​IMG]

    4 VGPR banks which allow single cycle issue... as long as you don't have a bank conflict. That part of the architecture seems to be closely related to the Maxwell/Pascal SM (but not the Volta or Turing one.)
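
    A toy model of what a bank conflict would cost. Everything here is an assumption for illustration: that the register index modulo 4 selects the bank, and that each bank can deliver one register per cycle. The slide does not state the real mapping.

```python
from collections import Counter
from typing import Iterable

NUM_BANKS = 4  # from the slide; the index-to-bank mapping below is assumed


def bank_of(vgpr_index: int) -> int:
    """Assumed mapping: register index modulo the number of banks."""
    return vgpr_index % NUM_BANKS


def read_cycles(source_vgprs: Iterable[int]) -> int:
    """Cycles needed to read all source operands, one read per bank per cycle."""
    distinct = set(source_vgprs)  # the same register only needs one read
    if not distinct:
        return 0
    per_bank = Counter(bank_of(reg) for reg in distinct)
    return max(per_bank.values())


if __name__ == "__main__":
    print(read_cycles([0, 1, 2]))  # three different banks
    print(read_cycles([0, 4, 1]))  # v0 and v4 share a bank in this model
```

    In this model an instruction reading v0, v1 and v2 issues in one cycle, while one reading v0, v4 and v1 needs a second read cycle.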
     
    #817 silent_guy, Jun 11, 2019
    Last edited: Jun 11, 2019
    Lightman and AlBran like this.
  18. Jay

    Jay
    Veteran Regular

    Joined:
    Aug 3, 2013
    Messages:
    1,919
    Likes Received:
    1,069
    Any details on their new upscaling tech?
     
  19. del42sa

    Newcomer

    Joined:
    Jun 29, 2017
    Messages:
    166
    Likes Received:
    82
    Culling...

    [​IMG]
     
  20. del42sa

    Newcomer

    Joined:
    Jun 29, 2017
    Messages:
    166
    Likes Received:
    82
    I see two SEs, each with two clusters. Each cluster has five CUs.

    What really bothers me is how little gain comes from the process transition itself.

    [​IMG]
     
    Lightman likes this.