AMD: Navi Speculation, Rumours and Discussion [2017-2018]

Discussion in 'Architecture and Products' started by Jawed, Mar 23, 2016.

  1. Frenetic Pony

    Regular Newcomer

    Joined:
    Nov 12, 2011
    Messages:
    347
    Likes Received:
    94
    There is, but as far as I know Samsung hasn't licensed it to GloFo, nor has GloFo attempted their own improvements. They seem far more focused on 7nm.

    So they've stated multiple times, adding that AMD should be among their first customers, as they expect AMD to be the first to tape out final designs on it.

    That being said, just porting Vega over to 7nm seems incredibly costly, especially if they also plan to do Navi on 7nm. Engineering cost per chip design has gone up a hell of a lot, and yes, you have to pay that between nodes even with a completed design. Vega seems a failure in terms of its original goals: there doesn't seem to be any hint of the split wavefronts that papers originally posited would be there, and it doesn't seem to be any better than Polaris in terms of performance/watt. I was wondering why the apparent lead engineer for Vega 64 (and Vega in general?) had his resume up way before the chip's actual release. I'd hazard the guess that AMD was none too pleased with the sim results but had put in way too many resources to change course, so away he goes.

    Looping back around: if Navi shows up with some of the previously indicated features, a new memory controller, some vague new AI enhancements, and maybe that split wavefront packing that was supposed to be in Vega, then it'd make a lot more sense to spend time and money there rather than rushing out a 7nm Vega. But hey, that's assuming it's on time at all. Maybe the timetables have changed drastically since AMD last talked about them; Vega itself has been delayed more than long enough.

    Still... Just doubling the HBM stacks and the DP rate without changing anything else? That's a very odd design decision to say the least. I'd say the specs at least are extremely sketchy, as is the conveniently super wide TDP range.
     
    #181 Frenetic Pony, Aug 28, 2017
    Last edited: Aug 28, 2017
  2. Anarchist4000

    Veteran Regular

    Joined:
    May 8, 2004
    Messages:
    1,439
    Likes Received:
    359
    Rumors from some of the semiconductor conferences were that 7nm is troublesome. Wouldn't be surprised if timelines shifted a bit.

    Keep in mind Vega 20 is an HPC chip with FP64. Even Hawaii is still around for that market, so a 7nm Vega might make sense there. Even if expensive, the market justifies it.

    The sim results would have occurred well before the chip was finalized, and simulating architectural features is not that difficult or expensive. If a new design wasn't working, they could have pushed a larger Polaris easily enough; Scorpio was already halfway there and on 14nm. The only exception would be if the layout were awful, but Vega seems very reasonable at lower clocks. The real issue with Vega seems to be that it was pushed too far after being intended as a lower-power mobile part. A Vega x2 with lower clocks and voltages would in all likelihood be a very reasonable part.
     
    Grall and ToTTenTranz like this.
  3. xEx

    xEx
    Regular Newcomer

    Joined:
    Feb 2, 2012
    Messages:
    939
    Likes Received:
    398
    One thing I don't understand: people say Vega was pushed too far in frequency, yet AMD said the massive difference in transistor count versus Fiji was because of making the design able to run at higher frequencies. So how much of a frequency difference would there actually be between a lower-clocked Vega and Fiji?
     
  4. Frenetic Pony

    Regular Newcomer

    Joined:
    Nov 12, 2011
    Messages:
    347
    Likes Received:
    94
    The half-rate FP64 might give them a reason for doing this, though it'd be the only one I can see. Certainly Nvidia is charging more than enough for Volta for AMD to try and get in there. But that doesn't mean a half-rate double-precision Navi card couldn't be made.

    And as an aside, Vega's transistor bloat versus Fiji seems to come largely from latency hiding and other circuits made specifically for pushing the clockrate to something you'd expect from 14nm. The stated target for the overall architecture was 1700 MHz; that Vega 64 pretty much never even holds the lower 1677 MHz boost clock seems an indication of how badly things went. Also, the rumor doesn't call for a doubling of Vega, which is what I would assume would happen on a new node. It calls for a doubling of HBM stacks with the CU count remaining exactly the same, which is why I called it an odd decision.

    Vega's new rasterizer seems effective, at least as far as that design decision goes. Vega also switched to HBM2, so a lot more RAM is available. It's also got double-rate FP16 and INT8 support, among other things. Besides, switching to 7nm would mean a higher clockrate, or at the very least the same clocks as 14nm. Doubling (theoretically) the transistor count is only part of the benefit of a new node; the other is getting a massive drop in TDP, or a decent increase in clockspeed.
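    To put a rough number on the packed-math point (my arithmetic; 1.55 GHz is an approximate Vega 64 boost clock, not an exact spec):

        # Packed math fits two FP16 values in each 32-bit lane, so peak
        # FP16 throughput is simply double the FP32 rate.
        def fp16_tflops(cus, clock_ghz):
            fp32 = cus * 64 * 2 * clock_ghz / 1000.0  # CUs x lanes x FMA
            return 2 * fp32

        print(fp16_tflops(64, 1.55))  # ~25.4 TFLOPS FP16 vs ~12.7 FP32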
     
  5. ImSpartacus

    Regular Newcomer

    Joined:
    Jun 30, 2015
    Messages:
    252
    Likes Received:
    199
    GCN 5 still has the 4 shader engine (now "compute engine") limit according to Anandtech's discussions with AMD engineers (below). AMD probably decided it was easier to increase clocks than remove that limitation and scale higher than 4 compute engines.

    http://www.anandtech.com/show/11717/the-amd-radeon-rx-vega-64-and-56-review/2

    This is also probably why Vega 20 is rumored to only have 64 CUs, despite doubling the VRAM. Maybe AMD thinks the increase in clocks from 7nm will be enough to make a 4-stack Vega 20 into a balanced design?
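    As a rough sanity check on "balanced" (my arithmetic: the Vega 64 air-cooled figures are public, while the Vega 20 clock and HBM2 pin speed below are pure assumptions), here is a quick bytes-per-FLOP comparison:

        # Rough bytes-per-FLOP comparison of Vega 10 vs. a hypothetical
        # 4-stack, higher-clocked 7nm Vega 20. Assumed values are marked.
        def tflops(cus, clock_ghz):
            """Peak FP32: CUs x 64 lanes x 2 FLOPs/cycle (FMA) x clock."""
            return cus * 64 * 2 * clock_ghz / 1000.0

        def hbm2_gbs(stacks, gbps_per_pin):
            """Aggregate HBM2 bandwidth: stacks x 1024 pins x pin rate."""
            return stacks * 1024 * gbps_per_pin / 8.0

        configs = {
            "Vega 10": (tflops(64, 1.55), hbm2_gbs(2, 1.89)),  # ~Vega 64 air
            "Vega 20?": (tflops(64, 1.80), hbm2_gbs(4, 2.00)),  # assumed
        }
        for name, (tf, bw) in configs.items():
            print(f"{name}: {tf:.1f} TFLOPS, {bw:.0f} GB/s, "
                  f"{bw / (tf * 1000):.3f} bytes/FLOP")

    Doubling the stacks while holding the CU count roughly doubles the bytes available per FLOP, which is exactly the direction HPC workloads tend to want.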
     
  6. xEx

    xEx
    Regular Newcomer

    Joined:
    Feb 2, 2012
    Messages:
    939
    Likes Received:
    398
    I was commenting on the fact that AMD said most of the extra transistors were there to increase clocks. At 1.7 GHz sweet-spot clocks I can understand it, but at what we actually got? 1.4-1.5 GHz versus Fiji's 1.0-1.1 GHz?
     
  7. ImSpartacus

    Regular Newcomer

    Joined:
    Jun 30, 2015
    Messages:
    252
    Likes Received:
    199
    I know exactly what you meant. I believe it was referenced in Anandtech's Vega 56/64 review (probably elsewhere as well).

    http://www.anandtech.com/show/11717/the-amd-radeon-rx-vega-64-and-56-review/2

    If you asked me to speculate on how Vega 10 would clock without those extra clock-increasing enhancements (i.e. a smaller die), then I'd say it'd probably clock like Polaris (GCN 4), as it's the most recent GCN and the only other GCN on 14nm. So 1.2-1.4 GHz is where I'd expect it to end up.

    But obviously, AMD didn't want to make a smaller die that only clocked at 1.2-1.4 GHz. Therefore, I think the more interesting question is: how do you best use the 484 mm² of die space?

    • Do you make enhancements to clock 64CUs to ~1.7 GHz?
    • Or do you add compute resources so you have, say, 80CUs at 1.2-1.4 GHz?

    That's an interesting decision to make until you remember that the 4 shader engine limit apparently requires a meaningful engineering effort to overcome. Then you see why AMD went for the higher clocks.
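    For rough scale (my arithmetic, peak FP32 only, using the clock ranges above), the two options land in the same ballpark:

        # Peak FP32 throughput: CUs x 64 lanes x 2 FLOPs/cycle (FMA) x clock.
        def peak_tflops(cus, clock_ghz):
            return cus * 64 * 2 * clock_ghz / 1000.0

        print(peak_tflops(64, 1.7))  # ~13.9 TFLOPS: fewer CUs, higher clocks
        print(peak_tflops(80, 1.3))  # ~13.3 TFLOPS: more CUs, Polaris clocks

    With raw throughput roughly a wash, the cheaper engineering path (clocks) wins.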

    That's why I thought the shader engine limit was relevant to your question.
     
    Picao84 and Grall like this.
  8. 3dcgi

    Veteran Subscriber

    Joined:
    Feb 7, 2002
    Messages:
    2,436
    Likes Received:
    264
    That's a gross underestimation. I doubt any chip designer would call these things easy or cheap. Some features can surely be simulated earlier than others and thus are cheaper, but this doesn't mean the simulations are cheap or easy.
     
  9. 3dilettante

    Legend Alpha

    Joined:
    Sep 15, 2003
    Messages:
    8,137
    Likes Received:
    2,939
    Location:
    Well within 3d
    I think that idea comes from AMD's more recent presentation in May that listed Vega as 14nm and 14nm+. Perhaps that slide doesn't cover compute, or something has reshuffled which GPUs get the newer I/O and hardware features.
    A Vega with DP and perhaps a subset of features like ECC might be doable on 14nm, and if we believe the leaks concerning Greenland's role in AMD's HPC plans, it may have been possible at one point to have had it done by now.

    Perhaps it was a watermark of a dong?
     
    Alexko likes this.
  10. Anarchist4000

    Veteran Regular

    Joined:
    May 8, 2004
    Messages:
    1,439
    Likes Received:
    359
    Perhaps, but many smaller chips may not be as ideal for all HPC tasks. There's also the matter of 4 stacks of HBM2, and I'd assume extra PCIe lanes for Infinity Fabric with an x2 card, an Epyc APU, or onboard devices such as SSG or SAN controllers. Those are features that don't make much sense on consumer parts.

    That latency hiding seems to be mostly SRAM, although we still don't have a full accounting of it. That would provide very dense transistors, so unless they forgot to isolate the power supply for all of it, the effect shouldn't be that drastic. Tests do indicate a little undervolting drastically reduces power with minimal performance impact (see the toy model below).
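    A toy model of why a small undervolt pays off so disproportionately, assuming dynamic power scales roughly with f*V^2 (the voltages and clocks below are illustrative, not measured Vega values):

        # Toy dynamic-power model: P ~ C * V^2 * f, normalized to stock.
        def relative_power(v, f, v_stock=1.20, f_stock=1.63):
            return (v / v_stock) ** 2 * (f / f_stock)

        # ~5% undervolt plus ~3% downclock from the assumed stock point:
        print(relative_power(1.14, 1.58))  # ~0.87, i.e. ~13% less power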

    Meaningful just means worth bothering with, not that it's particularly difficult. Take that partitioning instruction in the ISA, for example. Four SEs, or four quads, means dividing each screen dimension in two, and dividing by two tends to be very efficient in digital systems (see the sketch below): far easier than dividing by three, or having multiple partitions in one dimension. If they removed the limit we'd likely see 16 SEs, and equalizing the amount of work in the bins would become far more difficult. The quads work well unless all the work falls into opposing quadrants.
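    A sketch of why power-of-two partitions are so cheap (my illustration, not AMD's actual tile-to-SE mapping): picking one of four shader engines can be two bit operations on the tile coordinates, whereas dividing by three would need a real divider or a lookup table.

        # Hypothetical checkerboard mapping of screen tiles to 4 shader
        # engines: one bit from each tile coordinate, no division at all.
        def shader_engine(tile_x, tile_y):
            return ((tile_y & 1) << 1) | (tile_x & 1)

        assert shader_engine(0, 0) == 0
        assert shader_engine(1, 0) == 1
        assert shader_engine(0, 1) == 2
        assert shader_engine(1, 1) == 3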

    Don't get me wrong, there is definitely work involved. However, simulating experimental features at a high level is far easier and cheaper than laying out an entire chip, fabricating it, and then discovering something fundamental doesn't work, especially with 14nm being reasonably well understood at this point. So HBCC, primitive shaders, and the caching models would have been reasonably well tested prior to simulating the entire chip. These are features that aren't fully enabled, and performance isn't terrible at slightly lower voltages and clocks. For whatever reason leakage seems really bad at stock values and gets worse very quickly. Misjudging the power curve at the higher clocks Vega was designed for seems too large an error to have made; the results suggest instead that some part of the design consumed far more power than expected.

    I'm still of the mindset that Vega is a TBDR design that isn't currently working, and that the gaming parts would have targeted significantly lower clockspeeds with the resulting work reduction. That, or all remotely good samples are consumed by pro parts.
     
  11. CarstenS

    Veteran Subscriber

    Joined:
    May 31, 2002
    Messages:
    4,805
    Likes Received:
    2,067
    Location:
    Germany
    Maybe that is the reason that, with regard to die size, Vega 10 is where it is today: to leave some wiggle room for a Vega 20 in 14nm with half-rate DP, ECC-protected SRAM and maybe even twice the memory controllers. But then, doing this only to be able to reuse as many macros as possible would make one of the primary uses of IF redundant. *shrugs*
     
  12. fehu

    Veteran Regular

    Joined:
    Nov 15, 2006
    Messages:
    1,480
    Likes Received:
    432
    Location:
    Somewhere over the ocean
    Infinity Fabric related: will Navi be PCIe 4?
     
  13. Grall

    Grall Invisible Member
    Legend

    Joined:
    Apr 14, 2002
    Messages:
    10,801
    Likes Received:
    2,172
    Location:
    La-la land
    Except that AMD featured their spanking new shiny binning rasterizer on multiple (sets of) slides. It would be pretty weird, if not downright disingenuous (possibly illegal) to heavily advertise a feature which isn't working and won't/cannot be enabled.
     
    Gubbi likes this.
  14. Cat Merc

    Newcomer

    Joined:
    May 14, 2017
    Messages:
    124
    Likes Received:
    108
    The other information in those slides turned out to be accurate. Even the name Vega 20 was first seen there, and recently we got confirmation that the name is real. While plans do change, we can safely assume that at one point this was AMD's plan.
     
    T1beriu and iMacmatician like this.
  15. seahawk

    Regular

    Joined:
    May 18, 2004
    Messages:
    511
    Likes Received:
    141
    As long as it works under certain conditions, this is not a problem.
     
  16. no-X

    Veteran

    Joined:
    May 28, 2005
    Messages:
    2,301
    Likes Received:
    256
    "Tiled rasterisation" and "tile-based deferred rendering" are not synonyms. Vega does tiled rasterisation, but (at least with current drivers) doesn't behave as a tile-based deferred gpu.
     
    CarstenS likes this.
  17. Anarchist4000

    Veteran Regular

    Joined:
    May 8, 2004
    Messages:
    1,439
    Likes Received:
    359
    Probably PCIe 4 for Navi; Vega already has Infinity Fabric confirmed, and I see no reason Navi wouldn't continue that trend. Vega only uses it to connect the memory controller, cores, video encode/decode, etc.; it doesn't currently handle communication between CUs, based on various statements. It makes more sense for the APUs, where the memory controller serves system memory. It would also likely be backwards compatible, if for some reason PCIe 3 devices were connected. PCIe 4 is primarily a new speed spec serving as a foundation for future versions, with 5+ bringing the interesting signaling changes (rough bandwidth math below). Epyc in theory could run most PCIe lanes as IF with many adapters, but we haven't seen this to the best of my knowledge.
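    For reference, the raw-rate math behind "primarily a new speed spec" (standard published PCIe figures):

        # Per-direction bandwidth of an x16 link. PCIe 3.0 and 4.0 both use
        # 128b/130b encoding; 4.0 simply doubles the transfer rate.
        def x16_gb_per_s(gt_per_s, encoding=128 / 130):
            return gt_per_s * encoding * 16 / 8.0

        print(x16_gb_per_s(8.0))   # PCIe 3.0 x16: ~15.8 GB/s
        print(x16_gb_per_s(16.0))  # PCIe 4.0 x16: ~31.5 GB/s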

    Tiled and TBDR are different degrees of binning: bins are established either within a draw call, or across many draws/states. One is far easier to implement and less costly from a hardware perspective. Tiled rasterizes when a bin's buffer fills; deferred ideally rasterizes only after all draws are committed, so in theory pixels are rendered only once with zero overdraw, excluding transparency obviously. Deferred also requires keeping track of all the state and bound resources in addition to the binning metadata. While AMD has only advertised the DSBR, the TBDR method certainly seems possible (see the toy sketch below).
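    A toy sketch of the distinction (purely illustrative; real hardware bins triangles per tile, tracks state, and does hidden-surface removal far more cleverly): a tiled immediate-mode binner flushes whenever its bin buffer fills, while a deferred binner holds everything until the end of the pass and can then shade each opaque pixel once.

        BIN_CAPACITY = 4  # assumed on-chip bin buffer size, in draws

        def tiled_binning(draws):
            """Flush-on-full: several partial passes per tile."""
            passes, bin_ = [], []
            for d in draws:
                bin_.append(d)
                if len(bin_) == BIN_CAPACITY:
                    passes.append(bin_)  # rasterize now; later draws may
                    bin_ = []            # re-shade the same pixels
            if bin_:
                passes.append(bin_)
            return passes

        def deferred_binning(draws):
            """Flush-at-end: one pass once all draws are committed."""
            return [list(draws)]

        print(len(tiled_binning(range(10))))     # 3 passes
        print(len(deferred_binning(range(10))))  # 1 pass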

    I'd agree it seems odd to advertise the feature if it wouldn't work, but we can only speculate as to why it's not currently enabled. In the case of the Energy benchmark with internal drivers, it is working.
     
  18. Frenetic Pony

    Regular Newcomer

    Joined:
    Nov 12, 2011
    Messages:
    347
    Likes Received:
    94
  19. Anarchist4000

    Veteran Regular

    Joined:
    May 8, 2004
    Messages:
    1,439
    Likes Received:
    359
    The old roadmap had only the compute-focused Vega on 7nm; no plans I recall of a 7nm gaming variant. With Vega being a new architecture, the difference from Navi may be minimal beyond MCMs like Ryzen. That might leave Vega more ideal than Navi for HPC and FP64. Deep learning should scale reasonably with an MCM, considering the TPU designs as accelerators.
     
  20. Alexko

    Veteran Subscriber

    Joined:
    Aug 31, 2009
    Messages:
    4,499
    Likes Received:
    919
    That roadmap was a ROCm roadmap, so there was no reason for it to mention anything about gaming.
     