AMD: Navi Speculation, Rumours and Discussion [2017-2018]

Status
Not open for further replies.
There still isn't a "14nm+" process, and we have no idea how many changes Navi will be bringing.
There is, but as far as I know Samsung hasn't licensed it to GloFo, nor has GloFo attempted their own improvements. They seem far more focused on 7nm.

Well, GloFo does claim risk production in H1/18, ramping to mass production in H2/18.
So they've stated multiple times, adding that AMD should be among their first customers, as they expect AMD to be the first to tape out final designs on it.

That being said, just porting Vega over to 7nm seems incredibly costly, especially if they also plan to do Navi on 7nm. Engineering cost per chip design has gone up a hell of a lot, and yes, you have to pay that between nodes even with a completed design. Vega seems a failure in terms of its original goals: there doesn't seem to be any hint of the split wavefronts that papers originally posited would be there, and it doesn't seem to be any better than Polaris in terms of performance/watt. I was wondering why the apparent lead engineer for Vega 64 (and Vega in general?) had his resume up well before the chip's actual release. I'd hazard the guess that AMD was none too pleased with the sim results but had put in way too many resources to change course, so away he goes.

Looping back around: if Navi shows up with some of the previously indicated features, a new memory controller, some vague new AI enhancements, and maybe that split wavefront packing that was supposed to be in Vega, then it'd make a lot more sense to spend time and money there rather than rushing out a 7nm Vega. But hey, that's assuming it's on time at all. Maybe the timetables have changed drastically since AMD last talked about them; Vega itself has been delayed more than long enough.

Still... Just doubling the HBM stacks and the DP rate without changing anything else? That's a very odd design decision to say the least. I'd say the specs at least are extremely sketchy, as is the conveniently super wide TDP range.
 
Rumors from some of the semi conferences were that 7nm is troublesome. Wouldn't be surprised if timelines shifted a bit.

That being said, just porting Vega over to 7nm seems incredibly costly, especially if they also plan to do Navi on 7nm.
Keep in mind Vega 20 was an HPC chip with FP64. Even Hawaii is still around for that market, so a 7nm Vega might make sense there; even if expensive, the market justifies it.

I'd hazard the guess that AMD was none too pleased with the sim results but had put in way too many resources to change course, so away he goes.
The sim results would have occurred well before the chip was finalized, and simulating architecture features isn't that difficult or expensive. If a new design wasn't working, they could have pushed a larger Polaris easily enough; Scorpio was already halfway there and on 14nm. The only exception would be the layout being awful, but Vega seems very reasonable at lower clocks. The real issue seems to be that Vega was pushed too far when it was intended as a lower-power mobile part. A Vega x2 with lower clocks and voltages would in all likelihood be a very reasonable part.
 
One thing I don't understand: people say Vega was pushed too far on frequency, yet AMD said the massive increase in transistors was because of making Fiji able to run at higher frequency. So how much of a frequency difference would there be between a lower-clocked Vega and Fiji?
 
Rumors from some of the semi conferences were that 7nm is troublesome. Wouldn't be surprised if timelines shifted a bit.

Keep in mind Vega 20 was an HPC chip with FP64. Even Hawaii is still around for that market, so a 7nm Vega might make sense there; even if expensive, the market justifies it.

The sim results would have occurred well before the chip was finalized, and simulating architecture features isn't that difficult or expensive. If a new design wasn't working, they could have pushed a larger Polaris easily enough; Scorpio was already halfway there and on 14nm. The only exception would be the layout being awful, but Vega seems very reasonable at lower clocks. The real issue seems to be that Vega was pushed too far when it was intended as a lower-power mobile part. A Vega x2 with lower clocks and voltages would in all likelihood be a very reasonable part.

The 1/2-rate FP64 might give them a reason for doing this, though it'd be the only one I could see. Certainly Nvidia is charging more than enough for Volta to make it worth trying to get in there. But that doesn't mean a half-rate double precision Navi card couldn't be made.

As an aside, Vega's transistor bloat versus Fiji seems largely from latency hiding and other circuits made specifically for pushing the clockrate to something you'd expect from 14nm. The stated target for the overall architecture was 1700 MHz; that Vega 64 pretty much never sustains even its lowered 1677 MHz clock seems an indication of how badly things went. Also, the rumor doesn't call for a doubling of Vega, which is what I would assume would happen on a new node. It calls for a doubling of HBM stacks with the CUs remaining exactly the same, which is why I called it an odd decision.

One thing I don't understand: people say Vega was pushed too far on frequency, yet AMD said the massive increase in transistors was because of making Fiji able to run at higher frequency. So how much of a frequency difference would there be between a lower-clocked Vega and Fiji?

Vega's new rasterizer seems effective, at least as far as that design decision goes. Vega also switched to HBM2, so a lot more RAM is available. It's also got double-rate FP16 and INT8 support, among other things. Besides, switching to 7nm should mean a higher clockrate, or at the very least the same as 14nm. Doubling (theoretically) the transistor count is only part of the benefit of a new node; the other part is a massive drop in TDP, or a decent increase in clockspeed.
 
One thing I don't understand: people say Vega was pushed too far on frequency, yet AMD said the massive increase in transistors was because of making Fiji able to run at higher frequency. So how much of a frequency difference would there be between a lower-clocked Vega and Fiji?

GCN 5 still has the 4 shader engine (now "compute engine") limit according to Anandtech's discussions with AMD engineers (below). AMD probably decided it was easier to increase clocks than remove that limitation and scale higher than 4 compute engines.

At a high level, Vega 10’s compute core is configured almost exactly like Fiji. This means we’re looking at 64 CUs spread out over 4 shader engines. Or as AMD is now calling them, compute engines. Each compute engine in turn is further allocated a portion of Vega 10’s graphics resources, amounting to one geometry engine and rasterizer bundle at the front end, and 16 ROPs (or rather 4 actual ROP units with a 4 pix/clock throughput rate) at the back end. Not assigned to any compute engine, but closely aligned with the compute engines is the command processor frontend, which like Fiji before it, is a single command processor paired with 4 ACEs and another 2 Hardware Schedulers.

On a brief aside, the number of compute engines has been an unexpectedly interesting point of discussion over the years. Back in 2013 we learned that the then-current iteration of GCN had a maximum compute engine count of 4, which AMD has stuck to ever since, including the new Vega 10. Which in turn has fostered discussions about scalability in AMD’s designs, and compute/texture-to-ROP ratios.

Talking to AMD’s engineers about the matter, they haven’t taken any steps with Vega to change this. They have made it clear that 4 compute engines is not a fundamental limitation – they know how to build a design with more engines – however to do so would require additional work. In other words, the usual engineering trade-offs apply, with AMD’s engineers focusing on addressing things like HBCC and rasterization as opposed to doing the replumbing necessary for additional compute engines in Vega 10.

http://www.anandtech.com/show/11717/the-amd-radeon-rx-vega-64-and-56-review/2

This is also probably why Vega 20 is rumored to only have 64 CUs, despite doubling the VRAM. Maybe AMD thinks the increase in clocks from 7nm will be enough to make a 4-stack Vega 20 into a balanced design?
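As a rough sanity check on the 4-stack rumor, here's the back-of-the-envelope bandwidth math (a sketch only: it assumes a 1024-bit interface per HBM2 stack and a round 2 Gbps per pin, which are illustrative numbers, not confirmed Vega 20 specs):

```python
# Back-of-the-envelope HBM2 bandwidth: assumes a 1024-bit interface per
# stack and 2 Gbps per pin (illustrative round numbers, not confirmed specs).
def hbm2_bandwidth_gbs(stacks, gbps_per_pin=2.0, bits_per_stack=1024):
    # total bits per second across all stacks, divided by 8 for GB/s
    return stacks * bits_per_stack * gbps_per_pin / 8

print(hbm2_bandwidth_gbs(2))  # 512.0 GB/s  (Vega 10-class, 2 stacks)
print(hbm2_bandwidth_gbs(4))  # 1024.0 GB/s (rumored 4-stack Vega 20)
```

Under those assumptions, doubling the stacks doubles both capacity and bandwidth, which fits the "4 stacks mostly for capacity and feeding FP64" reading.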
 
I was commenting on the fact that AMD said most of the transistors went to increasing clocks. At a 1.7 GHz sweet-spot clock I can understand it, but at what we actually get? 1.4-1.5 GHz, versus 1.0-1.1 GHz for Fiji?
 
I was commenting on the fact that AMD said most of the transistors went to increasing clocks. At a 1.7 GHz sweet-spot clock I can understand it, but at what we actually get? 1.4-1.5 GHz, versus 1.0-1.1 GHz for Fiji?

I know exactly what you meant. I believe it was referenced in Anandtech's Vega 56/64 review (probably elsewhere as well).

Talking to AMD’s engineers, what especially surprised me is where the bulk of those transistors went; the single largest consumer of the additional 3.9B transistors was spent on designing the chip to clock much higher than Fiji. Vega 10 can reach 1.7GHz, whereas Fiji couldn’t do much more than 1.05GHz. Additional transistors are needed to add pipeline stages at various points or build in latency hiding mechanisms, as electrons can only move so far on a single (ever shortening) clock cycle; this is something we’ve seen in NVIDIA’s Pascal, not to mention countless CPU designs. Still, what it means is that those 3.9B transistors are serving a very important performance purpose: allowing AMD to clock the card high enough to see significant performance gains over Fiji.

http://www.anandtech.com/show/11717/the-amd-radeon-rx-vega-64-and-56-review/2

If you asked me to speculate on how Vega 10 would clock without those extra clock-increasing enhancements (i.e. a smaller die), then I'd say it'd probably clock like Polaris (GCN 4), as it's the most recent GCN and the only other GCN on 14nm. So 1.2-1.4 GHz is where I'd expect it to end up.

But obviously, AMD didn't want to make a smaller die that only clocked at 1.2-1.4 GHz. Therefore, I think the more interesting question is: how do you best use the 484 mm² of die space?

  • Do you make enhancements to clock 64CUs to ~1.7 GHz?
  • Or do you add compute resources so you have, say, 80CUs at 1.2-1.4 GHz?

That's an interesting decision to make until you remember that the 4 shader engine limit apparently requires a meaningful engineering effort to overcome. Then you see why AMD went for the higher clocks.

That's why I thought the shader engine limit was relevant to your question.
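For a rough sense of how that trade-off nets out, the peak-FP32 arithmetic can be sketched as follows (assuming GCN's 64 FP32 lanes per CU and 2 ops per FMA; the 80-CU configuration is purely hypothetical):

```python
# Peak FP32 throughput for a GCN-style part:
# CUs * 64 lanes * 2 ops per FMA * clock (GHz) -> TFLOPS
def peak_tflops(cus, clock_ghz):
    return cus * 64 * 2 * clock_ghz / 1000

print(peak_tflops(64, 1.7))   # ~13.9 TFLOPS: Vega 64-like, high clocks
print(peak_tflops(80, 1.35))  # ~13.8 TFLOPS: hypothetical wider die, Polaris-like clocks
```

On paper the two options land within a few percent of each other, so the deciding factor really would be the engineering cost of going past 4 shader engines rather than raw throughput.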
 
The sim results would have occurred well before the chip was finalized, and simulating architecture features isn't that difficult or expensive.
That's a gross underestimation. I doubt any chip designer would call these things easy or cheap. Some features can surely be simulated earlier than others and thus are cheaper, but this doesn't mean the simulations are cheap or easy.
 
According to these leaks, which have been pretty accurate so far, Vega 20 is coming next year at 7nm:


The roadmap doesn't show it as a gaming card, though. It's a direct replacement for Hawaii for HPC DP compute. The 4 stacks might be there mostly to reach the 32GB total HBC, which seems to be pretty relevant for DP compute.

I think that idea comes from AMD's more recent presentation in May that listed Vega as 14nm and 14nm+. Perhaps that slide doesn't cover compute, or something has reshuffled which GPUs get the newer I/O and hardware features.
A Vega with DP and perhaps a subset of features like ECC might be doable on 14nm, and if we believe the leaks concerning Greenland's role in AMD's HPC, it may have been possible at one point to have done it by now.

Supposedly watermark
Perhaps it was a watermark of a dong?
 
The 1/2-rate FP64 might give them a reason for doing this, though it'd be the only one I could see. Certainly Nvidia is charging more than enough for Volta to make it worth trying to get in there. But that doesn't mean a half-rate double precision Navi card couldn't be made.
Perhaps, but many smaller chips may not be as ideal for all HPC tasks. Cost is also a concern with 4 stacks of HBM2, plus I'd assume extra PCIe lanes for Infinity with an x2 card, an Epyc APU, or onboard devices such as SSG or SAN controllers; features that don't make much sense on consumer parts.

As an aside, Vega's transistor bloat versus Fiji seems largely from latency hiding and other circuits made specifically for pushing the clockrate to something you'd expect from 14nm.
That latency hiding seems to be mostly SRAM, although we still don't have a full accounting of it. That would provide very dense transistors, so unless they forgot to isolate the power supply for all of it, the effect on power shouldn't be that drastic. Tests do indicate a little undervolting drastically reduces power with minimal performance impact.
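The undervolting observation is consistent with the first-order CMOS dynamic power model, P ∝ C·V²·f (a textbook approximation that ignores leakage and assumes capacitance stays fixed): a small voltage drop pays off quadratically.

```python
# First-order dynamic power model: P scales with V^2 * f.
# Leakage is ignored, so this understates savings if leakage is also high.
def relative_power(v_ratio, f_ratio=1.0):
    return v_ratio ** 2 * f_ratio

# A 5% undervolt at the same clock:
print(relative_power(0.95))        # ~0.90 -> roughly 10% less dynamic power
# A 5% undervolt combined with a 5% clock drop:
print(relative_power(0.95, 0.95))  # ~0.86 -> roughly 14% less dynamic power
```

That quadratic term is why shaving a little voltage off the stock curve costs almost no performance but drops power noticeably.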

That's an interesting decision to make until you remember that the 4 shader engine limit apparently requires a meaningful engineering effort to overcome. Then you see why AMD went for the higher clocks.
Meaningful just means worth bothering with, not that it's particularly difficult. Take that partitioning instruction in the ISA, for example. Four SEs, or four quads, means dividing each dimension in two, and dividing by two tends to be very efficient in digital systems; far easier than dividing by 3 or having multiple partitions in one dimension. If they fixed it we'd likely see 16 SEs, and equalizing the amount of work in the bins is far more difficult. The quads work well unless all the work falls into opposing quadrants.
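A toy illustration of why power-of-two partitioning is so cheap in hardware (the tile size and exact mapping here are made up for the example, not AMD's actual scheme): picking one of 4 engines from a screen coordinate reduces to two bit extractions, with no division at all.

```python
TILE_SHIFT = 5  # 32-pixel tiles; a hypothetical size for illustration

def shader_engine_for_pixel(x, y):
    # Take one bit from each dimension's tile index: a 2x2 checkerboard
    # of tiles maps onto 4 engines using only shifts and masks.
    return (((y >> TILE_SHIFT) & 1) << 1) | ((x >> TILE_SHIFT) & 1)

# Neighboring 32-pixel tiles land on different engines:
print([shader_engine_for_pixel(x, 0) for x in (0, 32, 64, 96)])  # [0, 1, 0, 1]
print([shader_engine_for_pixel(0, y) for y in (0, 32, 64, 96)])  # [0, 2, 0, 2]
```

Dividing by 3, or partitioning one dimension more finely than the other, would need real arithmetic or lookup tables instead of a couple of wire taps, which is the point being made above.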

That's a gross underestimation. I doubt any chip designer would call these things easy or cheap. Some features can surely be simulated earlier than others and thus are cheaper, but this doesn't mean the simulations are cheap or easy.
Don't get me wrong, there is definitely work involved. However, simulating experimental features at a high level is far easier and cheaper than laying out an entire chip, fabricating it, and then discovering something fundamental doesn't work; especially with 14nm being reasonably well understood at this point. So HBCC, primitive shaders, and the caching models would have been reasonably well tested prior to simulating the entire chip. These are features that aren't fully enabled, nor is performance terrible at slightly lower voltages and clocks. For whatever reason leakage seems really bad at stock values and gets worse very quickly. Misjudging the power curve at the higher clocks Vega was designed for seems too large an error to make; the results suggest instead that some part of the design consumed far more power than expected.

I'm still of the mindset Vega is a TBDR design that isn't currently working and the gaming parts would target significantly lower clockspeeds with the resulting work reduction. That or all remotely good samples are consumed by pro parts.
 
A Vega with DP and perhaps a subset of features like ECC might be doable on 14nm, and if we believe the leaks concerning Greenland's role in AMD's HPC, it may have been possible at one point to have done it by now.
Maybe that is the reason why, with regard to die size, Vega 10 is where it is today: to have some wiggle room for a Vega 20 on 14nm with half-rate DP, ECC-protected SRAM and maybe even twice the memory controllers. But then, doing this only to be able to reuse as many macros as possible would make one of the primary uses of IF redundant. *shrugs*
 
I'm still of the mindset Vega is a TBDR design that isn't currently working and the gaming parts would target significantly lower clockspeeds with the resulting work reduction.
Except that AMD featured their spanking new shiny binning rasterizer on multiple (sets of) slides. It would be pretty weird, if not downright disingenuous (possibly illegal) to heavily advertise a feature which isn't working and won't/cannot be enabled.
 
Still... Just doubling the HBM stacks and the DP rate without changing anything else? That's a very odd design decision to say the least. I'd say the specs at least are extremely sketchy, as is the conveniently super wide TDP range.
The other information in the slides turned out to be accurate. Even the name Vega 20 was first learned about there, and recently we got confirmation that the name is real. While plans do change, we can safely assume that at one point this was AMD's plan.
 
Except that AMD featured their spanking new shiny binning rasterizer on multiple (sets of) slides. It would be pretty weird, if not downright disingenuous (possibly illegal) to heavily advertise a feature which isn't working and won't/cannot be enabled.
"Tiled rasterisation" and "tile-based deferred rendering" are not synonyms. Vega does tiled rasterisation, but (at least with current drivers) doesn't behave as a tile-based deferred gpu.
 
Infinity related
Navi will be PCIe 4?
Probably PCIe 4 for Navi; Vega already has Infinity Fabric confirmed, and I see no reason Navi wouldn't continue that trend. Vega only uses it to connect the memory controller, cores, video encode/decode, etc.; it doesn't currently handle communication between CUs, based on various statements. It makes more sense for the APUs, where the memory controller serves system memory. It would also likely be backwards compatible if for some reason PCIe 3/4 devices were connected. PCIe 4 is primarily a new speed spec serving as a foundation for future versions, with 5+ bringing the interesting signal changes. Epyc in theory could run most PCIe lanes as IF with many adapters, but we haven't seen this to the best of my knowledge.
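For reference, the raw numbers behind the PCIe 4 speed bump (using the standard line rates and 128b/130b encoding; these are per-direction figures before protocol overhead):

```python
# Per-direction PCIe link bandwidth in GB/s:
# lanes * line rate (GT/s) * encoding efficiency / 8 bits per byte
def pcie_gbs(lanes, gt_per_s, efficiency):
    return lanes * gt_per_s * efficiency / 8

print(pcie_gbs(16, 8.0, 128 / 130))   # PCIe 3.0 x16: ~15.75 GB/s
print(pcie_gbs(16, 16.0, 128 / 130))  # PCIe 4.0 x16: ~31.5 GB/s
```

So PCIe 4 is essentially a straight doubling of the gen-3 line rate on the same encoding, which fits the "new speed spec, not a signalling overhaul" characterization above.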

Except that AMD featured their spanking new shiny binning rasterizer on multiple (sets of) slides. It would be pretty weird, if not downright disingenuous (possibly illegal) to heavily advertise a feature which isn't working and won't/cannot be enabled.
Tiled and TBDR are different degrees of binning: binning within a draw call versus across many draws/states to establish the bin. The former is far easier to implement and less costly from a hardware perspective. Tiled rasterizes when a buffer fills; deferred ideally rasterizes only after all draws are committed. In theory pixels are then only rendered once with zero overdraw, excluding transparency obviously. Deferred also requires keeping track of all the state and bound resources in addition to the binning metadata. While AMD has only advertised the DSBR, the TBDR method certainly seems possible.

I'd agree it seems odd to advertise the feature if it wouldn't work, but we can only speculate why it's not currently enabled. In the case of the Energy benchmark with internal drivers it is working.
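To make the tiled-vs-deferred distinction concrete, here's a toy software model of the two extremes (purely illustrative: visibility is simplified to "lowest z wins", pixels stand in for tile contents, and real hardware bins per tile with limited buffer space):

```python
# Toy model: immediate-mode rasterization shades every covered pixel of
# every draw, while a fully deferred binner resolves visibility across all
# draws first and shades each pixel exactly once.
def immediate_shaded(draws):
    # draws: list of (covered_pixels, depth); each draw shades all its pixels
    return sum(len(pixels) for pixels, _ in draws)

def deferred_shaded(draws):
    # Bin every draw first, keep only the front-most draw per pixel, then shade
    front = {}
    for pixels, depth in draws:
        for p in pixels:
            if p not in front or depth < front[p]:
                front[p] = depth
    return len(front)

# Three opaque "triangles" stacked over the same 4 pixels:
draws = [({0, 1, 2, 3}, 0.9), ({0, 1, 2, 3}, 0.5), ({0, 1, 2, 3}, 0.1)]
print(immediate_shaded(draws))  # 12 shader invocations (2x overdraw wasted)
print(deferred_shaded(draws))   # 4 shader invocations (zero overdraw)
```

The hardware cost difference falls out of `deferred_shaded` needing to retain all the draws (and, on real hardware, their state and bound resources) before shading anything, whereas a tiled-but-immediate design like the DSBR can flush whenever its bin buffer fills.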
 
Slides from this year's Financial Analyst Day, only a few months ago, confirm Navi as 7nm with no mention of Vega on that node: http://wccftech.com/amd-confirms-7nm-products-will-tape-year-zen-2-navi/

It also confirms a tapeout this year with a targeted release sometime next fiscal year.
The old roadmap had only the compute-focused Vega on 7nm; I recall no plans for a 7nm gaming variant. Vega being a new architecture, the difference may be minimal beyond MCMs like Ryzen. That might leave Vega more ideal than Navi for HPC and FP64. Deep learning should scale reasonably with an MCM, considering the TPU designs as accelerators.
 
The old roadmap had only the compute-focused Vega on 7nm; I recall no plans for a 7nm gaming variant. Vega being a new architecture, the difference may be minimal beyond MCMs like Ryzen. That might leave Vega more ideal than Navi for HPC and FP64. Deep learning should scale reasonably with an MCM, considering the TPU designs as accelerators.

That roadmap was a ROCm roadmap, so there was no reason for it to mention anything about gaming.
 