AMD: Navi Speculation, Rumours and Discussion [2017-2018]

AdoredTV just put out some rumors/leaks about what AMD will launch at Computex (June) 2019:

Three Navi GPUs:

  1. RX 3060: 75 W, performance equal to the RX 580 8GB; target price $130.

  2. RX 3070: 120 W, performance equal to Vega 56 8GB; target price $200.

  3. RX 3080: 150 W, performance 15% better than Vega 64; target price $250.


Argh, the cringe on these names..

This is what has been discussed since early today on this thread ...
 
Really? You're basing this on one chip that we don't actually have any real comparison point for. Vega 20 has double the memory controllers and who knows what other changes AMD hasn't disclosed so far. Just because the shader counts etc. are the same doesn't mean the chip is the same. As for density, not everything scales identically, so before we know all the details of what changed on the inside you can't really make any judgements about it. Heck, do we even know how big the theoretical scaling would be from GloFo 14nm to TSMC 7nm?

Yes I do, why not? If you have anything better to compare with, let's do it.

Anyway, Vega 20 has only ~6% more transistors than Vega 10, so apart from more HBM channels, FP64 precision support and xGMI, they are virtually the same GPUs...
 
Yes I do, why not? If you have anything better to compare with, let's do it.

Anyway, Vega 20 has only ~6% more transistors than Vega 10, so apart from more HBM channels, FP64 precision support and xGMI, they are virtually the same GPUs...
I'd rather skip comparisons meant to judge how good/bad a company is at designing 7nm chips when there isn't a proper comparison point.
Since no one will probably do direct shrinks or such, one should just wait till there are other 7nm GPUs and chips and see how their transistor densities and performance turn out compared to their older chips.
 
I'd rather skip comparisons meant to judge how good/bad a company is at designing 7nm chips when there isn't a proper comparison point.
Since no one will probably do direct shrinks or such, one should just wait till there are other 7nm GPUs and chips and see how their transistor densities and performance turn out compared to their older chips.

It's kind of obvious that Navi would be smaller than Vega 20. The question is how much? The main disadvantage persists if it's still GCN based...
 
Yes I do, why not? If you have anything better to compare with, let's do it.

Anyway, Vega 20 has only ~6% more transistors than Vega 10, so apart from more HBM channels, FP64 precision support and xGMI, they are virtually the same GPUs...

That assumes that they only added things to the chip with new transistors and didn't change/modify any existing stuff. For the FP64 support they likely had to modify the existing design beyond just adding more transistors.

Regards,
SB
 
It's kind of obvious that Navi would be smaller than Vega 20. The question is how much? The main disadvantage persists if it's still GCN based...
I'm not talking about die size, I said densities and performance.
Also, considering that AMD said TSMC 7nm would offer them around 25% (and up) better performance at the same power, and that Vega 20 has two extra memory controllers and twice the memory pool (which adds its own power consumption), that ~20% seems pretty spot on for what the process shrink should give it over Vega 10 in theoretical FLOPS, since the base architecture is the same with some extras added for HPC.
 
NAVI10LITE : GFX1000 (aka Navi 12?)
NAVI10 : GFX1010

[image: DtkfUEUVAAAQ90Q.jpg]
 
I'm not talking about die size, I said densities and performance.
Also, considering that AMD said TSMC 7nm would offer them around 25% (and up) better performance at the same power, and that Vega 20 has two extra memory controllers and twice the memory pool (which adds its own power consumption), that ~20% seems pretty spot on for what the process shrink should give it over Vega 10 in theoretical FLOPS, since the base architecture is the same with some extras added for HPC.

By performance, you mean clocks? Because I don't really think AMD will go past 4096 SPs, and I don't think they will use more than 4 Shader Engines or more than 64 ROPs, so the only way they can squeeze out some more performance is with a higher-clocked GPU, unless unicorn drivers with the Primitive Shader path become a real deal... GCN is awful in regard to power consumption versus clock scaling; Vega 20 with a 300W TDP only proves this. Frankly, I don't see much headroom for clocking above Vega 20, but hey, we can always dream, can't we?
 
By performance, you mean clocks? Because I don't really think AMD will go past 4096 SPs, and I don't think they will use more than 4 Shader Engines or more than 64 ROPs, so the only way they can squeeze out some more performance is with a higher-clocked GPU, unless unicorn drivers with the Primitive Shader path become a real deal... GCN is awful in regard to power consumption versus clock scaling; Vega 20 with a 300W TDP only proves this. Frankly, I don't see much headroom for clocking above Vega 20, but hey, we can always dream, can't we?
Yes, obviously. And they didn't talk about Vega specifically, but just what the TSMC process can offer over their current process: around 25% (and up) more performance (clocks) at the same power on the same chip. Vega 20 isn't the same chip, it has additional stuff on the memory side and changes inside the architecture which could increase consumption, and it still manages to offer a bit over 20% more performance (clocks) at the same power.
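
Back-of-the-envelope on that ~20% figure, assuming the commonly cited peak engine clocks for MI25 (Vega 10) and MI60 (Vega 20) at roughly the same 300 W board power; the exact clock values here are my assumption, not something stated in this thread:

```python
# Rough check of the "bit over 20% more clocks at the same power" claim.
# Assumed peak engine clocks (MHz) at ~300 W board power; treat as approximate.
mi25_clock = 1500   # Vega 10 (MI25), GloFo 14nm
mi60_clock = 1800   # Vega 20 (MI60), TSMC 7nm

uplift = mi60_clock / mi25_clock - 1
print(f"Clock uplift at similar power: {uplift:.0%}")   # ~20%

# Same CU count (64), so theoretical FP32 FLOPS scale directly with clock:
flops = lambda mhz: 64 * 64 * 2 * mhz * 1e6 / 1e12      # 64 CUs * 64 lanes * 2 ops (FMA)
print(f"MI25 ~{flops(mi25_clock):.1f} TFLOPS, MI60 ~{flops(mi60_clock):.1f} TFLOPS")
```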
 
RTX 2070 performance at $250 would make me jump for an upgrade over my Vega 56, IMO. I doubt those rumors are true, however.
 
One slide (second picture in the first set of slides in https://www.computerbase.de/2018-11/amd-radeon-instinct-mi60/) had an artist's rendition of MI25 and MI60.
Pixel counting is rough going by a picture of a projected presentation, but my google-fu is a bit weak on finding a direct reference. However, there's been a rough correlation in area in the pictographs versus actual die shots historically.
The core GPU area (CUs, L2, front ends, ROPs) for MI25 is about 75% of the area, whereas MI60's core GPU area appears to be a little over half of its die.
The additional IO and supporting fabric/controllers appear to have a much higher proportion of the die at 7nm.

The MI60's core GPU area appears to be about half that of the MI25, and the ratio of areas of the representations seem to be proportional to their announced die size differences (with healthy error bars).
If we assume that the core GPU area for both is the dominant contributor to their transistor counts, it seems like that part of the GPU scaled more in line with AMD's current density scaling claims.
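
As a rough sanity check on those proportions, here's the arithmetic using the commonly quoted die sizes (the ~495 mm² and ~331 mm² figures are my approximations, not taken from the slide):

```python
# Rough core-GPU area estimate from the pixel-counted proportions above.
# Die sizes are assumed/approximate, not read off the slide itself.
mi25_die, mi60_die = 495.0, 331.0             # mm^2, Vega 10 vs Vega 20 (approx.)
mi25_core_frac, mi60_core_frac = 0.75, 0.55   # fractions eyeballed from the pictographs

mi25_core = mi25_die * mi25_core_frac         # ~371 mm^2
mi60_core = mi60_die * mi60_core_frac         # ~182 mm^2
print(f"Core GPU area: MI25 ~{mi25_core:.0f} mm^2, MI60 ~{mi60_core:.0f} mm^2")
print(f"MI60 core / MI25 core: {mi60_core / mi25_core:.2f}")   # ~0.49, close to a 2x shrink
```

If the core really did shrink to roughly half the area for a roughly similar transistor budget, that part of the chip lands near AMD's ~2x density claim, with the IO ring eating the rest of the difference.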

As for what goes in that wide ring of non-GPU silicon in MI60, there are things like the various controllers, the HBCC, and the Infinity Fabric mesh. The HBCC and memory section is hefty, and it may be a major contributor to the big swaths of "nothing" in the MI60 drawing on the left and right between the HBM PHYs. In Vega 10, the region AMD indicated was the fabric was a minor but visible strip of silicon along the bottom of the GPU section, below the ROPs. Area-wise, that strip was maybe roughly half the area of the RBEs, with some unknown number of blocks on the side with the PCIe and other interfaces potentially part of it. MI60 has "dark" strips going all around it now, given that there is twice the memory traffic transported overall and on both sides of the GPU. Then there's xGMI on one side, which would have its own stops on the die at significant fractions of the GPU's bandwidth.
That's a mesh scaled to 1 TB/s of bandwidth, and it's composed of wires and buffers in a region that may not have scaled that well. The likely IF blocks in Zen are a non-trivial area contributor, if the block of rectangles in the center of that die corresponds to the crossbar setup, and that only has to support DDR bandwidths 1-2 orders of magnitude lower than MI60's.

Looking at Fiji, it doesn't have such an obvious section of the die devoted to its interconnect, so while the fabric can give many benefits, I think area isn't one of them.

Edit: From 7:00 onwards in the following presentation, there's an exchange covering the less-than-2x area scaling, where the statement was that not all areas of the chip got that higher density. This seems to be the case for the silicon all around the GPU core.
 
Could that be reason enough to decouple the "core GPU" from the I/O?

Since the I/O doesn't scale well, keep sourcing it from GF on 14nm, cheaply, and couple it with core GPUs from expensive 7nm wafers. In principle, that seems like pretty good utilisation of established supply lines, especially when legally bound to GF purchases.

Everything I've read about chiplets has revolved entirely around multi-chiplet setups, but that's not necessarily what we'll get right off the bat. That being said, would one chiplet per GPU be beneficial in terms of manufacturing costs?
 
I don't think a single GPU die with an IO die makes sense. With 8 CPU chiplets you have a single Infinity Fabric link of ~100 GB/s per chiplet. That's not too much and not too hard to make. The IO die has 8 Infinity Fabric links (800 GB/s) and the RAM interface (should be around 200 GB/s), so about 1 TB/s in total. But let's think about a Vega-class GPU: 256-bit at 14-16 Gbps, so 448-512 GB/s. So the IO die again needs ~1 TB/s of bandwidth: ~500 GB/s of Infinity Fabric and ~500 GB/s of RAM interface. Additionally, the GPU also needs that ~500 GB/s of Infinity Fabric to communicate with the IO die and get the full bandwidth. The Infinity Fabric for MCM communication is much smaller than a RAM interface, but I still don't think you'd gain much.
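
A quick tally of those numbers (the per-channel DDR4 figure and per-pin memory speeds are my own rough assumptions for illustration):

```python
# Rough bandwidth bookkeeping for a Rome-style IO die vs. a hypothetical GPU + IO die split.
# All figures are approximate assumptions for illustration.

# Rome-style CPU case
chiplets = 8
if_link_gbps = 100                        # GB/s per chiplet link (approx.)
ddr4_channels, ddr4_gbps = 8, 25.6        # 8 channels of DDR4-3200, ~25.6 GB/s each
cpu_io_total = chiplets * if_link_gbps + ddr4_channels * ddr4_gbps
print(f"CPU IO die traffic: ~{cpu_io_total:.0f} GB/s")          # ~1000 GB/s

# Vega-class GPU case: 256-bit bus at 14-16 Gbps per pin
for pin_gbps in (14, 16):
    dram_bw = 256 * pin_gbps / 8                                 # GB/s
    # The chiplet<->IO-die link has to carry roughly the same traffic again.
    print(f"{pin_gbps} Gbps: ~{dram_bw:.0f} GB/s DRAM + ~{dram_bw:.0f} GB/s link "
          f"= ~{2 * dram_bw:.0f} GB/s through the IO die")
```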
 
Could that be reason enough to decouple the "core GPU" from the I/O?

Since the I/O doesn't scale well, keep sourcing it from GF on 14nm, cheaply, and couple it with core GPUs from expensive 7nm wafers. In principle, that seems like pretty good utilisation of established supply lines, especially when legally bound to GF purchases.

Everything I've read about chiplets has revolved entirely around multi-chiplet setups, but that's not necessarily what we'll get right off the bat. That being said, would one chiplet per GPU be beneficial in terms of manufacturing costs?

The non-GPU area in this case includes the controllers and data fabric for DRAM channels with at least 1 TB/s bandwidth. Wherever the dividing line is between the chiplet and its support silicon, there's going to be a set of link controllers at the near and far end, and pad area on both sides. The area taken up by the PHY and controllers is part of what doesn't scale, and every extra off-die connection puts that area on the GPU die and IO die.
I don't have a good handle on how much of the Zen die is the inter-die link and controller, but I tried pixel counting a die shot and got something like ~0.008 of the die for one link, which may be 1-2 mm² each.
Bandwidth-wise, that link can support ~21-25 GB/s worth of memory bandwidth.
25 GB/s per link would need to scale to >1000 GB/s of GPU bandwidth, since it is 40 or so times too small.
Power-wise, the current link tech needs to improve significantly. IF is 2pJ/bit x 8 bits per byte x 2^40 bytes/s, so ~18 W in one direction. The internal paths usually have the same bandwidth in both directions, so 36W are lost just moving data between dies.
Then, there still needs to be a fabric of some kind on the GPU die to route the data, and that's also part of the area that didn't scale.
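
Putting rough numbers on that (the Zeppelin die size and the 2 pJ/bit figure are approximations/assumptions on my part):

```python
# Rough scaling of Zen-style inter-die links to GPU-class bandwidth.
# Figures are approximate assumptions for illustration.
gpu_bw_gbps   = 1024        # GB/s of DRAM bandwidth the GPU die would need to reach
link_bw_gbps  = 25          # GB/s of memory bandwidth one current IF link can feed
zeppelin_mm2  = 213         # approx. Zen "Zeppelin" die area
link_fraction = 0.008       # pixel-counted fraction of the die per link (from the post above)

links_needed = gpu_bw_gbps / link_bw_gbps
link_area    = zeppelin_mm2 * link_fraction
print(f"Links needed: ~{links_needed:.0f}, ~{link_area:.1f} mm^2 each "
      f"=> ~{links_needed * link_area:.0f} mm^2 of link area, before any fabric")

# Energy cost at ~2 pJ/bit for off-die transfers
pj_per_bit = 2
bits_per_s = gpu_bw_gbps * 2**30 * 8          # GB/s -> bits/s
watts_one_way = pj_per_bit * 1e-12 * bits_per_s
print(f"~{watts_one_way:.0f} W per direction, ~{2 * watts_one_way:.0f} W both ways")
```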

This is likely one reason why AMD's future plans usually have GPU chiplets as part of a stack. The HBM is above them and they sit on top of an active interposer.
For one thing, die to die stacking allows much tighter pad pitch, and much lower power per connection. Also, stacking allows the active interposer to host a good part of that on-die fabric that has to remain on the GPU in the proposed 2-die solution.
For designs that require more than one chiplet, the interposer also reduces the burden on the GPU, because each new die in the memory network multiplies the number of links and their area/power cost.

In effect, the IO die in Rome is something of an intermediate step to being an active interposer. It's hosting the memory and IO, and likely a decent mesh or some other network to link all the clients. There's a 1:1 link to each chiplet, which at least in terms of system topology is similar to what they'd look like if the chiplet were mounted on top of the IO die.

I feel that current and near future link technologies are not yet at the scale needed to satisfy the higher demands of a GPU, and there are greater challenges and questions regarding stacking and active interposers.
 
And there's the "much cheaper than Nvidia" that AMD needed to pull off, at least per the rumor.

If they're using 20 wavefronts per shader engine, and that Navi 10 die is two of these, that would put the equivalent selling point of a "Vega 40" at around $325 (scaled down from Vega 64). Since shrinking doesn't do anything for cost these days, we can discount the change in manufacturing node. Meaning they must have cut 20%+ of the cost from Vega to Navi to sell Navi 10 at a profit.

We can assume a not insignificant part of that is ditching HBM. No interposer cost, no waiting on memory manufacturers gladly charging through the nose. But even then it still means AMD has managed to cut a portion of their silicon out per wavefront.

Estimating from the performance front: Vega 64 +15% out of a part that size means an astonishing 84% increase in performance per wavefront is being mooted. Even taking full advantage of 7nm's projected 25% increase in clock speed, you still get a very surprising 47% increase in efficiency. And that's just the increase in efficiency per wavefront, not counting the supposed smaller silicon footprint of each one.
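
For reference, here's the arithmetic behind those two percentages, taking the reading above of 20 wavefront units per shader engine and two shader engines (so 40 units vs Vega 64's 64) as the assumption:

```python
# Arithmetic behind the 84% / 47% figures above.
# Assumes the rumoured part has 40 wavefront units (2 shader engines x 20) vs Vega 64's 64.
vega64_units, navi_units = 64, 40
perf_vs_vega64 = 1.15            # rumoured: Vega 64 +15%
clock_uplift   = 1.25            # 7nm's projected clock gain at the same power

per_unit_gain = perf_vs_vega64 * vega64_units / navi_units
per_unit_per_clock_gain = per_unit_gain / clock_uplift
print(f"Per-unit performance gain: {per_unit_gain - 1:.0%}")             # ~84%
print(f"Per-unit gain beyond clocks: {per_unit_per_clock_gain - 1:.0%}") # ~47%
```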

So the cost reduction seems reasonable if AMD hasn't added more silicon for features at all. Or AMD has made one of the most astonishing generational advancements in GPU efficiency in ages. Or the rumor is false. I'm almost leaning towards the latter, at least in part. The mooted combination of nicely lower cost and vastly higher efficiency at the same time seems too large; I'd not be surprised if this 3080 or whatever ends up costing $300 instead. But hell, maybe I'll be surprised around CES, which is all of a month away.

Edited stuff due to bad first paragraph reference, Vega 64 is $500, not $600, dummy me. Caffeine is not a substitute for sleep kids!
 
It's kind of obvious that Navi would be smaller than Vega 20. The question is how much? The main disadvantage persists if it's still GCN based...

Navi is a "nextgen" uArch, not GCN.



I believe that this "Radeon RX 3000 series" is not Vega 20 (i.e. v2.0), but Vega 12 (i.e. v1.2).

Therefore, if true, the RX 3k series is a reworked Vega 64-style uarch shrunk down to 7nm, with end users receiving all the cost/performance/heat advantages of the new node. Vega 64 +15% @ 150 W is exactly the metric AMD showed in their 7nm Vega 20 slides. I find this move by AMD entirely plausible and almost exactly what Dr. Su has been telling us all along.
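
As a rough check on that claim (Vega 64's ~295 W board power is my assumption, not something stated in the post):

```python
# Back-of-the-envelope perf/W implied by "Vega 64 +15% @ 150 W".
# Vega 64 board power is an assumed ~295 W; treat all numbers as approximate.
vega64_power, rumoured_power = 295, 150      # watts
perf_ratio = 1.15                            # rumoured: Vega 64 +15%

perf_per_watt_gain = perf_ratio * vega64_power / rumoured_power
print(f"Implied perf/W improvement: ~{perf_per_watt_gain:.1f}x")   # ~2.3x
```

That lands in the same ballpark as the roughly 2x power-efficiency improvement AMD has been quoting for 7nm Vega.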

An additional thought on this rumor: if the Vega 1.2 version has reworked Infinity Fabric, multi-GPU might be a thing again for gamers & streamers. Either way, hitting the sub-4K market with a sub-$300 powerhouse grabs maximum mindshare... for when Navi hits and gamers want more & move to 4K.


I believe Navi will be the high-end gaming GPU coming in about nine months' time. (It will double the 3k series gaming performance & then some.)
 