AMD: Navi Speculation, Rumours and Discussion [2019-2020]

NVIDIA didn't reach any higher with Samsung 14nm, which is the same process as GloFo 14nm

Not quite true. The chips made at Samsung were the smallest ones, which are typically the least dense (most likely due to memory I/O taking up a larger share of the die), yet in Pascal they were actually denser than the chips immediately above them in the stack.

GP108 - 24.3 MTrans/mm^2
GP107 - 25 MTrans/mm^2
GP106 - 22 MTrans/mm^2
GP104 - 22.9 MTrans/mm^2
GP102 - 25.4 MTrans/mm^2

Also, it's not just Navi 10 & 14; Vega 20 also sits at 40-41 MTrans/mm^2 density (rough math in the sketch below).
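For reference, those densities fall straight out of the commonly reported die sizes and transistor counts. A quick sanity-check sketch; the inputs are the usual public figures, assumed here rather than taken from this thread:

```python
# Transistor density from commonly reported die sizes and transistor counts.
# (Inputs are the widely circulated public figures, used only as a rough check
# on the densities quoted above.)

dies = {
    #           MTransistors, die area (mm^2)
    "GP108":   ( 1800,  74),
    "GP107":   ( 3300, 132),
    "GP106":   ( 4400, 200),
    "GP104":   ( 7200, 314),
    "GP102":   (12000, 471),
    "Vega 20": (13230, 331),
    "Navi 10": (10300, 251),
    "Navi 14": ( 6400, 158),
}

for name, (mtrans, area_mm2) in dies.items():
    print(f"{name:8s}: {mtrans / area_mm2:5.1f} MTrans/mm^2")
```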

Yeah, and with both of those AMD increased clocks a lot, which likely required optimizations to be made. Nvidia did the same with Pascal, and the contention here is that they wouldn't need to do it for Ampere.
 
Either way, nV is yet to tape out client stuff.

Even if they haven't taped out yet: Turing wasn't taped out until April 2018, according to our own @Erinyes, yet it launched in August of the same year (a span of just four months). So I don't know why you'd think NVIDIA will delay the next gen until RDNA3 is released; that's just ridiculous.

Turing has now taped out and is expected back from the fab any day now, apparently. Expected launch is late Q3, so the PC gaming market will at least have something for 2018. My source indicated that next up is 7nm Ampere, due sometime in H1'19, and that 7nm gaming GPUs will be delayed given initial 7nm wafer availability and costs.

 
So are you suggesting it's similar to AMD using the old "14nm libraries" with GloFo 12nm? Then the increase in density should of course be slightly higher than the number I used. Regardless, on the 12/14/16nm processes AMD and NVIDIA GPUs have had quite similar densities, from 22 to 25 MTrans/mm^2 or so.
I'm saying that 12FFN is essentially 16FF+ in everything but name. The difference in transistor density between Pascal and Turing is non-existent.
 
Actually, they are bringing out at least one more GCN chip in the form of Arcturus, but it's supposedly only for the Radeon Instinct family.

I wonder if they won't conceivably keep Vega around for a little bit longer, though. It's not as flexible and efficient as Navi, but it's a more efficient use of silicon by being less complex. For small, power-constrained iGPU situations it seems quite suitable. Though if that means they'll be kept in more consumer-oriented products, I'm less sure. But thin clients and the like wouldn't really need Navi, whereas the smaller silicon footprint and higher efficiency of the improved Vega cores would be a boon. I guess it comes down to what has the highest ROI for those parts in the future. But I'm on record as having a soft spot for Vega. Underdog sympathies would be my guess, so it might be wishful thinking on my part.
 
Compute performance per transistor is excellent, and graphics performance per transistor and clock(!) is also quite good. The only problem is performance per watt (mostly in graphics, not so much in compute). It seems that AMD (at least partially) solved the performance-per-watt issue with Renoir.
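To put a rough number on the per-transistor point, here's a back-of-the-envelope sketch using commonly quoted shader counts, boost clocks and transistor counts (assumed peak figures, not official metrics):

```python
# Back-of-the-envelope peak FP32 throughput per transistor.
# Shader counts, boost clocks and transistor counts are the commonly quoted
# public figures (assumptions, not numbers from this thread).

chips = {
    #                        shaders, boost MHz, transistors (billions)
    "Vega 64 (Vega 10)":    (4096, 1546, 12.5),
    "RX 5700 XT (Navi 10)": (2560, 1905, 10.3),
}

for name, (shaders, boost_mhz, btrans) in chips.items():
    tflops = shaders * 2 * boost_mhz / 1e6      # FMA counts as 2 FLOPs per lane per clock
    print(f"{name}: {tflops:.1f} TFLOPS peak, "
          f"{tflops * 1000 / btrans:.0f} GFLOPS per billion transistors")
```

Peak FLOPS obviously ignores utilisation, which is exactly the part RDNA improves, but by this crude measure Vega's raw compute per transistor holds up very well.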
 
Is Vega less complex? Because a lot of it doesn't seem efficient at all (and I have one).

I'll be honest with you, I'm not technically expert enough to give a proper technical answer. But as far as I understand it, RDNA improved on GCN's compute units drastically. The older CUs had to be targeted well to avoid bottlenecks, whereas RDNA reworked them into a "workgroup" setup with two CUs each, allowing for much more granular execution of code. GCN (Vega) would sit waiting for instructions unless fed properly, where RDNA (Navi) chugs along happily. There are also bandwidth improvements and scaled-up SIMD units in the new workgroup setup that simply iron out most of the kinks in GCN, making it more efficient. Thing is, if you were to code for GCN's peculiarities, you've got wasted silicon doing nothing on Navi. The CUs in GCN weren't bad so long as they were fed; keeping them fed was the issue.
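A very rough way to picture that granularity difference, based on the publicly documented SIMD widths and wave sizes (a toy sketch, not the poster's own numbers):

```python
# Toy model of per-wave instruction latency, using the publicly documented
# SIMD widths and wave sizes. Illustrative only: real pipelines keep many
# waves in flight and overlap everything.

def wave_issue_cycles(wave_size: int, simd_width: int) -> int:
    """Cycles for one SIMD to push a single wavefront through one instruction."""
    return wave_size // simd_width

gcn  = wave_issue_cycles(wave_size=64, simd_width=16)   # GCN: wave64 on a SIMD16
rdna = wave_issue_cycles(wave_size=32, simd_width=32)   # RDNA: wave32 on a SIMD32

print(f"GCN : {gcn} cycles per instruction per wave")   # 4
print(f"RDNA: {rdna} cycle per instruction per wave")   # 1
```

In this toy view, GCN needs a deep queue of independent waves to hide that four-cycle cadence (the "keeping them fed" problem), whereas RDNA's single-cycle issue tolerates thinner, less carefully tuned workloads.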

This is apparently what gave rise to the "fine wine" moniker for AMD's products. As successive games got more adept at filling the GCN CUs, performance looked like it didn't drop off over time as much as it did with, say, nVidia's products of the same vintage, whose designs were already more flexible. Developers simply got better at targeting GCN hardware in subsequent games, whereas the competition was already "maxed out" thanks to the efficiency of its design. Drivers sort of did the rest for AMD.

This is why I gather that Arcturus will do well in the datacenter, where workloads will be tailored to the peculiarities of the CUs and make proper use of them. The bottleneck scenarios more commonly found in games simply don't occur to the same degree there, or shouldn't. And in low-power parts like set-top boxes, thin clients, etc., you don't need super-efficient workgroups; the less silicon-intensive Vega cores will do. They waste less silicon per die and work just fine. Besides which, GCN is a known piece of kit by this point, with stable drivers and good developer familiarity, and it will be kept in mind for some time to come when coding.

Again, that's so far as I understand it.
 
Vega had severe issues scaling even to 64 CUs, let alone 160. And so far we don't really know how well RDNA will scale above its current maximum of 20 WGPs. Chances are, though, that it will scale better than GCN ever did.
 
You mean in graphics, right? Compute, as long as it was actually compute bound, did scale alright.
 
Exactly. So when, in the same die size, you can have 3x the transistors (properly used, that would mean 3x the execution units), each drawing roughly 1/3 the power as before, what do you get? Exactly 3x the performance at equal power consumption. And if you make a chip that is half the size? 1.5x the performance at 0.5x the power, or in other words 50% higher performance at 50% lower power.
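A trivial sketch of that arithmetic, assuming performance scales with unit count and total power is units times per-unit power (toy numbers only):

```python
# Toy scaling arithmetic: perf ~ relative unit count, power ~ units x per-unit power.
# Purely illustrative; real chips never scale this cleanly.

def scaled(units_scale: float, power_per_unit_scale: float):
    perf  = units_scale                          # assume perfect scaling with unit count
    power = units_scale * power_per_unit_scale
    return perf, power

# Same die size on the new node: 3x the transistors, each at ~1/3 the power.
print(scaled(3.0, 1 / 3))    # (3.0, 1.0): 3x performance at equal power

# Half-size die on the new node: 1.5x the old unit count, same per-unit saving.
print(scaled(1.5, 1 / 3))    # (1.5, 0.5): 1.5x performance at half the power
```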

Even when you increase the die size, you increase the power usage too. Nothing comes free.
But as you've stated, when adding transistors to a shrunken die, "each drawing roughly 1/3 the power as before". So the new transistors still draw power rather than being free, but the cost of using extra transistors is now reduced by roughly 65%. That's what the others were getting at about being able to use up to 3x more die space.
 
Even when you increase the die size, you increase the power usage too. Nothing comes free.

I never said it was free. Just that as long as the power reduction and density increase align, you can utilize that density increase. So 2x density @ 50% power OR 3x density @ 33% power, both work. On the other hand, 3x density @ 50% power wouldn't work as well.

EDIT: Basically I was saying that in order to increase performance by 50% you don't need to increase clocks by 50%; in fact, you don't need to increase them at all if you can increase the size by 50% or more (much more) within the same power envelope. And when you can increase size by up to 3x, you even have the luxury of lowering clocks and still getting a large performance upgrade while increasing efficiency massively.
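A sketch of that clocks-versus-width trade-off under the usual rule-of-thumb dynamic-power model (power roughly proportional to units x V^2 x f, with voltage tracking frequency). The inputs are illustrative assumptions, not measurements:

```python
# Why you don't need +50% clocks for +50% performance.
# Rule-of-thumb model: perf ~ units * f; per-unit dynamic power ~ V^2 * f with
# V roughly tracking f, so power ~ units * node_power_scale * f^3.
# (A crude approximation with made-up inputs, not measured behaviour.)

def perf_and_power(units: float, f: float, node_power_scale: float = 1.0):
    perf  = units * f
    power = units * node_power_scale * f ** 3
    return perf, power

cases = {
    "baseline (old node)":              perf_and_power(1.0, 1.0),
    "+50% clocks, same die":            perf_and_power(1.0, 1.5),
    "+50% units, same clocks":          perf_and_power(1.5, 1.0),
    "3x units, 0.8x clocks, new node":  perf_and_power(3.0, 0.8, node_power_scale=1/3),
}

for name, (perf, power) in cases.items():
    print(f"{name:33s}: perf {perf:.2f}x, power {power:.2f}x")
```

Even under this crude model, going wider at equal or lower clocks is far cheaper in power than chasing frequency, which is the point of the post above.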
 
Cut that back to 128 CUs, add some REAL GOOD SHIT and you've got yourself an Arcturus.

Renoir is only a little flex of their newfound circuit design prowess.
More to come!

It's definitely looking like it wasn't just PR that Raja was terrible at, and based on that initial Xe presentation it seems he's straight back to his old bad habits of vastly overpromising and hyping up tons of new, untried technologies on extremely optimistic timelines. Focusing on things like HBCC instead of good execution, anyone?

Also, an entirely expected note, but Lisa Su has explicitly confirmed RDNA2 for this year.

Been easy to assume, but nice to have it stated officially and without any wiggle room. The exact wording of the statement is weird though. Why would you "refresh" "Navi" and have a "next generation RDNA architecture" launch in the same year?
 