AMD Vega 10, Vega 11, Vega 12 and Vega 20 Rumors and Discussion

Which ones?

Are you perhaps mixing that with the fact that AMD dual-sourced their Vega HBM2 stacks from SK Hynix and Samsung?
I'm not really sure which ones, but it was well before Vega or HBM2, so it would have been Polaris 10 or 11
https://www.extremetech.com/computi...samsung-could-tap-foundry-for-future-products
Moorhead questioned AMD on the deal and received the following quote:

AMD has strong foundry partnerships and our primary manufacturing partners are GLOBALFOUNDRIES and TSMC. We have run some product at Samsung and we have the option of enabling production with Samsung if needed as part of the strategic collaboration agreement they have with GLOBALFOUNDRIES to deliver 14nm FinFET process technology capacity.
 
My take on that quote (from mid-2016, BTW) is that they tested some wafers at Samsung ("have run some product" - past tense) to check compatibility with their GF designs, but there's really no proof they ever started full-scale production of any ASIC.

I reckon if they did start producing chips at Samsung, that would require a deal similar to the one they made to produce the console chips at TSMC.
 
The quote also clearly states that they have the option to enable production at Samsung "at will" thanks to the GloFo/Samsung deal - and even if that weren't true, I would really like to see something more tangible than "some say it's this and that" before concluding that Samsung's 14LPP coming out of a GloFo fab is in fact worse than the same process coming out of a Samsung fab.
 
GF's 14LPP and Samsung's 14LPP are one and the same thing, so how could GF "lag further behind"?
The wording used for GF's licensing from Samsung was "copy-smart", rather than "copy-exact".
https://www.extremetech.com/computi...uddy-up-for-14nm-while-ibm-heads-for-the-exit
The AnandTech interview also states that GF has since extended the tech, although that may have implications for the so-called fab-synced chips they promised in 2014.

GF didn't junk all the tools they had built up during their aborted attempt at 14nm, and trying to remove all of the mismatched hardware and re-order the machinery the Samsung process had been developed for would have delayed the transition further. The delay alone would have kept them at an inferior point on the maturation curve: Samsung was serving as a somewhat rough second source for the Apple A9 in late 2015, while GF was only sampling 14nm chips at around that time.

What that means now that both have had more time to reach the flatter part of the curve would require comparable chips to judge.
 
I have a Ryzen 2200G on the way to me, and it got me thinking: why hasn't AMD put Vega 11 out as a discrete card?

Looking at a Ryzen 2400G die shot, a rough estimate would put Vega 11 at around 80-90mm2; add another 10mm2 or so for a 64-bit GDDR5 bus and it should be smaller than an RX 560 while being more feature-rich.
Give it a 128-bit bus and it should get close to RX 460 performance, especially at 1500-1600MHz, and it would most likely come in under 75W power consumption, so it could be slot-powered.
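Some back-of-the-envelope numbers for that in Python; the 11-CU count, clocks and GDDR5 speed are my guesses for a hypothetical discrete part, not anything AMD has announced:

Code:
# Rough figures only; an 11-CU part at these clocks and memory speeds is an assumption.
def gddr5_bandwidth_gbs(bus_width_bits, data_rate_gbps):
    # Peak bandwidth in GB/s for a GDDR5 bus
    return bus_width_bits / 8 * data_rate_gbps

def fp32_tflops(cus, clock_mhz):
    # Peak FP32: 64 lanes per CU * 2 ops per FMA * clock
    return cus * 64 * 2 * clock_mhz * 1e6 / 1e12

print(gddr5_bandwidth_gbs(64, 7))    # ~56 GB/s on a 64-bit bus
print(gddr5_bandwidth_gbs(128, 7))   # ~112 GB/s on a 128-bit bus (RX 460 territory)
print(fp32_tflops(11, 1550))         # ~2.2 TFLOPS, roughly what an RX 460 does at ~1200 MHz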
 

I doubt a discrete Vega 11 would be less than 100mm^2. You'd still need the memory PHYs, video transcoding blocks, etc., probably leading to ~130mm^2 or more, and at that point you're at Polaris 11 / RX 560 size.
And since the RX 560 is selling well enough, there's little reason to spend engineering effort on replacing it.
 
Under 100mm2 would be pushing it, but it shouldn't be too far off if this diagram is reasonably accurate - assuming Raven Ridge is actually 210mm2 as most sources state:

[Annotated Raven Ridge die shot]

At, say, 110mm2 it would be very close to the RX 560, but then again, AMD saw fit to release the RX 550, which according to most sources is 103mm2.

I guess it's as you say: why bother when the RX 560 is selling so well.
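For what it's worth, here's the rough area math behind that 110mm2 figure (Python); the GPU's share of the die is eyeballed from the shot above and the extra I/O area is a guess:

Code:
raven_ridge_mm2 = 210.0       # commonly quoted Raven Ridge die size
gpu_fraction = 0.40           # eyeballed share of the die taken by the Vega 11 block
gpu_block_mm2 = raven_ridge_mm2 * gpu_fraction       # ~84 mm^2

extra_io_mm2 = 25.0           # guess: 128-bit GDDR5 PHY, display/video blocks, misc I/O
discrete_estimate = gpu_block_mm2 + extra_io_mm2      # ~109 mm^2

print(discrete_estimate)      # lands between the RX 550 (~103 mm^2) and the RX 560 (~123 mm^2)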
 
Weird that infinity fabric would be one huge concentrated lump of logic. You'd think it would be more spread out, in a more...fabric-like fashion, yeah?
 
I would think it's a labeling error. That region looks to be the southbridge. The fabric and actual northbridge might be partly in the region labelled as the multimedia engine between the GPU, CCX, and the memory PHY.
 
Well, it's a patent it would seem, and patents often lack real-world implementations.

Indeed, driver implementation of primitive shader was abandoned; we know that already because AMD said so themselves.

So for anyone who isn't AMD or another GPU designing firm, it's more of a curiosity than anything else really.
 
The filing times are an interesting example of how variable the tea leaves are for gauging timing. The likely precursor patents for Vega's rasterizer are significantly older, for example. In the CPU realm, there are some recently published patents concerning store-to-load forwarding with a memfile that align with recent optimization guidance: keep loads and stores that reference the same location close together in the code, and keep them sharing the same base and index registers without modifications in between. This enables a form of value prediction, where matching on just a subset of the offset address provides a path for the renamer and rename registers to forward the stored value rather than going through the load/store unit.
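To make that concrete, here's a toy software model of how I read the memfile idea; every structure, size, and name below is my own illustration, not something lifted from the patent:

Code:
# Toy model of memfile-style store-to-load forwarding; names and sizes are my own guesses.
class MemfileEntry:
    def __init__(self, base_reg, index_reg, offset_bits, src_phys_reg):
        self.base_reg = base_reg            # architectural base register of the store address
        self.index_reg = index_reg          # architectural index register of the store address
        self.offset_bits = offset_bits      # only a subset of the displacement is tracked
        self.src_phys_reg = src_phys_reg    # physical register holding the stored value

class Memfile:
    OFFSET_MASK = 0xFF                      # assumed: low bits of the offset only

    def __init__(self):
        self.entries = []

    def record_store(self, base_reg, index_reg, offset, src_phys_reg):
        self.entries.append(MemfileEntry(base_reg, index_reg,
                                         offset & self.OFFSET_MASK, src_phys_reg))

    def try_forward(self, base_reg, index_reg, offset):
        # A younger load with the same base/index registers and matching offset bits
        # is predicted to alias the store, so the renamer maps it to the store's
        # source register instead of waiting on the load/store unit (verified later).
        for e in reversed(self.entries):
            if (e.base_reg, e.index_reg) == (base_reg, index_reg) \
               and e.offset_bits == (offset & self.OFFSET_MASK):
                return e.src_phys_reg
        return None

    def invalidate_on_write(self, reg):
        # Writes to a tracked base/index register kill the entry, which is why the
        # optimization guidance asks for unmodified base/index registers.
        self.entries = [e for e in self.entries if reg not in (e.base_reg, e.index_reg)]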

The primitive shader patent does seem to align with the aspirational long-term goals of the Vega whitepaper. However, that leaves questions as to whether this patent represents a long-term direction, the full use of the NGG path that Vega had in parallel with its standard path, a non-buggy version of it, or possibly a case where Vega's culling-only offering represents how much they could fit in amongst the gaps in the standard hardware.

Perhaps it's also a cautionary tale for those that get hyped about patents. Sometimes even nifty ones can turn out to be meh.


Long ramble ahead:

There are other hints that are intriguing, though some raise further questions.
There is a claim concerning an opcode for accelerating screen space partition coverage checks, which the Vega ISA has specifically for 4-engine GPUs.
Vega's supposed fully working path and the patent still have the rasterizer as a final culling point, although there is a vast gap in capability between the two primitive shader concepts.
The whitepaper hints at an architectural ability to shift from using the parameter cache to the L2, in a potentially inflexible way. This raises questions like whether Vega literally has two paths in parallel, or whether parts of them overlap and are repurposed, like the path that chooses between the cache types. Could the inability to produce a working implementation - with the compiler-driven model being exactly what AMD initially promised - for even the first, culling-only step stem from bugs, or from the difficulty of getting an effective implementation on an architecture that might be awkwardly straddling both sides?
There's discussion in the patent about assigning ordering identifiers, and the ISA has a POPS counter mode.

Perhaps one notable change that's hard to tease out externally is the fate of the primitive setup to rasterizer crossbar that the patent claims poses a practical ceiling exactly where GCN has maxed out its shader engines.
Vega, if it implemented the patent, shouldn't need it but is structured like it does. Could it be that the GPU has it anyway for backwards compatibility, or could Vega have emulated the old functionality with the new? There are hints of new culling methods in the Vega ISA that might align with the compaction steps, and the opcode for partition is part of the export process. If it was emulating the old way with new, what would it mean that no one has noticed?
It would seem fully embracing this would give "scalability" for those looking ahead to future GPUs.

The local data store is an odd duck. At times it acts like the LDS for parts of the process, but there's some cross-unit or global usage that doesn't mesh with that. This again calls back to that blurb about opting to use the new, larger parameter cache or just going for stream-out. If it's really the memory hierarchy, I'd question in some ways whether it's fair to say the patent has abandoned crossbars, since the L2 itself uses a crossbar; that crossbar has been under load before and is getting more use now that the ROPs are clients too (and people are left wondering why the memory subsystem can't leverage memory bandwidth as well as they thought it should).


The combined shader stage patent aligns with the GFX9 driver changes for just that type of merge. I'm hazy on whether those are even optional for GFX9.
The touted efficiencies (more compact allocation, fewer intra-stage barriers, less tracking overhead) haven't shown up in comparisons with GFX8, though.
It makes the individual shaders more complex, and there's a bit of driver intervention: code gets added to handle a mismatch between the vertex wavefront count and the later geometry shader wavefront count, putting the excess geometry shader wavefronts to sleep while vertex output is copied from one combined shader to another.
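As a purely illustrative sketch of that wavefront bookkeeping (the real mechanism lives in the GFX9 compiler/driver; the names and numbers here are mine):

Code:
WAVE_SIZE = 64

def merged_wave_plan(num_vs_waves, num_gs_waves):
    # For each launched wave of the merged shader, report whether it has vertex
    # work and/or geometry work. Waves with no vertex work are the "excess" GS
    # waves that get put to sleep until the vertex outputs have been copied over.
    launched = max(num_vs_waves, num_gs_waves)
    return [(w < num_vs_waves, w < num_gs_waves) for w in range(launched)]

# Example: 2 waves of vertex work feeding 3 waves of GS work; the third wave
# sleeps through the vertex phase, then wakes up for the geometry phase.
print(merged_wave_plan(2, 3))   # [(True, True), (True, True), (False, True)]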

The shifting granularities and masking point to a place where it would be nice if the hardware could handle divergence better, as some other patents that might not be in use have brought up. The tradeoffs for both patents, on programmable hardware that has only mildly evolved from an era predating these new use cases, might be more complicated than on a GPU that committed fully to the newer concepts.
 
Indeed, driver implementation of primitive shader was abandoned; we know that already because AMD said so themselves.
Might not be abandoned, but just suspended if personnel were needed elsewhere.

Using the more portable async compute culling should give most of the benefits of a custom primitive shader with a known interface for devs. It may exceed the automatic mode in performance, but not be quite as efficient as a custom primitive shader. None of that precludes the automatic mode showing up eventually.
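For illustration, this is the kind of per-triangle test such a compute pass performs; a real implementation is a compute shader writing a compacted index buffer, so this CPU-side Python is only meant to show the idea:

Code:
def signed_area2(ax, ay, bx, by, cx, cy):
    # Doubled signed area of the screen-space triangle (z of the 2D cross product)
    return (bx - ax) * (cy - ay) - (by - ay) * (cx - ax)

def cull_triangles(indices, positions_xy):
    # Keep only front-facing, non-degenerate triangles; return a compacted index list
    kept = []
    for i in range(0, len(indices), 3):
        a, b, c = indices[i], indices[i + 1], indices[i + 2]
        if signed_area2(*positions_xy[a], *positions_xy[b], *positions_xy[c]) > 0:
            kept.extend((a, b, c))
    return kept

# The same triangle wound both ways: only the front-facing copy survives.
positions = [(0, 0), (1, 0), (0, 1)]
print(cull_triangles([0, 1, 2, 0, 2, 1], positions))   # -> [0, 1, 2]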

Perhaps one notable change that's hard to tease out externally is the fate of the primitive setup to rasterizer crossbar that the patent claims poses a practical ceiling exactly where GCN has maxed out its shader engines.
It's possible the crossbar exists to ensure screen-space and CU locality. The load wouldn't need to be balanced, with async compute filling the gaps. That would also address potential issues with extending the process to multiple chips, something we haven't seen yet with mGPU, which oddly Vega should be well suited for with HBCC and this arrangement.
 
Might not be abandoned, but just suspended if personnel were needed elsewhere.
You know, I'm all for wringing free, additional performance out of hardware I already own, but at this late stage I'm totally not counting on it. Vega has been out for nearly three quarters of a year now; I think it's a little late for the automatic primitive shader to make its appearance.

...But hey, I'd love to be proven wrong. AMD, get on with it! Make it happen! :D

Not placing any bets though.
 
You know, I'm all for wringing free, additional performance out of hardware I already own, but at this late stage I'm totally not counting on it. Vega has been out for nearly three quarters of a year now; I think it's a little late for the automatic primitive shader to make its appearance.
Not disagreeing with you here, but primitive shaders would seem likely to persist through Navi and possibly future revisions. With AMD's hiring and the time involved it wouldn't be unreasonable, though not all that useful within the lifetime of the card either. If you look at the open-source Linux drivers, even R600 is still getting features added.
 
Under 100mm2 would be pushing it, but it shouldn't be too far off if this diagram is reasonably accurate - assuming Raven Ridge is actually 210mm2 as most sources state:

[Annotated Raven Ridge die shot]

At, say, 110mm2 it would be very close to the RX 560, but then again, AMD saw fit to release the RX 550, which according to most sources is 103mm2.
I’m pretty sure that the top-left part of the “multimedia engine” is actually 8 RBE units.
 
Having some issues with AMD's latest drivers: all card clocks get stuck at maximum values when idling until I reboot. Putting the PC to sleep does nothing to correct the issue, unfortunately.

It might be tied to using custom profiles in Wattman and/or running compute apps; I'm not sure. This isn't exactly easy to reproduce, as it only happens occasionally. It does still happen with the most recent driver set, and the couple prior to it. Earlier drivers from December-January, maybe at least partway through February, did not show this behavior - or at least it never presented itself for me. If it's random, maybe I just got lucky, I dunno.

Also, and this is consistent behavior and not a bug from what I can tell (although I'd prefer it were a bug, because then it would hopefully get friggin fixed): if you set a custom profile, the "SOC Clock" (as labeled in GPU-Z) won't clock down to its idle clock when the card is idling, but stays at its full ~1100MHz-ish clock at all times. This raises idle power consumption by several watts for no reason whatsoever. So... WHY!!!

Also, can someone explain why Wattman only allows the final two power states to be configured? Seems really dumb IMO, as the card could potentially end up in situations where a lower power state consumes more power than the top states (like when undervolting the GPU and running an older game).

Edit:
Ok, for some reason, clocks are now stuck at max permanently on both my Vegas. Not sure why; they weren't doing this just a day or so ago. Now they're pulling about 45W each doing absolutely fuckall. Clearly not good.
 