AMD Vega 10, Vega 11, Vega 12 and Vega 20 Rumors and Discussion

Of course I was referring to the turbo mode, hence "maximum load".

Another interesting question to ponder would be when Nvidia (or AMD, for that matter) sees the HPC/AI market as large enough for the x100 chips to no longer require graphics-specific assets like rasterizers and texture mapping units. GP100 has already seen little use outside of professional products, mainly Tesla and a bit of Quadro. At least the TMUs are still part of GV100, and since it is partitioned into GPCs, it seems like a safe bet that the rasterizer parts are still on board as well.


That is a good question, but normally they address a market too different from that of pure accelerators. Maybe in the future, though.

For GP100, the main reason was really HBM2: not very practical for Nvidia to use outside of Tesla at this time, a question of cost and availability.
 
At least the TMUs are still part of GV100 and since it is partitioned in GPCs, it seems like a safe bet that the rasterizer parts are still on board as well.
Yes, they are.
“This is a domain specific chip,” said Jonah Alben, senior vice president of GPU engineering at Nvidia. “This chip can run games very well if we want it to, but the focus [of the V100] is to be a great chip for AI and for HPC, so we dedicated all the resources we could until it was illegal to do more.”
source: https://www.hpcwire.com/2017/05/10/nvidias-mammoth-volta-gpu-aims-high-ai-hpc/
 
This AMD slide deck from 2016 Q1 was posted earlier in the Ryzen thread but the last slide is also relevant to this thread.

[Image: RgwB9p6.jpg]


What interests me are the "NextGen MCM" and "NextGen High-Perf MXM" boxes. I assume that the latter became Vega 20 or some Navi chip, for a few reasons:
  • The 2018 timeframe and the Capsaicin roadmap placing Navi in 2018 (although Navi seems to be in 2019 now).
  • "NextGen" maybe refers to a post-Polaris architecture?
  • The low end of Vega 20's planned TDP range intersects with the TDPs of desktop Tonga and Polaris 10 based cards. (The E8950 uses Tonga and Hontza presumably uses Polaris 10. While embedded GPUs have different TDPs from desktop/server GPUs, I'm assuming that if two chips end up in cards with similar TDPs on the desktop/server side, then the same can be true in embedded.)
But what about the former?
  • It's too small for Vega 10.
  • It seems to be too small for Vega 11 given slides and rumors.
  • It's too large for Polaris 12, unless the RX 550 does not use a fully enabled Polaris 12 or if the chip changed since that slide. (Not to mention the "NextGen.")
Any thoughts?

EDIT: uploaded the last slide on imgur.
 
FWIW, golem.de claims that the interposer is indeed two separate ones.
https://www.golem.de/news/nvidias-g...mit-des-technisch-moeglichen-1705-127773.html
The relevant passage for your google translate pleasure:
"Der Interposer, auf welchem der GV100 und die vier HDM2-Speicherstapel sitzen, sprengt die Dimensionen der Maske (Reticle), weshalb zwei benötigt werden."
Or, if you take my (deliberately literal) word for it:
"The Interposer on which the GV100 and the four HDM2-stacks sit, blows the dimensions of the reticle so that two are required."
From the grammatical structure, two would refer to reticles. It could be a word omission though:
-> "…weshalb zwei Belichtungsdurchgänge benötigt werden."
-> "… so that two exposures are required".
To clarify, I believe Marc Sauter was in the same meeting I was in when I asked an NV engineer that question. The answer is definitely one interposer with two exposures, and not two separate interposers.

(Actually, it's probably somewhat arbitrary as the interposer segments could be fully isolated from each other. But regardless, what we were told is one single interposer with 2 exposures)
 
Any thoughts?
Raven Ridge. None of the Vegas, or any 2017 products for that matter, are on that chart. The high-perf part is likely that 4096-core APU, or something similar, which has shown up in some documentation.

To clarify, I believe Marc Sauter was in the same meeting I was in when I asked an NV engineer that question. The answer is definitely one interposer with two exposures, and not two separate interposers.

(Actually, it's probably somewhat arbitrary as the interposer segments could be fully isolated from each other. But regardless, what we were told is one single interposer with 2 exposures)
By the same definition an entire wafer is a single interposer. Just need to reduce the spacing between dice to zero.
 
A 16GB, 1600MHz Vega ID has shown up on CompuBench, 24% faster than the previously spotted ID in one of the tests and roughly equal in the only other test that has results for the new Vega sample.

https://videocardz.com/69475/amd-radeon-vega-spotted-with-16gb-memory-and-1600-mhz-clock

https://compubench.com/compare.jsp?...type1=dGPU&hwname1=AMD+6864:00&D2=AMD+687F:C1

When I browse that specific benchmark, though, that alleged Vega sample runs only ~10% faster than the highest Fiji entry (which does not list Dual-GPU or Overclocking, for full disclosure) with 13,421 vs. 12,240 MVox/sec. I hope that's not it.
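As a quick sanity check of that "~10%", here is the trivial arithmetic on the two MVox/s figures quoted above (only the two CompuBench numbers are used, nothing else is assumed):

/* Quick check of the ratio between the alleged Vega sample and the
 * highest Fiji entry in that CompuBench subtest. */
#include <stdio.h>

int main(void)
{
    const double vega_mvox = 13421.0; /* alleged Vega sample */
    const double fiji_mvox = 12240.0; /* highest Fiji entry  */

    /* Prints roughly 9.6%, i.e. the "~10% faster" mentioned above. */
    printf("Speedup: %.1f%%\n", (vega_mvox / fiji_mvox - 1.0) * 100.0);
    return 0;
}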
 
What interests me are the "NextGen MCM" and "NextGen High-Perf MXM" boxes. I assume that the latter became Vega 20 or some Navi chip, for a few reasons:
  • The 2018 timeframe and the Capsaicin roadmap placing Navi in 2018 (although Navi seems to be in 2019 now).
  • "NextGen" maybe refers to a post-Polaris architecture?
  • The low end of Vega 20's planned TDP range intersects with the TDPs of desktop Tonga and Polaris 10 based cards. (The E8950 uses Tonga and Hontza presumably uses Polaris 10. While embedded GPUs have different TDPs from desktop/server GPUs, I'm assuming that if two chips end up in cards with similar TDPs on the desktop/server side, then the same can be true in embedded.)
But what about the former?
  • It's too small for Vega 10.
  • It seems to be too small for Vega 11 given slides and rumors.
  • It's too large for Polaris 12, unless the RX 550 does not use a fully enabled Polaris 12 or if the chip changed since that slide. (Not to mention the "NextGen.")
Any thoughts?

"Next-gen MCM" a direct successor to an embedded Cape Verde (10 CU), so it's just a fully enabled Polaris 12, given @Ryan Smith 's tip. Hontza has been a product available for a while, it's the E9550 but AMD ended up enabling all the 36CUs available.
"Next-gen MXM" is either a small Vega (if such a thing is ever coming) or a Navi. My bet is on a Navi, given the timeframe. Polaris was just mid-to-low end and Vega may be only high-to-top end, whereas Navi should be a full lineup stack (otherwise all the talk about being easily scalable won't make any sense).




A 16GB, 1600MHz Vega ID has shown up on CompuBench, 24% faster than the previously spotted ID in one of the tests and roughly equal in the only other test that has results for the new Vega sample.

https://videocardz.com/69475/amd-radeon-vega-spotted-with-16gb-memory-and-1600-mhz-clock

https://compubench.com/compare.jsp?benchmark=compu20d&did1=49807462&os1=Windows&api1=cl&hwtype1=dGPU&hwname1=AMD+6864:00&D2=AMD+687F:C1

I checked the compubench info on dual-GPU cards because I thought those 16GB could be from a dual-Vega card, but the dual-Fiji Radeon Pro Duo only shows 4GB in that parameter.
Maybe SK Hynix has been hiding 8-Hi stacks from us all this time.



When I browse that specific benchmark, though, that alleged Vega sample runs only ~10% faster than the highest Fiji entry (which does not list Dual-GPU or Overclocking, for full disclosure) with 13,421 vs. 12,240 MVox/sec. I hope that's not it.

Results for the regular Fury vary between ~5000 and 11500 Mvox/s. This seems to be very driver-dependent.
I wouldn't worry too much with benchmark results in this test, for now.
 
Results for the regular Fury vary between ~5000 and 11500 Mvox/s. This seems to be very driver-dependent.
I wouldn't worry too much with benchmark results in this test, for now.
... hence: "I hope that's not it" which i meant quite honestly. :)

BTW - you linked the wrong benchmark. The one where the newer alleged Vega is 24% faster than the older alleged Vega is Level Set Segmentation 256:
https://compubench.com/subtest_resu...ndows&api=cl&D=AMD+Radeon+(TM)+R9+Fury+Series

There, results range only from 8,300 to 12,400 (ish) with a median of 10,280, which of course includes all the effed-up results as well. And Fury also includes the lower-clocked, partially disabled non-X variant, maybe even the Nano, which does not have entries of its own.
 
It's unlikely that the different Vega cards have different numbers of memory stacks and bus widths.

16GB is also too much for a consumer card unless AMD is aping Nvidia's Titans (the Maxwell and Kepler ones) or wants it to be a selling point over Nvidia's 12GB cards.

There are some rumors that it'll clock over 1700MHz for AIB cards and even overclock beyond 1.8GHz, which would be good enough to match the 1080 Ti.
 
Having a "lot" of vram doesn't exclude to manage it better anyway. I still love my Fury X in my custom loop, 4gb is short in some game (not a lot actually, 1440p here), but it's still a powerhouse. I will skip Vega, but I'm very interested in the architecture.
 
A 16GB, 1600MHz Vega ID has shown up on CompuBench, 24% faster than the previously spotted ID in one of the tests and roughly equal in the only other test that has results for the new Vega sample.

https://videocardz.com/69475/amd-radeon-vega-spotted-with-16gb-memory-and-1600-mhz-clock

https://compubench.com/compare.jsp?benchmark=compu20d&did1=49807462&os1=Windows&api1=cl&hwtype1=dGPU&hwname1=AMD+6864:00&D2=AMD+687F:C1
But which chip is it? The CU count matches Vega 10, but AFAIK neither Samsung nor Hynix does 8-Hi HBM2 stacks yet, which means that either it's not Vega 10 or the desktop Vega 10 has 2x1024-bit memory controllers disabled.
 
16GB is also too much for a consumer card unless AMD is aping Nvidia's Titans (the Maxwell and Kepler ones) or wants it to be a selling point over Nvidia's 12GB cards.
16GB kind of contradicts the need for HBC, doesn't it?

After thinking about it, we don't know how Vega's new High Bandwidth Cache Controller behaves in the eyes of the OS. It could be a Vega that simply shows direct "control" of 16GB of system RAM with the HBM2 being used as cache, and this could be completely transparent to anything but the driver.
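To make that idea concrete, here is a purely speculative sketch; the struct, names, and sizes below are my assumptions for illustration, not AMD's actual driver code or a confirmed Vega configuration. It only shows how a device could report the HBCC-mapped address space rather than its physical HBM2 alone:

/* Speculative illustration only: a benchmark querying "device memory"
 * might be shown the HBCC-backed address space (HBM2 plus mapped system
 * RAM) instead of the physical HBM2 capacity. All names and sizes here
 * are hypothetical. */
#include <stdint.h>
#include <stdio.h>

#define GiB (1024ULL * 1024ULL * 1024ULL)

struct hypothetical_vega_mem {
    uint64_t hbm2_bytes;        /* physical local memory (the "cache") */
    uint64_t hbcc_system_bytes; /* system RAM mapped behind the HBCC   */
};

/* What a tool could be shown as "GPU memory" in this scenario. */
static uint64_t reported_device_memory(const struct hypothetical_vega_mem *m)
{
    return m->hbm2_bytes + m->hbcc_system_bytes;
}

int main(void)
{
    struct hypothetical_vega_mem vega = { 8 * GiB, 8 * GiB };

    printf("Reported: %llu GiB (of which %llu GiB is HBM2)\n",
           (unsigned long long)(reported_device_memory(&vega) / GiB),
           (unsigned long long)(vega.hbm2_bytes / GiB));
    return 0;
}

If something along those lines is what CompuBench reads back, a "16GB" entry wouldn't necessarily tell us anything about the number or height of the HBM2 stacks.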


But which chip is it? The CU count matches Vega 10, but AFAIK neither Samsung nor Hynix does 8-Hi HBM2 stacks yet, which means that either it's not Vega 10 or the desktop Vega 10 has 2x1024-bit memory controllers disabled.

I singled out the important part, which is that no company is making 8-Hi stacks according to publicly released info.
There seems to be quite a lot of stuff being produced in secret nowadays. GDDR5X came out of nowhere, with specs and production announced just over a month before GP104 cards were in the hands of reviewers. TSMC had 12FFC volume production scheduled for 2018, but it turns out Nvidia has either been stockpiling chips from risk production or co-developed a secret second 12FF process with TSMC.
 
But which chip is it? The CU count matches Vega 10, but AFAIK neither Samsung nor Hynix does 8-Hi HBM2 stacks yet, which means that either it's not Vega 10 or the desktop Vega 10 has 2x1024-bit memory controllers disabled.

It's one of Vega 10's PCI IDs.

{0x1002, 0x6860, PCI_ANY_ID, PCI_ANY_ID, 0, 0, CHIP_VEGA10},
{0x1002, 0x6861, PCI_ANY_ID, PCI_ANY_ID, 0, 0, CHIP_VEGA10},
{0x1002, 0x6862, PCI_ANY_ID, PCI_ANY_ID, 0, 0, CHIP_VEGA10},
{0x1002, 0x6863, PCI_ANY_ID, PCI_ANY_ID, 0, 0, CHIP_VEGA10},
{0x1002, 0x6864, PCI_ANY_ID, PCI_ANY_ID, 0, 0, CHIP_VEGA10},
{0x1002, 0x6867, PCI_ANY_ID, PCI_ANY_ID, 0, 0, CHIP_VEGA10},
{0x1002, 0x6868, PCI_ANY_ID, PCI_ANY_ID, 0, 0, CHIP_VEGA10},
{0x1002, 0x686c, PCI_ANY_ID, PCI_ANY_ID, 0, 0, CHIP_VEGA10},
{0x1002, 0x687f, PCI_ANY_ID, PCI_ANY_ID, 0, 0, CHIP_VEGA10}
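For the curious, here is how such a table gets matched at probe time. This is a minimal, self-contained sketch using a simplified stand-in struct rather than the kernel's real struct pci_device_id, so treat it as illustrative, not actual amdgpu code:

/* Simplified stand-in for the driver's PCI ID matching. */
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

struct id_entry {
    uint16_t    vendor;
    uint16_t    device;
    const char *chip;
};

static const struct id_entry vega10_ids[] = {
    { 0x1002, 0x6860, "CHIP_VEGA10" },
    { 0x1002, 0x6861, "CHIP_VEGA10" },
    { 0x1002, 0x6862, "CHIP_VEGA10" },
    { 0x1002, 0x6863, "CHIP_VEGA10" },
    { 0x1002, 0x6864, "CHIP_VEGA10" },
    { 0x1002, 0x6867, "CHIP_VEGA10" },
    { 0x1002, 0x6868, "CHIP_VEGA10" },
    { 0x1002, 0x686c, "CHIP_VEGA10" },
    { 0x1002, 0x687f, "CHIP_VEGA10" },
};

/* Walk the table and return the chip name for a vendor/device pair. */
static const char *lookup(uint16_t vendor, uint16_t device)
{
    for (size_t i = 0; i < sizeof(vega10_ids) / sizeof(vega10_ids[0]); i++)
        if (vega10_ids[i].vendor == vendor && vega10_ids[i].device == device)
            return vega10_ids[i].chip;
    return NULL;
}

int main(void)
{
    /* 0x687F and 0x6864 are the two device IDs seen in the CompuBench
     * entries (AMD 687F:C1 and AMD 6864:00). */
    const char *chip = lookup(0x1002, 0x687f);

    printf("0x687F -> %s\n", chip ? chip : "unknown");
    return 0;
}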

Also, a German hardware forum user who's also on Reddit had asserted that the Vega ID was 6860 before AMD released these in the Linux patches, so I think 6860 will be the final consumer variant or the flagship card.

https://www.reddit.com/r/Amd/commen...vega_tech_day_is_happening_right_now/daxzagd/

Regarding the bus width, maybe one of the CompuBench tests measures that or gives an indication of bandwidth. Disabling two memory channels out of four doesn't sound like AMD, who rarely cut them down.
 
When I browse that specific benchmark, though, that alleged Vega sample runs only ~10% faster than the highest Fiji entry (which does not list Dual-GPU or Overclocking, for full disclosure) with 13,421 vs. 12,240 MVox/sec. I hope that's not it.
TBH the 16GB is an alarm bell for me.

Some may be interested in a few details from the newest SK Hynix Q2 2017 catalogue (all previous news was based on the Q1 book).
HBM2: still only 4GB 4-Hi stacks, at only 1.6Gbps
GDDR5: 10Gbps, Q4'17
GDDR6: 12 & 14Gbps, Q4'17
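A quick back-of-the-envelope from those catalogue figures, assuming the standard 1024-bit interface per HBM2 stack (the per-stack width is the JEDEC standard; everything else is just the numbers above):

/* Capacity and bandwidth per stack count, from the catalogue figures. */
#include <stdio.h>

int main(void)
{
    const double gbps_per_pin = 1.6;  /* SK Hynix Q2'17 HBM2 speed bin */
    const int pins_per_stack  = 1024; /* JEDEC HBM2 interface width    */
    const int gb_per_stack    = 4;    /* 4-Hi stack capacity           */

    const double gbs_per_stack = gbps_per_pin * pins_per_stack / 8.0; /* GB/s */

    /* Two stacks (Vega 10 style) vs. four stacks. */
    for (int stacks = 2; stacks <= 4; stacks += 2)
        printf("%d stacks: %d GB, %.1f GB/s\n",
               stacks, stacks * gb_per_stack, stacks * gbs_per_stack);
    return 0;
}

So with the parts SK Hynix actually lists, 16GB would mean either four 4-Hi stacks or 8-Hi stacks that don't appear in the catalogue yet.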

Cheers
 
16GB kind of contradicts the need for HBC, doesn't it?
Hardly; capacity aside, it's the best method when aggregating different tasks: VMs, for example, or simply the desktop and a 3D app simultaneously. Plus there's the possibility of evicting pages if space is required for an intermediate task.
 