AMD Vega 10, Vega 11, Vega 12 and Vega 20 Rumors and Discussion

HBM does have tangible benefits though. Form factor at the very least and latency at smaller page sizes.
Also missing are the assembly and smaller component costs like voltage regulators, display connectors, cooling systems and other stuff.
Why normalize the GPU just to include it in the list?
I think the cost "at release time" makes more sense. I should try to read that article when I get the time, maybe they say how they got those costs.


It has benefits. I was reacting a bit to ToTTenTranz's post which had a rather dismissive style.
The missing context here is posts like this, which have been repeated over and over for the past two years with zero proof for such claims:

The FuryX will be considerably more expensive to make than the GTX 980 Ti, so they can't undercut it in price.
There is no need to list the full BOM. What matters is the difference due to using HBM.
(...)
HBM is the elephant in the room.

There are worse, from the same user and others. I think I remember seeing claims of a >$100 difference between adopting HBM and GDDR5, but I won't bother looking for those.
Turns out the actual difference in the memory itself between Hawaii's 4GB and Fiji's 4GB (a.k.a the elephant in the room) was $16, plus an interposer + higher packaging cost that is partially deducted from a lower PCB cost. $41 difference in total, for a first-gen memory technology. Not bad at all.
There's a real chance the 390X's memory subsystem with 16*4Gbit 6000MT/s chips (instead of 290X's 16*2Gbit 5000MT/s) was actually very close to Fury's in BoM.
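
For reference, the figures quoted above break down roughly as follows; this is just a back-of-the-envelope check in C (the interposer/packaging-minus-PCB figure is simply the remainder, not a number from the article):

#include <stdio.h>

int main(void)
{
    /* Per-card BoM deltas quoted above (Fiji vs. Hawaii). */
    double memory_delta = 16.0;   /* 4GB HBM1 vs. 4GB GDDR5      */
    double total_delta  = 41.0;   /* total memory-subsystem delta */

    /* The remainder is the net cost of interposer + packaging,
     * partially offset by the simpler PCB. Derived here, not
     * quoted from the article. */
    double packaging_net = total_delta - memory_delta;

    printf("Memory (HBM1 vs. GDDR5):       $%.0f\n", memory_delta);
    printf("Interposer + packaging - PCB:  $%.0f\n", packaging_net);
    printf("Total memory-subsystem delta:  $%.0f\n", total_delta);
    return 0;
}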


Fiji is 20 months newer, so the price per wafer was lower and yields were higher. As for the $41 difference - the table doesn't take into account that the HBM interface is significantly smaller than a GDDR5 interface, so a hypothetical Fiji with a 512-bit GDDR5 bus would be larger and thus more expensive. It's also not possible to compare these prices directly, because the 512-bit GDDR5 interface of the R9 290X offered 320 GB/s, while the 4096-bit HBM interface offered 512 GB/s. The price of the HBM solution was higher, but so was the bandwidth (+60%).
That comparison is not really fair to GDDR5 at the time.

There will never be a 100% fair comparison. Using 290X vs. Fury/X costs (probably at release time, if we are to trust those GPU costs) may be as good as it gets.

AMD went with a lowly clocked 512-bit design for Hawaii, arguably to lower the power consumption, but the 780 Ti released within the same month, used a 384-bit bus and actually had slightly higher memory bandwidth (336 GB/s vs 320 GB/s) at probably significantly reduced costs.
Without knowing the cost difference between 7000MT/s and 5000MT/s memory back in 2013, as well as the difference in PCB cost between using a 512bit or a 384bit bus, there's really no way to know for sure which memory subsystem was more expensive.
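
To make the bandwidth side of that comparison explicit, peak GDDR5 bandwidth is just the bus width in bytes times the effective transfer rate; a minimal sketch:

#include <stdio.h>

/* Peak GDDR5 bandwidth in GB/s: bus width in bytes times the
 * effective transfer rate in GT/s. */
static double gddr5_bw_gbs(int bus_bits, double mt_per_s)
{
    return (bus_bits / 8.0) * (mt_per_s / 1000.0);
}

int main(void)
{
    printf("GTX 780 Ti: 384-bit @ 7000 MT/s -> %.0f GB/s\n",
           gddr5_bw_gbs(384, 7000));   /* 336 GB/s */
    printf("R9 290X:    512-bit @ 5000 MT/s -> %.0f GB/s\n",
           gddr5_bw_gbs(512, 5000));   /* 320 GB/s */
    return 0;
}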


41$ extra is too much for mainstream cards. Add margins + profits (manufacturer + stores) and we could be talking about nearly 200$ -> 300$ retail price increase.
Margins are calculated for the whole product and not per each component in the PCB. Why would stores charge more for using one type of memory vs. another?
$41 in higher BoM would never force a $200-$300 retail price increase (730% margins? wow...).
The adoption of such memory could result in substantially more performance and the IHV+OEMs could just demand substantially higher margins for it, but that's a whole other story.
 
The missing context here is posts like this, which have been repeated over and over for the past two years with zero proof for such claims:

The lack of HBM cards in the market other than the high end Fury is all the proof we need. Silent guy has been right on the money on that. The costs are clearly the major reason and the fact that with faster GDDR speeds the benefits are close to zero with the current compute capabilities. Even a few $10s will amount to quite a bit when you scale the unit counts up and for what benefit?
 
Margins are calculated for the whole product and not per each component in the PCB. Why would stores charge more for using one type of memory vs. another?
$41 in higher BoM would never force a $200-$300 retail price increase (730% margins? wow...).
The adoption of such memory could result in substantially more performance and the IHV+OEMs could just demand substantially higher margins for it, but that's a whole other story.

$41 is a lot for a card that is priced at 400 to 600 bucks; we're talking about a 7 to 11% drop in margins.

Just think about AMD's margins, 32% (34% last quarter); adding 7 to 11% on top of that is a huge change. Granted, the Fury X isn't what drives their overall margins, but when we start looking at margins, that percentage has a big influence on a company's bottom line.
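
For what it's worth, the 7 to 11% range is simply $41 expressed as a share of a $400-$600 selling price, as a quick check shows:

#include <stdio.h>

int main(void)
{
    double bom_delta = 41.0;                 /* extra BoM from the table above */
    double prices[]  = { 400.0, 500.0, 600.0 };

    /* Every BoM dollar that isn't passed on to the buyer comes
     * straight out of gross margin, so the hit is delta / price. */
    for (int i = 0; i < 3; i++)
        printf("$%.0f card: %.1f%% of the selling price\n",
               prices[i], 100.0 * bom_delta / prices[i]);
    return 0;
}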
 
$41 in higher BoM would never force a $200-$300 retail price increase (730% margins? wow...).
I said that 41$ extra BOM could result in close to 100$ retail price increase. Thus you can't realistically expect to see HBM2 products in 200$-300$ range.

High end is of course another matter. Nvidia is selling cards at 700$+ currently. If Vega is competitive with Nvidia high end, then HBM2 is not a problem in that market.
 
The lack of HBM cards in the market other than the high end Fury is all the proof we need.
No, it's not.
HBM1 didn't bring only a cost premium, it also had the limitation of 1GB VRAM per stack, and all mid-range offerings have been sold with 6GB or more since 2016.

We'll talk about this "proof" of yours if we don't end up seeing lower-than-high-end cards using HBM2.
Silent Guy's theory was always that HBM was super expensive, i.e. the elephant in the room without even considering the costs for the interposer:

(...)
- Interposer vs none at all
- HBM memory vs GDDR5

- Watercooling vs air cooling

If you think that those last 3 will be less than $10 in extra cost, then there's really not much else to discuss. The Interposer alone is going to be more than that. But HBM is the elephant in the room.

Turns out the actual difference between 4GB HBM1 and 4GB GDDR5 was $16.
Oops.

I said that 41$ extra BOM could result in close to 100$ retail price increase. Thus you can't realistically expect to see HBM2 products in 200$-300$ range.
The non-X Fury cards were in that price range for quite some time, actually. Also, I'd argue that 4*HBM1 stacks in Fiji may have been more expensive than the 2*HBM2 stacks will be in Vega 10.

I'm not suggesting Vega 10 will cost $300. Unless it tanks in performance compared to even a GTX 1080 we should expect at least $500 for a fully-enabled chip.
However, if e.g. Vega 11 comes out as half a Vega 10 (32 NCUs, single HBM2 stack), I don't see why it couldn't come in the $200-300 range. Smaller GPU, smaller interposer, smaller substrate, only one stack -> doesn't look like HBM2 is such a showstopper for an upper-mid-range card.
 
8Hi stacks are probably the practical limit for now. There is actually no hard limit: the DRAM manufacturers are free to fit as many dies into the specified stack height (720 ± 25 µm) as they can. The HBM spec specifically mentions the possibility of distributing a 128-bit channel over multiple dies; the restriction is that the access latencies need to be constant within a channel (different channels within a stack can operate with different latency settings and even different clocks). Given that stacked dies in flash memory can be as thin as 30 µm, one may expect 16Hi stacks in the slightly farther future.
[Attached image: distribution of an HBM channel across stacked dies]
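
A rough feasibility check of that 16Hi expectation, using the spec height quoted above and a purely hypothetical 30 µm die thickness (the base die and bond layers are lumped into the remainder):

#include <stdio.h>

int main(void)
{
    /* HBM stack height budget quoted above: 720 +/- 25 um. */
    double stack_budget_um = 720.0 + 25.0;

    /* Hypothetical figure: DRAM dies thinned to ~30 um, as is
     * already done for stacked NAND flash. */
    int    dram_dies = 16;
    double die_um    = 30.0;

    double dram_total = dram_dies * die_um;          /* 480 um */
    double remainder  = stack_budget_um - dram_total;

    printf("16 dies x %.0f um = %.0f um, leaving %.0f um\n",
           die_um, dram_total, remainder);
    printf("(remainder has to cover the base/logic die and bond layers)\n");
    return 0;
}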

I thought the challenge for 8-Hi stacks was primarily thermal (the logic die and the lowest memory die): junction temperature and heat dissipation/cooling, especially so for GPU implementations?
Thanks
 
The non-X Fury cards were in that price range for quite some time, actually. Also, I'd argue that 4*HBM1 stacks in Fiji may have been more expensive than the 2*HBM2 stacks will be in Vega 10.
The regular Fury was priced like a premium product at launch ($549). It wasn't competitive at that price point. People were also afraid to spend that much on a 4 GB card. I wouldn't expect AMD to make much profit off these discounted Fury cards. Vega of course might be a completely different matter. Let's see how competitive it is against the GTX 1070/1080. If Vega sells like hotcakes, economies of scale would definitely make HBM2 + interposers cheaper in the long run.
 
New AMD Radeon RX Vega Details Surface In Linux Patch - 4096 Shader procs
From the looks of it all, AMD Radeon RX Vega has 4 Shader Engines, 64 NCUs, 4 Render Back-Ends & 256 Texture Units. The funky g33ks from ComputerBase spotted this one after diving into the driver. If you check the code, this is the result set for Vega10 as spotted in the Linux drivers:

case CHIP_VEGA10:
        adev->gfx.config.max_shader_engines = 4;
        adev->gfx.config.max_tile_pipes = 8;
        adev->gfx.config.max_cu_per_sh = 16;
        adev->gfx.config.max_sh_per_se = 1;
        adev->gfx.config.max_backends_per_se = 4;
        adev->gfx.config.max_texture_channel_caches = 16;
        adev->gfx.config.max_gprs = 256;
        adev->gfx.config.max_gs_threads = 32;
        adev->gfx.config.max_hw_contexts = 8;
These specs and details are on par with what we have been telling you for a long time now. 64 CUs x 64 shader units = 4096 shader processors. These are divided over four blocks. There is a total of 64 ROP units (16 per block) and the GPU is to get 256 texture units. If you compare it, that's pretty much twice a Radeon RX 570. The architecture details actually also show a lot of similarities with Fiji (Radeon R9 Fury (X)). Vega 10 is expected to battle the GeForce GTX 1070 and 1080.
http://www.guru3d.com/news-story/am...surface-in-linux-patch-4096-shader-procs.html
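
For anyone wondering how the quoted driver fields map to the headline number, the 4096 figure falls straight out of the per-SE/per-SH CU counts times the 64 shader ALUs in every GCN CU; a small sketch using the values from the patch:

#include <stdio.h>

int main(void)
{
    /* Values from the CHIP_VEGA10 block quoted above. */
    int max_shader_engines = 4;
    int max_sh_per_se      = 1;
    int max_cu_per_sh      = 16;

    /* Each GCN compute unit carries 64 shader ALUs (4 x SIMD16). */
    int alus_per_cu = 64;

    int cus     = max_shader_engines * max_sh_per_se * max_cu_per_sh;
    int shaders = cus * alus_per_cu;

    printf("%d CUs -> %d shader processors\n", cus, shaders); /* 64 -> 4096 */
    return 0;
}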
 
Is there any explanation of why AMD are limiting their high-end GPUs to 64 ROPs?
 
Without knowing the cost difference between 7000MT/s and 5000MT/s memory back in 2013, as well as the difference in PCB cost between using a 512bit or a 384bit bus, there's really no way to know for sure which memory subsystem was more expensive.

There's no way to confirm or be sure, but it's pretty obvious which approach is cheaper, time and time again. Why do you think Nvidia always goes for narrow but fast memory subsystems? Do you honestly believe they consistently choose the option that limits memory capacity, increases power most of the time and forces them to use higher-tier/lower-volume memory, without there being a clear cost advantage?

AMD almost always makes the same choice, Hawaii being the only clear exception, and they did it, iirc even according to their own marketing material, to reduce power consumption on a card that was clearly at the limit of what's acceptable. As in, without that choice it would probably have gone out of PCIe spec.
 
There's no way to confirm or be sure, but it's pretty obvious which approach is cheaper, time and time again. Why do you think Nvidia always goes for narrow but fast memory subsystems? Do you honestly believe they consistently choose the option that limits memory capacity, increases power most of the time and forces them to use higher-tier/lower-volume memory, without there being a clear cost advantage?
nvidia has been going for narrower but faster memory since Maxwell (not always) because they've been able to clock their chips considerably higher, and that includes the on-chip memory controller.
Again, this applies to Maxwell and Pascal. For Kepler (780 Ti included) and earlier architectures this really doesn't apply. Tahiti cards eventually started performing closer to GK110 cards, proving the 384bit bus was actually as adequate as it was for the Geforce 780, and before that Fermi cards actually used wider buses than their Terascale 2/3 counterparts.



Keep in mind there's 1+ year of difference between the two costs (as evidenced by die costs). GDDR5 never gets cheaper?
On one hand, lower-frequency GDDR5 chips do get cheaper.
On the other hand, you'd need 8000MT/s GDDR5 chips in a 16*32bit configuration to reach the same theoretical bandwidth as 4 stacks HBM1, and such GDDR5 chips didn't even exist in 2015 when the Fiji cards launched.

You're not comparing apples-to-apples either way. Might as well just stick with prices at launch date.
 
nvidia has been going for narrower but faster memory since Maxwell (not always) because they've been able to clock their chips considerably higher, and that includes the on-chip memory controller.
Again, this applies to Maxwell and Pascal. For Kepler (780 Ti included) and earlier architectures this really doesn't apply. Tahiti cards eventually started performing closer to GK110 cards, proving the 384bit bus was actually as adequate as it was for the Geforce 780, and before that Fermi cards actually used wider buses than their Terascale 2/3 counterparts.

Kepler had the exact same memory configurations as Maxwell and Pascal, so I don't know what you're talking about, and I'm beginning to think you don't either. As for the rest, I fail to see how any of that is relevant or in any way related to the memory, as opposed to driver development and the effect that GCN-based consoles had on general game engine optimizations for GCN.

On one hand, lower-frequency GDDR5 chips do get cheaper.
On the other hand, you'd need 8000MT/s GDDR5 chips in a 16*32bit configuration to reach the same theoretical bandwidth as 4 stacks HBM1, and such GDDR5 chips didn't even exist in 2015 when the Fiji cards launched.

Assuming that you'd need the same bandwidth as 4 stacks of HBM, which you don't.

You're not comparing apples-to-apples either way. Might as well just stick with prices at launch date.

Sticking to launch prices from launches that are so far apart is stupid. You need to compare to what is available at the time. Either way, I don't know what you're pursuing in this whole convo with everyone. It's pretty obvious you think that comparison somehow vindicates your past claims, when in fact it's pretty much doing the opposite, as so many others are pointing out. Personally, I'm out.
 
It's worth noting back in the mists of time that the IHVs used to bundle the memory with the GPU. They stopped doing that, as it meant that they were freed from dealing with the problem of getting the right memory to go along with the GPUs they were selling. The AIBs were then responsible for that problem. It also meant that the memory on the card was no longer "re-sold" to the AIBs, further complicating things.

HBM gives the IHVs this problem, again.

Though it seems in recent years that NVidia has been building cards explicitly for sale, so it has got back into the memory game (though not reselling it). But I think it's pretty clear that's only a very small proportion of all the NVidia cards that are on the market.
 
Is there any explanation of why AMD are limiting their high-end GPUs to 64 ROPs?
They don't need more it seems.

Vega with ~50% higher clock than Fury X + draw stream binning rasteriser should be fine with 64 ROPs. This plus other stuff should mean the ROPs do less work per screen pixel than Fury X.
 
It's worth noting back in the mists of time that the IHVs used to bundle the memory with the GPU. They stopped doing that, as it meant that they were freed from dealing with the problem of getting the right memory to go along with the GPUs they were selling. The AIBs were then responsible for that problem. It also meant that the memory on the card was no longer "re-sold" to the AIBs, further complicating things.

HBM gives the IHVs this problem, again.

Apart from the re-selling factor, how different is this from nvidia using GDDR5X, which only Micron produces?
There's only one provider of HBM2 for AMD cards so far and it's Hynix. There's not a lot of possible confusion coming from that area.



They don't need more it seems.

Vega with ~50% higher clock than Fury X + draw stream binning rasteriser should be fine with 64 ROPs. This plus other stuff should mean the ROPs do less work per screen pixel than Fury X.

Yup.
Synthetic benchmarks for pixel fillrate on the RX 480 show that a hypothetical Big Polaris with 64 ROPs at 1200MHz would already be close to a GTX 1070 in pixel and texel fillrate:

[Charts: RX 480 synthetic pixel and texel fillrate results]


Multiply that by 1.25 (to go from 1200 to 1500MHz) and you'd probably get closer to GTX 1080 scores. This would be without even considering Vega's improved rasterizer.
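
The scaling assumed there is just ROPs times core clock; a minimal sketch of the peak pixel fillrate math (the 1.5GHz Vega clock is the rumoured figure, not a confirmed spec):

#include <stdio.h>

/* Peak pixel fillrate in Gpixels/s: ROPs * core clock (GHz).
 * Real-world fillrate also depends on bandwidth and, for Vega,
 * the binning rasterizer, which this ignores. */
static double peak_fill(int rops, double clock_ghz)
{
    return rops * clock_ghz;
}

int main(void)
{
    printf("Fury X (1050MHz):           %.1f Gpix/s\n", peak_fill(64, 1.05));
    printf("'Big Polaris' @ 1200MHz:    %.1f Gpix/s\n", peak_fill(64, 1.20));
    /* 1.5 GHz is the rumoured Vega clock, not a confirmed spec. */
    printf("Vega 10 @ 1500MHz (rumour): %.1f Gpix/s\n", peak_fill(64, 1.50));
    return 0;
}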


Kepler had the exact same memory configurations as Maxwell and Pascal, so I don't know what you're talking about, and I'm beginning to think you don't either.
Your point was higher memory clocks at lower width compared with their AMD competition. Stop moving goalposts.
Why do you think Nvidia always goes for narrow but fast memory subsystems?

With Fermi vs. Terascale, nvidia used memory at lower clocks but larger width (GF104 actually had a 320bit bus). With Kepler vs. GCN1 in 2012-2013, nvidia used memory at the same clocks as their AMD competition.
Your narrow but fast analogy only applies to late Kepler (2014 refreshes) and Maxwell cards.

Assuming that you'd need the same bandwidth as 4 stacks of HBM, which you don't.
Not what was being discussed.

Sticking to launch prices from launches that are so far appart is stupid.
"So far apart is stupid" = 18 months, for cards with the same amount of VRAM from the same vendor using the same manufacturing process.

<Mod Redacted>
 
Your point was higher memory clocks at lower width compared with their AMD competition. Stop moving goalposts

That was not my point. <Mod Redacted>. I only mentioned the competing Nvidia cards because they are the proof that the required memory performance was available with narrower bus widths.

With Fermi vs. Terascale, nvidia used memory at lower clocks but larger width (GF104 actually had a 320bit bus). With Kepler vs. GCN1 in 2012-2013, nvidia used memory at the same clocks as their AMD competition.
Your narrow but fast analogy only applies to late Kepler (2014 refreshes) and Maxwell cards

First of all, GF104 actually used a 256-bit bus. It was the cut-down GF100/110 that used a 320-bit one. <Mod Redacted>


Not what was being discussed.

Yes, of course it is what was being discussed. It's the viability of both types of memory that is being discussed. The amount of bandwidth that is actually required is not only relevant, it's essential.

"So far apart is stupid" = 18 months, for cards with the same amount of VRAM from the same vendor using the same manufacturing process.

Acting like 18 months is not a lot of time in the tech world? Oh my...
 
Yup.
Synthetic benchmarks for pixel fillrate on the RX 480 show that a hypothetical Big Polaris with 64 ROPs at 1200MHz would already be close to a GTX 1070 in pixel and texel fillrate:

[Charts: RX 480 synthetic pixel and texel fillrate results]


Multiply that by 1.25 (to go from 1200 to 1500MHz) and you'd probably get closer to GTX 1080 scores. This would be without even considering Vega's improved rasterizer.
Of course it would be... It's kinda obvious that a 64 ROP chip at 1.5GHz would be quite close to a 64 ROP chip at 1.6GHz, isn't it? However, the "improved rasterizer" won't show its nose in this kind of synthetic benchmark and magically boost it (past its theoretical max) closer to the GeForce.
Now, what would happen to fillrate in practice when you throw a couple of thousand small triangles at it is an entirely different story. Here the binning rasterizer will come into play. How well it will fare against Maxwell and Pascal remains to be seen.
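
To illustrate why a binning rasterizer matters for exactly that case, here is a heavily simplified conceptual sketch (not AMD's actual DSBR): triangles are first sorted into screen tiles, so a tile's colour/depth data can stay on chip while all of its triangles are processed, instead of every small triangle immediately hitting the ROPs and memory:

#include <stdio.h>

#define TILE     32                            /* tile size in pixels (illustrative) */
#define TILES_X  (1920 / TILE)
#define TILES_Y  ((1080 + TILE - 1) / TILE)
#define MAX_BIN  256

/* A triangle reduced to its screen-space bounding box. */
struct tri { int x0, y0, x1, y1; };

static int bins[TILES_Y][TILES_X][MAX_BIN];    /* triangle indices per tile */
static int bin_count[TILES_Y][TILES_X];

/* Pass 1: bin each triangle into every tile its bounding box overlaps. */
static void bin_triangle(const struct tri *t, int idx)
{
    for (int ty = t->y0 / TILE; ty <= t->y1 / TILE; ty++)
        for (int tx = t->x0 / TILE; tx <= t->x1 / TILE; tx++)
            if (bin_count[ty][tx] < MAX_BIN)
                bins[ty][tx][bin_count[ty][tx]++] = idx;
}

int main(void)
{
    struct tri tris[] = { { 10, 10, 40, 25 }, { 30, 15, 90, 60 } };
    int n = 2;

    for (int i = 0; i < n; i++)
        bin_triangle(&tris[i], i);

    /* Pass 2: walk the tiles. Colour/depth for a tile stay on chip
     * while all of its binned triangles are shaded, so overdraw
     * between them never has to go out to memory. */
    for (int ty = 0; ty < TILES_Y; ty++)
        for (int tx = 0; tx < TILES_X; tx++)
            if (bin_count[ty][tx])
                printf("tile (%d,%d): %d triangle(s)\n",
                       tx, ty, bin_count[ty][tx]);
    return 0;
}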
 