When will Third Party Vega Boards be available?

When do you think Third Party Vega boards will be available?

  • Before New Years 2017
  • Before end of Second Quarter 2018

Total voters: 25
I wonder what exactly went wrong with HBM, given its promise of great power-saving potential. It can't be as simple as top-level engineers failing to see that the heat sources move much closer together compared to a GPU + GDDR layout spread over a much larger area.
Possibly engineering constraints. Most tech products sold have them, since the product has to be mass produced (in engineering it is easier to make one or two identical units than thousands, and the solution has to take that into account) and, critically, hit a price point.
Worth noting AMD runs higher HBM2 clocks than Nvidia, who seem to have always been conservative with it; I am thinking of Vega 64.
It would be interesting to see a strip down comparison between AMD and Nvidia packages.

Anyway, still better than GDDR5X IMO, given the issues that memory has had on Nvidia cards now and again.
 
As HBM doesn't dissipate that much heat, the benefit of exactly matching heights might be minimal anyway. Apparently the difference is on the order of a few hundredths of a millimeter.
The concern is that the HBM is taller, not cooling it specifically. It could prevent proper contact with the core. If the stacks were shorter there would be less concern.
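To put that height mismatch in perspective, here is a rough back-of-envelope calculation of what a few hundredths of a millimeter of extra TIM thickness over the GPU die would cost thermally. All the numbers (die power, die area, paste conductivity, gap size) are illustrative assumptions, not measured values for any real card:

```python
# Back-of-envelope: extra thermal resistance from thickening the TIM layer
# over the GPU die to bridge an HBM stack that sits slightly taller.
# All numbers below are illustrative assumptions, not measured values.

def extra_tim_delta_t(power_w, die_area_m2, extra_gap_m, tim_k_w_mk):
    """Temperature rise added by an extra TIM layer (1-D conduction model).

    R = t / (k * A); delta_T = P * R
    """
    resistance = extra_gap_m / (tim_k_w_mk * die_area_m2)  # K/W
    return power_w * resistance                            # K

# Assumed: 250 W GPU die, ~500 mm^2 area, 0.03 mm extra paste gap,
# decent thermal paste at 5 W/(m*K).
dt = extra_tim_delta_t(power_w=250, die_area_m2=500e-6,
                       extra_gap_m=0.03e-3, tim_k_w_mk=5.0)
print(f"Extra die temperature rise: ~{dt:.1f} K")
```

A few kelvin isn't catastrophic, but it shows why simply padding the gap with thicker paste isn't free, and why matching die/stack heights (or tuning the cold plate) matters to an AIB.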

Power saving comes mostly from controllers.
Controllers and the actual signaling. The DRAM cells themselves don't change all that much; it's driving the signals. Even the chips will use a bit less power, as there is less capacitance in the lanes between them.
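A toy model makes the "less capacitance in the lanes" point concrete: dynamic signaling energy scales as C·V² per full-swing toggle, so a short interposer lane beats a long PCB trace. The capacitance and voltage figures below are rough illustrative assumptions, not datasheet values:

```python
# Toy model of per-bit signaling energy: E = C * V^2 for a full-swing
# toggle of one data lane. Capacitance and voltage values are rough
# illustrative assumptions, not datasheet figures.

def energy_per_toggle_pj(capacitance_pf, voltage_v):
    """Energy (pJ) drawn from the supply to swing a lane of C pF by V volts."""
    return capacitance_pf * voltage_v ** 2  # pF * V^2 = pJ

# Assumed: a GDDR5-style PCB trace at a few pF and 1.5 V I/O, versus a
# short interposer lane at a fraction of a pF and 1.2 V.
gddr5 = energy_per_toggle_pj(capacitance_pf=3.0, voltage_v=1.5)
hbm = energy_per_toggle_pj(capacitance_pf=0.3, voltage_v=1.2)
print(f"GDDR5-ish lane: {gddr5:.2f} pJ/toggle, HBM-ish lane: {hbm:.3f} pJ/toggle")
```

Even with made-up numbers the ratio is an order of magnitude per lane, which is why HBM can afford a much wider bus at lower clocks and still come out ahead on I/O power.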

Worth noting AMD runs higher HBM2 clocks than Nvidia, who seem to have always been conservative with it; I am thinking of Vega 64.
Comparable parts were also released nearly a year apart. The process would have improved, and P100 was simply not updated with higher-speed memory. I doubt there is much to see there.

As for actual thermal modeling, there is only so much that can change with the cooler. They are generally a contact plate with heat pipes dissipating as much heat as possible. Engineering one side for lower temperatures doesn't really happen. In the case of a water loop maybe the inlet is over the hottest spot, but only so much can be done with phase change. If modeling I'd expect TECs or a more active setup. Most cooler designs just aren't that complex to adapt and the chip should have been designed to avoid significant hotspots prior to an AIB receiving one. Even the reference coolers aren't anything special.
 
The concern is that the HBM is taller, not cooling it specifically. It could prevent proper contact with the core. If the stacks were shorter there would be less concern.


Controllers and the actual signaling. The DRAM cells themselves don't change all that much; it's driving the signals. Even the chips will use a bit less power, as there is less capacitance in the lanes between them.


Comparable parts were also released nearly a year apart. The process would have improved, and P100 was simply not updated with higher-speed memory. I doubt there is much to see there.

As for actual thermal modeling, there is only so much that can change with the cooler. They are generally a contact plate with heat pipes dissipating as much heat as possible. Engineering one side for lower temperatures doesn't really happen. In the case of a water loop maybe the inlet is over the hottest spot, but only so much can be done with phase change. If modeling I'd expect TECs or a more active setup. Most cooler designs just aren't that complex to adapt and the chip should have been designed to avoid significant hotspots prior to an AIB receiving one. Even the reference coolers aren't anything special.
I am thinking of the V100 with regard to comparing with Vega 64.
Not a great difference, but it's still there.
And Nvidia were very conservative before that with the P100, like you say.

I think we have different perspectives on engineering a thermal dissipation solution beyond what one generally slaps on a CPU/GPU die. In any case, a crucial aspect I keep mentioning is that custom AIB boards usually also change the performance envelope and characteristics.
Look, I am not saying it is impossible, just a headache without the right advanced tools, access to the right engineers at SK Hynix/Samsung, and the work of building the models/simulations while also adapting to the different packages (you need to know how this is influenced by the AIB solution).
In the same way SPICE makes life easier for engineers designing ICs.
Who knows, maybe the AIB partners are waiting for a large volume of one package type to simplify their manufacturing *shrug*.
 
I am thinking of the V100 with regard to comparing with Vega 64.
Not a great difference, but it's still there.
And Nvidia were very conservative before that with the P100, like you say.
Even then, the question remains whether bandwidth or capacity was the driving factor behind four stacks. Does V100 need more bandwidth? Ignoring slight power and reliability differences. Need to check if they actually used 8-Hi on anything, or if it's just an option.
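For reference, stack count and per-pin rate pin down the headline bandwidth directly, since each HBM/HBM2 stack presents a 1024-bit interface. The per-pin rates below are the commonly quoted figures for each part, so treat the outputs as approximations of the official numbers:

```python
# HBM2 aggregate bandwidth = stacks * 1024 bits/stack * per-pin rate / 8.
# Per-pin data rates are the commonly quoted figures for each part.

def hbm_bandwidth_gbs(stacks, gbps_per_pin):
    """Aggregate bandwidth in GB/s for 1024-bit-wide HBM/HBM2 stacks."""
    return stacks * 1024 * gbps_per_pin / 8

for name, stacks, rate in [("P100", 4, 1.43),     # ~730 GB/s
                           ("V100", 4, 1.75),     # ~900 GB/s
                           ("Vega 64", 2, 1.89)]:  # ~484 GB/s
    print(f"{name}: {hbm_bandwidth_gbs(stacks, rate):.0f} GB/s")
```

So V100's four stacks give it nearly double Vega 64's bandwidth at similar pin speeds; the arithmetic alone doesn't settle whether the fourth stack was there for bandwidth or for capacity.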

Who knows, maybe the AIB partners are waiting for a large amount of one packaging to simplify their manufacturing *shrug*.
That's assuming they aren't different product numbers, with each AIB getting a different but consistent part. The wait seems more likely a result of fab allocations in the face of Ryzen demand. I wouldn't be surprised if we see TSMC or another fab on a refreshed part just for added capacity; that WSA amendment has to be for something. On the flip side, demand finally seems to be easing a bit with recent volume.
 
Even then, the question remains whether bandwidth or capacity was the driving factor behind four stacks. Does V100 need more bandwidth? Ignoring slight power and reliability differences. Need to check if they actually used 8-Hi on anything, or if it's just an option.


That's assuming they aren't different product numbers, with each AIB getting a different but consistent part. The wait seems more likely a result of fab allocations in the face of Ryzen demand. I wouldn't be surprised if we see TSMC or another fab on a refreshed part just for added capacity; that WSA amendment has to be for something. On the flip side, demand finally seems to be easing a bit with recent volume.
Vega 64/Frontier Edition needs bandwidth, and V100/P100 need bandwidth, so I'm not sure why you differentiate them.

What makes you think it is more to do with Ryzen desktop/server than with at least three different packaging plants' logistics, yields, and variations related to the GPU-HBM package?
Who is to say the batches of GPU dies or HBM are consistent between the different packaging lines/products?
Do you think AIB partners in an extremely price-competitive segment are going to spend time and resources developing multiple manufacturing lines/logistics processes for just one custom model if there is an alternative approach?
They would need to do various dissipation tests, QA tests, performance envelope tests, production logistics, etc., as there is no guarantee that each package has the same consistency as those from the other chains.
But this ignores my earlier point about the complexity of mass production: most tech product engineers will tell you it is easier to make one or two identical units than thousands, and the design needs to take this into account.

Not sure what you mean by assumption; it may be a misunderstanding on my part or yours. It is known that three, possibly four, different packaging plants are used for Vega, that these are being sourced to the partners, and that there is possibly variation in the HBM and GPU yields/performance envelope spec.
I guess one could check how the logistics were approached with Fiji, which I thought was single-sourced.

Edit:
Yeah, Fiji was packaged only by Amkor in South Korea.
 
I wonder what exactly went wrong with HBM, given its promise of great power-saving potential. It can't be as simple as top-level engineers failing to see that the heat sources move much closer together compared to a GPU + GDDR layout spread over a much larger area.
HBM has its pros and cons. Besides the need for a way to densely pack signal lines (be it interposers or EMIB), the other great technical challenge was always going to be die stacking. Stacking itself is not a new idea, but HBM operates at much higher power levels than a PoP SoC (particularly for its 2D surface area) and requires TSV mastery as well.
 
What makes you think it is more to do with Ryzen desktop/server than with at least three different packaging plants' logistics, yields, and variations related to the GPU-HBM package?
Because Polaris has similar issues, and obviously HBM isn't a factor there. Mining demand has been significant enough that producing a few more chips would have entailed little risk. Another factor is the modified WSA, suggesting AMD foresaw a need for more capacity. Considering the recent market share numbers and AMD's popularity with miners, they should have outpaced Nvidia with Polaris sales. That doesn't appear to have happened, yet financials show increased revenue and margins.

Do you think AIB partners in an extremely price-competitive segment are going to spend time and resources developing multiple manufacturing lines/logistics processes for just one custom model if there is an alternative approach?
I'm saying they wouldn't have to, but some likely would do just that with various bins of the chip. A different cooler design for a high-end SKU isn't unheard of.

Not sure what you mean by assumption; it may be a misunderstanding on my part or yours. It is known that three, possibly four, different packaging plants are used for Vega, that these are being sourced to the partners, and that there is possibly variation in the HBM and GPU yields/performance envelope spec.
I guess one could check how the logistics were approached with Fiji, which I thought was single-sourced.
The assumption is that each packaging plant isn't producing a different variation of the product, each going to a single partner. Four plants and four variations can still mean each partner gets a single variant, and each designs for one product variation. Obviously it's a bit more complex to spread the allocations, but even for a single partner each variation could be a different product. No default Vega design with multiple component configurations.
 
Because Polaris has similar issues, and obviously HBM isn't a factor there. Mining demand has been significant enough that producing a few more chips would have entailed little risk. Another factor is the modified WSA, suggesting AMD foresaw a need for more capacity. Considering the recent market share numbers and AMD's popularity with miners, they should have outpaced Nvidia with Polaris sales. That doesn't appear to have happened, yet financials show increased revenue and margins.


I'm saying they wouldn't have to, but some likely would do just that with various bins of the chip. A different cooler design for a high-end SKU isn't unheard of.


The assumption is that each packaging plant isn't producing a different variation of the product, each going to a single partner. Four plants and four variations can still mean each partner gets a single variant, and each designs for one product variation. Obviously it's a bit more complex to spread the allocations, but even for a single partner each variation could be a different product. No default Vega design with multiple component configurations.
I see no issue getting custom RX 580 (Polaris) models; I just checked multiple large-scale retailers *shrug*.
Well, you have a different view on tech manufacturing logistics than I do, especially when multiple separate plants are involved, but I can only go by my own experience at work.

My point is that the variation comes down to yields and performance envelope, along with possibly subtle physical variation, in the extreme case, between two of the packaging plants, unlike before with Fiji, where all factors were more easily controlled and evaluated/binned.
The same goes for the cooler, in two ways: first, the worst-case scenario of a partner receiving packages with physical/material differences; and second, given the batches/logistics/chains involved, there is no guarantee of zero variation between what packaging plant A delivers and what plant B delivers. This matters because, I need to reiterate, custom AIB boards differ from the reference design and reference spec, with different traits, so it has an impact both on thermal considerations and on the tolerance differences in what they receive.
I feel it would be more of an assumption to think everything is completely consistent when the manufacturing and logistics are so much more diverse this time, unlike Fiji, which involved just AMD, SK Hynix, and Amkor in South Korea.
 
The assumption is that each packaging plant isn't producing a different variation of the product, each going to a single partner.
That's an odd way of going about things, though. Each plant would have to precisely match demand from each AMD partner, and there would be little room for economies of scale with so many separate packaging plants involved.
 
That's an odd way of going about things, though. Each plant would have to precisely match demand from each AMD partner, and there would be little room for economies of scale with so many separate packaging plants involved.
Odd, but not necessarily unreasonable, and my example was rather basic. There could be price differences, as well as more than a handful of partners. I'd definitely agree a consistent product is easier to manage from an inventory perspective, but that could be an AMD issue, not a partner one. Has anyone checked whether any partner is shipping different configurations of the same product? I've seen the different variants, but they were all AMD reference designs as I recall. The only issue should be aftermarket coolers.
 