AMD Vega 10, Vega 11, Vega 12 and Vega 20 Rumors and Discussion

Hey, I would take David Kanter's best guess over anyone else's facts! The man knows what he's talking about when he speaks up, and I would count his estimates as extremely accurate.
I could agree if he were talking about 2x 8-Hi, but $150 for 2x 4-Hi? I don't care if he hand-crafted the chips, I still wouldn't believe him without an actual bill of materials + work.
 
If you don't know what height your chip will be, how will watercooling companies like EK or XSPC provide a waterblock... (and EK already does)?
 
Is a 0.1 mm difference in height enough of a problem to keep thermal paste from working properly?
 
I could agree if he were talking about 2x 8-Hi, but $150 for 2x 4-Hi? I don't care if he hand-crafted the chips, I still wouldn't believe him without an actual bill of materials + work.
I think we can speculate that the cost for 2x 4-Hi would be in the neighborhood of $150+, based on this HBM1 breakdown.
[Attached image: sys-plus-3.jpg (HBM1 cost breakdown)]

https://forum.beyond3d.com/posts/1982501/
 
I think we can speculate that the cost for 2x 4-Hi would be in the neighborhood of $150+, based on this HBM1 breakdown.

https://forum.beyond3d.com/posts/1982501/
The same Electroiq site released an analyst estimate saying that the 4 HBM1 stacks on Fiji cost $48, plus $25 for the interposer and $30 for substrate + packaging. That $150 could be correct if it includes those, but definitely not for just the two 4-Hi HBM2 stacks.
 
The same Electroiq site released an analyst estimate saying that the 4 HBM1 stacks on Fiji cost $48, plus $25 for the interposer and $30 for substrate + packaging. That $150 could be correct if it includes those, but definitely not for just the two 4-Hi HBM2 stacks.

And if 4-Hi HBM2 stacks cost 2.5x as much as 4-Hi HBM1 stacks, then the price for Vega would be (48/2)*2.5 + 25 + 30 = 60 + 25 + 30 = $115.
And this is assuming the interposer costs the same between Fiji and Vega 10, though being smaller it should be cheaper too. Packaging with half as many memory stacks should be significantly cheaper too.
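As a quick sanity check, here's that back-of-envelope math in a few lines of Python. The 2.5x HBM2-vs-HBM1 premium and the unchanged interposer/packaging figures are assumptions taken from this thread, not actual BOM data:

Code:
fiji_hbm1_4_stacks = 48.0   # $ for the four 4-Hi HBM1 stacks on Fiji (analyst estimate)
interposer = 25.0           # $ interposer, assumed unchanged for Vega 10
packaging = 30.0            # $ substrate + packaging, assumed unchanged
hbm2_premium = 2.5          # assumed price multiplier per 4-Hi stack, HBM2 vs HBM1

vega_hbm2 = (fiji_hbm1_4_stacks / 2) * hbm2_premium   # two stacks instead of four
total = vega_hbm2 + interposer + packaging
print(vega_hbm2, total)     # 60.0 115.0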

Regardless, $115 is a far cry from the $175 cost that Gamers Nexus released, and that later in the article went magically up to $200.

Now if that value refers to the 8Hi stacks in Vega FE, then it makes a lot more sense.
 
In the video they clearly talk about the total cost of implementation, not just the stacks.
In the article they say it's $150 for the stacks plus $25 for the interposer:

Regardless, we’re at about $150 on HBM2 and $25 on the interposer, putting us around $175 cost for the memory system.
 
Maybe because that's what customers of Google's cloud AI service want for the time being: NVidia's platform within that service? There's no reason to suppose that is a long-term prospect.

Why would Google use NVidia for its own internal processes, now that it has TPU v2? In other words, why build TensorFlow and TPU? Do you think it's one of those beta things that Google will abandon after a couple of years?

For one thing, Google's TPU has limited FP32 capabilities and no FP64 support that I know of, which makes it great for inference but of limited use for training, assuming it's even usable at all for such purposes.
 
Google specifically mentioned DNN training as a design goal for TPUv2, so you could conclude TPUv1 was about inference only. Actually, its MACs were 8-bit only, while the accumulators are 32-bit.
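To illustrate the difference, an inference-style dot product only needs 8-bit multiplies widened into a 32-bit accumulator, which leaves nowhere near enough precision headroom for training updates. A purely illustrative numpy sketch, not the TPU's actual datapath:

Code:
import numpy as np

# 8-bit quantized weights and activations, as in TPUv1-style inference
a = np.random.randint(-128, 128, size=256, dtype=np.int8)
w = np.random.randint(-128, 128, size=256, dtype=np.int8)

# Widen to 32-bit before multiplying so the products and the running sum don't overflow
acc = np.sum(a.astype(np.int32) * w.astype(np.int32), dtype=np.int32)
print(acc)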
 
Maybe it is not clear who's talking about TPUv1 and who about TPUv2?

This was built explicitly to include training:

https://www.blog.google/topics/google-cloud/google-cloud-offer-tpus-machine-learning/

so perhaps you'd like to explain why it is of limited use?
That blog is about TPUv2 and specifically mentions:
While our first TPU was designed to run machine learning models quickly and efficiently—to translate a set of sentences or choose the next move in Go—those models still had to be trained separately.
 
Yes, Google is quite explicit that it is replacing GPUs with TPUv2 for training of its own systems. It may be that it's not a complete replacement and that will have to wait until v3 or later. With v2 offered in the cloud, I suppose we'll hear what v2's limitations are.
 
For one thing, Google's TPU has limited FP32 capabilities and no FP64 support that I know of, which makes it great for inference but of limited use for training, assuming it's even usable at all for such purposes.
FP64 support could probably come from the app using host CPUs. I'd think accumulating in FP32 and flushing to system memory for FP64 accumulation would be sufficient for deep learning. There may be some HPC apps where that breaks down, but it's difficult to imagine apps with that large a precision delta for performance-critical work.
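A minimal numpy sketch of that accumulate-in-FP32, reduce-in-FP64-on-the-host idea (purely illustrative, not any actual TPU API):

Code:
import numpy as np

# Each "device" hands back an FP32 partial sum (e.g. a per-batch loss or gradient chunk)
partials = [np.float32(np.random.rand(1_000_000).astype(np.float32).sum())
            for _ in range(8)]

# The host does the final accumulation in FP64, so FP32 rounding error
# doesn't compound across many partial results
total = np.asarray(partials, dtype=np.float64).sum()
print(total)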
 
I don't think anyone uses FP64 for training.

What about FP32? It doesn't seem to support proper IEEE 754 FP32 ops, unless I missed something. But perhaps it's still usable in many cases; I'm really not knowledgeable enough about deep learning to say more.
 