AMD Vega 10, Vega 11, Vega 12 and Vega 20 Rumors and Discussion

I have no idea why AMD and NVidia think they can compete with this.

Why is Google buying Volta from Nvidia? The other part is that Google doesn't sell TPUs to other companies.

I think the broader question is flexibility versus throughput on specialized loads. The age-old FPGA/DSP versus GPU versus CPU debate.

Intel is also trying to compete, Microsoft had/has FPGA solutions, and who knows what Baidu, Amazon, ... are cooking.
 
Why is Google buying Volta from Nvidia?
Maybe because that's what customers of Google's cloud AI service want for the time being: NVidia's platform within that service? No reason to suppose that is a long-term prospect.

Why would Google use NVidia for its own internal processes, now that it has TPU V2? In other words, why build TensorFlow and the TPU? Do you think it's one of those beta things that Google will abandon after a couple of years?
 
Volta customers, based on this Nvidia presentation: http://files.shareholder.com/downlo...-82A2-A9EA63C769C3/JHH_Shareholder2017_v4.pdf

"Nvidia GPU in every cloud, alibaba, amazon, baidu, facebook, google, microsoft, tencent"

I believe Volta is about flexibility in addition to performance. A cloud service provider can pitch Volta for many different use cases, not just DNN inference work. It's also as much about software, usability and packaging as it is about raw performance. Just having a bunch of fast chips inside a case does almost nothing.
 
Maybe because that's what customers of Google's cloud AI service want for the time being: NVidia's platform within that service? No reason to suppose that is a long-term prospect.

Why would Google use NVidia for its own internal processes, now that it has TPU V2? In other words, why build TensorFlow and the TPU? Do you think it's one of those beta things that Google will abandon after a couple of years?

I don't know. Google would have to weigh the benefits and costs of an in-house solution versus buying from outside. Maybe Google will eventually try to sell TPUs to other companies in order to amortize the R&D cost better. I think the original reason Google quoted for creating the TPU was that there was nothing available from anyone, so they had to do their own solution.

I think the market is being divided up today. It's easier to come in with a new solution when there is no legacy code, tooling, or developer community... once something gets established it's difficult to turn the tide. I see it as a very hard path for AMD to compete, as they are way late to the party and are not really investing in 3rd-party developers the same way Nvidia is (GTC conferences, for example).
 
I see this as the key reason why customers might prefer Nvidia over other solutions. There is a reasonable way to develop locally on something cheap like a laptop with a 1080 or a desktop with a Titan. Then, once you are happy, there is an easy path to the cloud for the heavy crunching. Cloud vendors probably like this approach too. Since you have similar if not the same hardware locally as in the cloud, you can optimize your code at least at the single-GPU or multi-GPU level and be reasonably certain it can scale to and use the cloud hardware efficiently. If there is no cheap local solution to develop on, that seriously bogs down innovation and limits who can use your technology.

 
Nothing prevents Nvidia from designing a dedicated chip should the need arise.
Everyone can do "anything", but it's a completely different thing to make it competitive. We seriously have no clue how competitive NVIDIA's Tensor cores are in practice, or how competitive a chip of theirs built with just Tensor cores would be.
 
Everyone can do "anything", but it's a completely different thing to make it competitive. We seriously have no clue how competitive NVIDIA's Tensor cores are in practice, or how competitive a chip of theirs built with just Tensor cores would be.

I guess you could try to extrapolate from Volta. Rip out all the FP32 and FP64 logic, cut the tensor cores down to outputting only FP16, and get rid of the graphics-related functionality. Then scale the number of "gimped" tensor cores up to form a reasonably sized chip and slap 4 of those chips on a board like Google's TPU2 does. I don't think this makes sense for Nvidia, as they are trying to sell a jack of all trades, as opposed to Google, who designed a solution to fit the exact problems (voice recognition, image recognition) they had in mind.
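Just to put rough numbers on that extrapolation, here's a back-of-envelope sketch. The V100 tensor throughput / core count and the TPUv2 board figure are the publicly quoted peaks; assuming per-core throughput stays the same at V100 clocks is my own simplification.

```python
# Rough back-of-envelope: how many Volta-style tensor cores a hypothetical
# stripped-down chip (no FP32/FP64/graphics) would need to match a TPUv2 board.
# Equal-clock scaling is an assumption, not a real design estimate.

V100_TENSOR_TFLOPS = 120.0  # Nvidia's quoted FP16 tensor throughput for V100
V100_TENSOR_CORES = 640     # 80 SMs x 8 tensor cores
TPU2_BOARD_TFLOPS = 180.0   # Google's quoted figure for a 4-chip TPUv2 board
CHIPS_PER_BOARD = 4         # TPUv2 puts 4 chips on a board

tflops_per_core = V100_TENSOR_TFLOPS / V100_TENSOR_CORES
cores_per_chip = (TPU2_BOARD_TFLOPS / CHIPS_PER_BOARD) / tflops_per_core

print(f"~{tflops_per_core:.3f} TFLOPS per tensor core at V100 clocks")
print(f"~{cores_per_chip:.0f} tensor cores per chip to match a {CHIPS_PER_BOARD}-chip TPUv2 board")
```

That lands at roughly 240 Volta-style tensor cores per chip, which of course says nothing about how much area or power you would actually save by stripping the rest out.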

Nvidia, btw, has already committed to a "TPU". The DLA unit that will ship in Xavier is independent of Volta and will be open-sourced:

http://www.barrons.com/articles/nvi...driven-cars-gaming-the-big-vectors-1502407194

Finally, we’re open-sourcing the DLA, Deep Learning Accelerator – our version of a dedicated inferencing TPU – designed into our Xavier superchip for AI cars. We want to see the fastest possible adoption of AI everywhere. No one else needs to invest in building an inferencing TPU. We have one for free – designed by some of the best chip designers in the world.
https://blogs.nvidia.com/blog/2017/05/24/ai-revolution-eating-software/
 
Volta seems well within an order of magnitude of the v2 TPU in terms of peak throughput / memory bandwidth, but I would still expect them to be a fair bit behind vs. such a workload-specific accelerator. Annoying that TPUv2 power numbers aren't public! NVidia was pitching a separate deep-learning accelerator (DLA) for inference at GTC17. It was claimed to be coming on Xavier, and will supposedly be open-sourced (whatever that means) in September. I would expect that to be much more competitive from a power-efficiency standpoint, though I think it has a significantly smaller MAC array than Google's offering. AMD has IIRC also said something about deep-learning-specific circuits in their next-gen.
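For what it's worth, here's the kind of quick peak-numbers comparison I mean, using the figures both vendors have quoted publicly (approximate, and peak FLOPS-per-byte only tells part of the story):

```python
# Peak arithmetic intensity (FLOPS per byte of memory bandwidth) from publicly
# quoted peak numbers; treat these as approximate, not measured throughput.

chips = {
    # name: (peak FP16 TFLOPS, memory bandwidth in GB/s)
    "V100 (tensor)": (120.0, 900.0),  # Nvidia's quoted figures
    "TPUv2 chip":    (45.0,  600.0),  # Google's quoted per-chip figures
}

for name, (tflops, gbps) in chips.items():
    flops_per_byte = (tflops * 1e12) / (gbps * 1e9)
    print(f"{name}: {tflops:.0f} TFLOPS, {gbps:.0f} GB/s -> ~{flops_per_byte:.0f} FLOPS/byte")
```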

Anyway, I would expect further chip area and architectural effort dedicated to machine learning from the GPU makers in the future. I agree that it's not at all certain they will succeed... there is competitive pressure coming from startups (for example: https://www.nextplatform.com/2017/08/23/first-depth-view-wave-computings-dpu-architecture-systems/) as well as the likes of Google, Microsoft, and Baidu. But I also think it's too early to say the GPU makers are in a hopeless position.
 
Everyone can do "anything", but it's a completely different thing to make it competitive. We seriously have no clue how competitive NVIDIA's Tensor cores are in practice, or how competitive a chip of theirs built with just Tensor cores would be.
Nvidia's bread and butter is in designing power-efficient chips that do a lot of floating-point operations. They're rather good at it, too.

A dedicated deep learning chip at Nvidia doesn't have to be based on today's tensor core either.
 
That's because Samsung hasn't released spec sheets. They're most likely 2 Gbps chips downclocked slightly, or else just 1.94 Gbps chips, but I think that's less likely.
Personally I think it's interesting that everyone assumes that the holdup is the HBM2. Why not Vega's memory controller? For that matter, given the difficulties Hynix is clearly having, why not both?
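For reference, here's how those per-pin rates would translate into aggregate bandwidth over Vega's two-stack, 2048-bit HBM2 interface (the bus width is Vega's published configuration; the pin rates are just the ones quoted above):

```python
# Aggregate HBM2 bandwidth from per-pin data rate on Vega's 2-stack interface.

BUS_WIDTH_BITS = 2 * 1024  # two HBM2 stacks, 1024 bits each

for pin_rate_gbps in (1.94, 2.0):
    bandwidth_gb_s = BUS_WIDTH_BITS * pin_rate_gbps / 8
    print(f"{pin_rate_gbps:.2f} Gbps/pin -> {bandwidth_gb_s:.0f} GB/s")
```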
 
Personally I think it's interesting that everyone assumes that the holdup is the HBM2. Why not Vega's memory controller? For that matter, given the difficulties Hynix is clearly having, why not both?

I think some people are thinking the holdup was HBM2 because AMD was showing off a functioning Vega as early as December 2016 (which could be using Samsung memory), and SK Hynix progressively turned down the specs and pushed back the availability dates for their HBM2 throughout H1 2017 in their public catalog.

Sure it might be a wrong assumption, but is it an unreasonable one?
 
Presumably NVidia is going for software lock-in (CUDA style) with its "open source" DLA. It would appear to be recognition that you can't beat ASICs unless you have an ASIC. And then you use software lock-in to turn the screws.
 
Personally I think it's interesting that everyone assumes that the holdup is the HBM2. Why not Vega's memory controller? For that matter, given the difficulties Hynix is clearly having, why not both?
Vega's memory controller is probably perceived as being good for up to 1200 MHz, from when that Synopsys PR made its rounds earlier.
http://www.prnewswire.com/news-rele...igh-performance-computing-socs-300493382.html

I have no idea why AMD and NVidia think they can compete with this.
Maybe Google does not want to sell it to other AI cloud providers?
 
So, they're basing it on just David Kanter's estimate?
Hey, I would take David Kanter's best guess over anyone else's facts! The man knows what he's talking about when he talks; I would count his estimates as extremely accurate.

Personally I think it's interesting that everyone assumes that the holdup is the HBM2. Why not Vega's memory controller? For that matter, given the difficulties Hynix is clearly having, why not both?
Couldn't agree more, and I think we're all just guessing at this point. NOW I know where I remember you from, btw, it's from here. (I said hi at the Capsaicin thingy.)
 
Presumably NVidia is going for software lock-in (CUDA style) with its "open source" DLA. It would appear to be recognition that you can't beat ASICs unless you have an ASIC. And then you use software lock-in to turn the screws.
I was thinking that if the open-source RTL/ISA is suitable for on-device tasks on, say, smartphone ASICs, and is widely deployed, that could be a big deal.
 
I was thinking that if the open-source RTL/ISA is suitable for on-device tasks on, say, smartphone ASICs, and is widely deployed, that could be a big deal.
I'm sorry, I did some rather heavy self-medicating due to a cold and a bad day. I really want to understand what you just said, but it sounded sort of like gobbledygook to me since I don't know the smartphone world too well.

If you could, would you please take the time to dumb that down for me a bit or would someone else? It sounds like an extremely good thing I could be excited about, but I could just as easily be misreading horrible news too. :s

Thanks in advance either way. :)
 
By on-device tasks, I just mean performing things like speech recognition (Siri / Google Assistant / Cortana / Alexa) without sending audio data across the network, high-quality voice synthesis, recognizing a user's emotional state based on facial expressions, and so on. Some of these things, e.g. the way Google Translate can translate text in video, are already possible on smartphones, but this sort of hardware might enable significantly better quality.
 
I think some people are thinking the holdup was HBM2 because AMD was showing off a functioning Vega as early as December 2016 (which could be using Samsung memory), and SK Hynix progressively turned down the specs and pushed back the availability dates for their HBM2 throughout H1 2017 in their public catalog.

Sure it might be a wrong assumption, but is it an unreasonable one?
Public catalogs are one thing; according to those, the 16 GB Vega didn't exist for a long time and Samsung didn't make HBM at all 'til they suddenly said they're increasing 8-Hi chip production (to this day they still don't have HBM in their public catalogs).
 