Nvidia Pascal Announcement

That may be the case, provided GPUs don't evolve as well in the meantime. I don't know, though, where exactly FPGAs sit on the 3D curve of throughput, power and configurability. On any two of those they are pretty strong, but does that hold for the third dimension as well?
That evolution seems to point towards much more integrated designs. IMHO we're likely to see a move towards socketed GPUs or, more appropriately, MCM-style APUs, along the lines of the P100 with its mezzanine connector or AMD's HPC Zen: designs where system memory is far more tightly interconnected with the GPUs. Most of this is only possible because of stacked memory. It would also entail a move to a more coplanar system design.

FPGAs win on configurability, but likely not on throughput or power. A good example is the recent AMD cards, where the front end is programmable and the ALUs are fixed: new capabilities were designed and then backported to previous models. I don't foresee any great revolutions in ALU design where an FPGA would make sense. Maybe a limited FPGA unit alongside the SIMDs that can be tailored to an unusual task, or used to tie multiple fixed-function units together into larger ones. FPGAs will always waste too many transistors on interconnect for functionality that straightforward.
 
I somewhat doubt that GP100 adding extra FP16/FP64 capabilities constitutes such a big differentiation on the HW side that the research for HPC features would start weighing down the company. FP16 is bound to become standard for gaming as well.

Besides, on GP102, Nvidia does support FP16 operations in the latest OpenGL extension (although the HW doesn't appear up to snuff yet).
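For illustration, here is a minimal CUDA sketch of what packed FP16 math looks like on Pascal, using the cuda_fp16.h intrinsics available since CUDA 7.5. This is the compute path rather than the OpenGL extension mentioned above, and the array sizes and values are just placeholders; on GP100 half2 ops run at twice the FP32 rate, while consumer Pascal exposes the same instructions at much lower throughput.

```cpp
// Hedged sketch of packed FP16 arithmetic via cuda_fp16.h (CUDA 7.5+, sm_53+).
#include <cuda_fp16.h>
#include <cstdio>

// Convert a float pair to half2, do a fused multiply-add, convert back.
__global__ void axpy_fp16(int n2, float a, const float2* x, float2* y)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n2) return;
    __half2 ha = __float2half2_rn(a);                   // broadcast scalar into both lanes
    __half2 hx = __floats2half2_rn(x[i].x, x[i].y);
    __half2 hy = __floats2half2_rn(y[i].x, y[i].y);
    hy = __hfma2(ha, hx, hy);                           // one instruction, two FP16 FMAs
    y[i] = make_float2(__low2float(hy), __high2float(hy));
}

int main()
{
    const int n2 = 1 << 20;                             // arbitrary problem size
    float2 *x, *y;
    cudaMallocManaged(&x, n2 * sizeof(float2));
    cudaMallocManaged(&y, n2 * sizeof(float2));
    for (int i = 0; i < n2; ++i) { x[i] = make_float2(1.f, 1.f); y[i] = make_float2(3.f, 3.f); }

    axpy_fp16<<<(n2 + 255) / 256, 256>>>(n2, 2.0f, x, y);
    cudaDeviceSynchronize();
    printf("y[0] = (%f, %f)\n", y[0].x, y[0].y);        // expect (5, 5)
    cudaFree(x); cudaFree(y);
    return 0;
}
```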

As for features currently exclusive to GP100, such as HBM2, these are bound to become standard in gaming as well once yields improve enough (2017 Titan, here we go), and once competition from AMD picks up a bit (man, they are behind currently...).
 
Bear in mind that FPGAs have been in commercial use in HPC for at least 5 years now.

Let's just take a simple example:

http://www.jobg8.com/Application.aspx?3FKjld8RBQmwJZUxs4hH/Ah&Language=2057

Where's the GPU?

FPGAs have for some time now held a tiny, insignificant niche within HPC. HPC is not where the FPGA bread and butter is; it is in small embedded circuits, where one would not even dream of putting a power-sucking Intel x86 CPU...

So you see, there is no "day job" (money maker) for x86 CPU + FPGA today, and there is nothing that says there will be one tomorrow either.
 
Actually I have seen a few 940MX with GDDR5... but I totally agree, they need to stop making it optional and separate the DDR3/4 and GDDR5 variants with different model numbers, say 1020M and 1030M.
Ahh, you are quite right. They are easy to miss among all the DDR3-based 940MX models, with GDDR5 mentioned somewhere deep down the spec sheet, if at all...
There has actually just been another notebook announced with a GDDR5 940MX: the Xiaomi Mi Notebook Air 13.3". Just like the Surface Book, it's got 1GB of GDDR5.
Actually, the amount of memory possible shouldn't really be an issue with a GDDR5 GP108. Even skipping clamshell mode, 2GB should be quite sufficient for such a solution.
 
The problem with GT3e, even at 14 nm, is power: 40-50ish watts for the GT alone under load.

FWIW, that's exactly what GPU-Z reports (42-46W) when I drive the GT3e to 99%.

GPU-Z reports far less power being used by a 384-core Quadro K620 for the same workload (38% of TDP).

Presumably Pascal will be even more efficient.

But is there a good description anywhere of how a GeForce card tracks TDP versus an Intel IGP?

I am skeptical that they're comparable measurements.

Interesting! Any reason why there's no 32-bit sort for the Quadro? Would it skew the diagram's scale?

I didn't have time to create a new 32-bit key CUDA build but had plenty of old results.

I'll get to it soon and update the charts.
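In the meantime, here is a hedged sketch of how one might time a 32-bit key sort in CUDA with Thrust. It is not the actual benchmark behind the charts, just an illustration of the idea; the key count, key distribution and timing setup are arbitrary.

```cpp
// Hypothetical 32-bit key sort timing with Thrust (dispatches to a radix sort).
#include <thrust/device_vector.h>
#include <thrust/host_vector.h>
#include <thrust/sort.h>
#include <cstdio>
#include <cstdlib>

int main()
{
    const int n = 1 << 24;                       // 16M 32-bit keys, arbitrary size
    thrust::host_vector<unsigned int> h(n);
    for (int i = 0; i < n; ++i) h[i] = rand();   // arbitrary key distribution

    thrust::device_vector<unsigned int> d = h;   // copy keys to the GPU

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    cudaEventRecord(start);
    thrust::sort(d.begin(), d.end());            // radix sort path for 32-bit keys
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);

    float ms = 0.f;
    cudaEventElapsedTime(&ms, start, stop);
    printf("sorted %d keys in %.2f ms (%.1f Mkeys/s)\n", n, ms, n / (ms * 1000.f));
    return 0;
}
```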
 
Careful there: GPUs in HPC and GPUs in gaming go hand in hand, i.e. the same architectures have been reused over the years with little HW segmentation, just like Intel Xeons are able to reuse functionality from the consumer Core i7 parts.

The point is that both GPUs in HPC (Tesla) and Intel Xeons are driven by the consumer market; they are both just riding piggyback on the gaming/consumer segments.

This is why the Xeon Phi series has been predicted to fail in the long term: it's a very separate component from the consumer market. The same goes for the future Xeon + FPGA segment, which is completely directed towards HPC.

Not sure about that; Cray is currently seeing more large contract wins for its Knights Landing solution than for the P100 it also sells.
And the P100 is a fair bit ahead in terms of FP64 performance.
Although this is just Cray.
Still, one of those Knights Landing contracts for Cray is massive: the Trinity supercomputer, which I think goes live this year, along with several other scientific supercomputer projects.
I'm not suggesting it is doom and gloom for Nvidia, but it is no longer so clear-cut on product deployment.
Cheers
 
Not sure about that; Cray is currently seeing more large contract wins for its Knights Landing solution than for the P100 it also sells.
And the P100 is a fair bit ahead in terms of FP64 performance.
Although this is just Cray.
Still, one of those Knights Landing contracts for Cray is massive: the Trinity supercomputer, which I think goes live this year, along with several other scientific supercomputer projects.
I'm not suggesting it is doom and gloom for Nvidia, but it is no longer so clear-cut on product deployment.
Cheers

Well, it seems you misunderstood my point: it's not enough for Intel to dominate the accelerator HPC market with KNL (which hasn't happened yet); selling a few hundred thousand units per year is nowhere near enough for them to make a profit on it. The processor doesn't have a day job to pay for its existence (this was the original idea with Larrabee: to use it in the graphics market as well as in HPC).

Meanwhile, companies like Nvidia and AMD can keep pushing their GPUs into the HPC space with much smaller R&D overhead because of the small HW differentiation from the consumer market. The same goes for Intel with their regular Xeon processors.

At the end of the day, HPC is just a tiny market riding piggyback on the larger consumer space.

According to Jon Peddie Research, 450 million discrete GPUs were sold in 2015 (http://jonpeddie.com/publications/whitepapers/an-analysis-of-the-gpu-market/ - Figure 4); even with much higher margins, that makes HPC look like peanuts.

If I were to also go OT: Nvidia and IBM have won the contracts to build the two biggest computers currently scheduled for the US (also the largest officially scheduled ones, though I'm sure the Chinese are cooking up something secret as well), but this is beside the point I was making.
 
FWIW, that's exactly what GPU-Z reports (42-46W) when I drive the GT3e to 99%.

GPU-Z reports far less power being used by a 384-core Quadro K620 for the same workload (38% of TDP).

Presumably Pascal will be even more efficient.

But is there a good description anywhere of how a GeForce card tracks TDP versus an Intel IGP?

I am skeptical that they're comparable measurements.
I am not so sure about Intel, but GPU-Z reports the same numbers as other tools like HWiNFO64 or HWMonitor. So while I cannot say anything about its accuracy in general, GPU-Z is at least in line with what other tools report. Intel also appears to use the same set of algorithms/sensors to determine their processors' power budgets and when to throttle (65-watt SKUs are allowed up to 72 watts for a couple of dozen seconds, and so on), so I am fairly confident those numbers make sense.


WRT GeForce and Radeon, those numbers also seem to be the basis for the power-target and PowerTune algorithms that determine load, and they add up pretty accurately with what you can measure from the 6-/8-pin connectors and the PCIe slot combined (where applicable; the RX 480, for example, only reports ASIC power, and with older cards this wasn't even as clearly designated).
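For what it's worth, on the Nvidia side those tools presumably read the same on-board power telemetry that NVML exposes; whether Intel's package-power estimates are measured in a comparable way is exactly the open question above. A rough host-side sketch, assuming the standard NVML API and linking against the NVML library, would be:

```c
// Hedged sketch: poll the board power sensor that GPU-Z and similar tools
// most likely read on Nvidia hardware. Requires nvml.h and libnvidia-ml.
#include <nvml.h>
#include <stdio.h>

int main(void)
{
    nvmlDevice_t dev;
    unsigned int mw = 0, limit_mw = 0;

    if (nvmlInit() != NVML_SUCCESS) return 1;
    nvmlDeviceGetHandleByIndex(0, &dev);

    nvmlDeviceGetPowerUsage(dev, &mw);                 // current draw, milliwatts
    nvmlDeviceGetPowerManagementLimit(dev, &limit_mw); // board power limit, milliwatts

    printf("board power: %.1f W of %.1f W limit (%.0f%% of TDP)\n",
           mw / 1000.0, limit_mw / 1000.0, 100.0 * mw / limit_mw);

    nvmlShutdown();
    return 0;
}
```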


I didn't have time to create a new 32-bit key CUDA build but had plenty of old results.

I'll get to it soon and update the charts.
Looking forward to it - the picture your chart paints is already very interesting.
 
Well, it seems you misunderstood my point: it's not enough for Intel to dominate the accelerator HPC market with KNL (which hasn't happened yet); selling a few hundred thousand units per year is nowhere near enough for them to make a profit on it. The processor doesn't have a day job to pay for its existence (this was the original idea with Larrabee: to use it in the graphics market as well as in HPC).

Meanwhile, companies like Nvidia and AMD can keep pushing their GPUs into the HPC space with much smaller R&D overhead because of the small HW differentiation from the consumer market. The same goes for Intel with their regular Xeon processors.

At the end of the day, HPC is just a tiny market riding piggyback on the larger consumer space.

According to Jon Peddie Research, 450 million discrete GPUs were sold in 2015 (http://jonpeddie.com/publications/whitepapers/an-analysis-of-the-gpu-market/ - Figure 4); even with much higher margins, that makes HPC look like peanuts.

If I were to also go OT: Nvidia and IBM have won the contracts to build the two biggest computers currently scheduled for the US (also the largest officially scheduled ones, though I'm sure the Chinese are cooking up something secret as well), but this is beside the point I was making.
How is the GP100 a small overhead for Nvidia when it is not being sold into Quadro or the consumer space, and is also using technology that is not trickling down?
And HPC is where the future lies in terms of growth and the best profit margins, although no one is sure just how large the complete market is once you include machine learning.

Anyway my response was to you saying:
This is why the Xeon Phi series has been predicted to fail in the long term: it's a very separate component from the consumer market. The same goes for the future Xeon + FPGA segment, which is completely directed towards HPC
Which goes against reality, given Intel is winning science contracts ranging from $45m to $170m (of course that is project value, not just HW sales, and it's reported the same way for Nvidia).
Yes, you are right that Nvidia and IBM won two, including the Oak Ridge DoE contract, but Intel won the next ones with Knights Hill (the next gen after Landing), to be delivered in 2018.
Look, I am a big fan of Nvidia's HPC work and mentioned quite early their first P100 sales in a contract to a CERN-related lab, but as I say, alternative solutions like Knights Landing/Hill should not be written off; a lot of analysts also feel Nvidia is going to be squeezed by Intel, and specifically by that solution of theirs you think has no future.
Intel needed an alternative to their traditional Xeons, as those were not ideal for large-scale, highly parallel HPC projects; it also helps that it has the OPA connection and cube memory.
And it is a derivative inspired by Larrabee rather than actual Larrabee, especially in its latest-gen form.

Anyway, as I say, it is not doom and gloom for Nvidia (and of course the collaboration with IBM has great benefits), but they have competition that looks like it could start to squeeze them in various HPC segments.
Cheers
 
If Nvidia sees GP100 as a way to define, test and tune next-gen infrastructure (the tick to Volta's tock), could they write off the cost as R&D? If sales cover costs, well, that's OK.
That is, if they implement it in their consumer products, which tbh they could have done with some of it even now with the 'prosumer' GP102 (Titan + Quadro); otherwise you can say the same about Knights Landing -> Knights Hill.
I agree there is overlap, but that can be said for most companies developing products for both HPC and other sectors; other aspects will always be unique to HPC, and even if they find their way into other cards it is not ubiquitous, so of limited use.
But some technology will always be solely HPC, such as the scaling NVLink mezzanine connector, or OPA for Intel, or the work Nvidia's HPC R&D team does looking at alternative, exascale-efficient solutions (different from a traditional GPU).
Cheers
 
At the end of the day, HPC is just a tiny market riding piggyback on the larger consumer space.
According to Jon Peddie Research, 450 million discrete GPUs were sold in 2015 (http://jonpeddie.com/publications/whitepapers/an-analysis-of-the-gpu-market/ - Figure 4); even with much higher margins, that makes HPC look like peanuts.
Not peanuts at all, and not a tiny market at all. Nvidia's current HPC revenue is about 20% of the gaming segment and growing. Add in the fact that profit margins on $4,000 HPC parts are plausibly several multiples higher than on $600 gaming parts, and we can conclude that the profits of Nvidia's HPC division are likely comparable to those of the gaming sector. In fact it's likely that the HPC, pro graphics (with even more revenue than HPC) and gaming graphics divisions are all very roughly equally profitable.

I admit this was a surprise to me too when I looked at NVidia's recent financials.
 
How is the GP100 a small overhead for Nvidia when it is not being sold into Quadro or the consumer space, and is also using technology that is not trickling down?
And HPC is where the future lies in terms of growth and the best profit margins, although no one is sure just how large the complete market is once you include machine learning.

Anyway my response was to you saying:
Which goes against reality, given Intel is winning science contracts ranging from $45m to $170m (of course that is project value, not just HW sales, and it's reported the same way for Nvidia).
Yes, you are right that Nvidia and IBM won two, including the Oak Ridge DoE contract, but Intel won the next ones with Knights Hill (the next gen after Landing), to be delivered in 2018.
Look, I am a big fan of Nvidia's HPC work and mentioned quite early their first P100 sales in a contract to a CERN-related lab, but as I say, alternative solutions like Knights Landing/Hill should not be written off; a lot of analysts also feel Nvidia is going to be squeezed by Intel, and specifically by that solution of theirs you think has no future.
Intel needed an alternative to their traditional Xeons, as those were not ideal for large-scale, highly parallel HPC projects; it also helps that it has the OPA connection and cube memory.
And it is a derivative inspired by Larrabee rather than actual Larrabee, especially in its latest-gen form.

Anyway, as I say, it is not doom and gloom for Nvidia (and of course the collaboration with IBM has great benefits), but they have competition that looks like it could start to squeeze them in various HPC segments.
Cheers


I see what you are saying, and I have been saying the same thing for a while now. The main problem for Intel is not their hardware; it's the software and their current presence in the market. People who have been using nV products for HPC are using CUDA, and the chance of porting that over to Intel's current products while getting the same or even close performance is not very good, actually 0%. The end result is more time and more money for fewer capabilities, so transitioning off CUDA and nV products is probably not in anyone's best interest in the short term. In the long term it's more of a possibility if nV missteps and gets delayed. Even the six-month difference between Knights Landing and Pascal was enough for companies to see the value Intel had to offer; this is one of the reasons Intel gained some HPC contracts this time around.

Just in terms of raw theoretical output (forgetting the CUDA capability comparison for a moment), Knights Landing can't match Pascal, so Knights Hill can't come soon enough for Intel. Even with Knights Hill, though, I still think they will only be right around Pascal.

If Intel has a lead of, oh, around a year over nV, then they don't need to worry too much about the software side of things, but they don't seem bothered about pushing their own standards, or even HSA standards, at least not to the extent nV has been pushing CUDA.
 
I see what you are saying, and I have been saying the same thing for a while now. The main problem for Intel is not their hardware; it's the software and their current presence in the market. People who have been using nV products for HPC are using CUDA, and the chance of porting that over to Intel's current products while getting the same or even close performance is not very good, actually 0%. The end result is more time and more money for fewer capabilities, so transitioning off CUDA and nV products is probably not in anyone's best interest in the short term. In the long term it's more of a possibility if nV missteps and gets delayed. Even the six-month difference between Knights Landing and Pascal was enough for companies to see the value Intel had to offer; this is one of the reasons Intel gained some HPC contracts this time around.

Just in terms of raw theoretical output (forgetting the CUDA capability comparison for a moment), Knights Landing can't match Pascal, so Knights Hill can't come soon enough for Intel. Even with Knights Hill, though, I still think they will only be right around Pascal.

If Intel has a lead of, oh, around a year over nV, then they don't need to worry too much about the software side of things, but they don't seem bothered about pushing their own standards, or even HSA standards, at least not to the extent nV has been pushing CUDA.
Nvidia still has the HPC performance crown with the P100.
Yeah, Intel's software/coding base is part of their limitation, and possibly scaling as well until Knights Hill. While both Nvidia and Intel have 'soft' costs for code optimisation, it could be said Intel's are much higher because of the additional work they currently need to do with independent applications such as Torch/Caffe/Theano and their libraries, as well as code modernisation/porting to Knights Landing/Hill from a more traditional Xeon implementation (some of the large projects are in that process).
But Knights Landing is enough of an improvement over the previous generation (when including software/coding improvements as well) that Intel is able to gain some traction now.
We also need to see how it all shakes out with the way Nvidia is structuring the use of the P100 and GP102 in the HPC space. I thought the FP16 inference-optimisation branches were only added to applications this year, and now this is going to be repeated (also for clients) for Int8 if the GP102 narrative is pushed, including how common it becomes for clients to build more differentiated, dedicated ML nodes/clusters on medium-to-large-scale projects with both the P100 and GP102.
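For reference, the Int8 inference path on GP102-class parts essentially comes down to the dp4a instruction. A minimal, hedged CUDA sketch (assuming CUDA 8 and compilation for sm_61, with made-up toy data) looks like this:

```cpp
// Illustrative only: __dp4a does a 4-way signed-byte dot product with 32-bit
// accumulate in one instruction on sm_61 (GP102/GP104/GP106). Build with
// nvcc -arch=sm_61.
#include <cstdio>

__global__ void dot_int8(const int* a, const int* b, int n, int* out)
{
    int acc = 0;
    for (int i = threadIdx.x; i < n; i += blockDim.x)
        acc = __dp4a(a[i], b[i], acc);   // each int packs four signed 8-bit values
    atomicAdd(out, acc);
}

int main()
{
    const int n = 1024;                  // 1024 ints = 4096 packed int8 values
    int *a, *b, *out;
    cudaMallocManaged(&a, n * sizeof(int));
    cudaMallocManaged(&b, n * sizeof(int));
    cudaMallocManaged(&out, sizeof(int));
    for (int i = 0; i < n; ++i) { a[i] = 0x01010101; b[i] = 0x02020202; }  // bytes 1 and 2
    *out = 0;

    dot_int8<<<1, 256>>>(a, b, n, out);
    cudaDeviceSynchronize();
    printf("dot = %d (expect %d)\n", *out, 4096 * 2);
    cudaFree(a); cudaFree(b); cudaFree(out);
    return 0;
}
```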
Cheers
 
Not peanuts at all, and not a tiny market at all. Nvidia's current HPC revenue is about 20% of the gaming segment and growing. Add in the fact that profit margins on $4,000 HPC parts are plausibly several multiples higher than on $600 gaming parts, and we can conclude that the profits of Nvidia's HPC division are likely comparable to those of the gaming sector. In fact it's likely that the HPC, pro graphics (with even more revenue than HPC) and gaming graphics divisions are all very roughly equally profitable.
I admit this was a surprise to me too when I looked at NVidia's recent financials.

Thanks Steven, those numbers were about 2x above my expectations as well. I guess it goes to show how Nvidia has been dominating the HPC accelerator market.

So for the whole of 2016, gaming revenue totalled $2,818M and HPC $339M, which would constitute about 12% of gaming revenue.

So this is definitely not the peanuts I previously claimed it was!

Now, given that the Pascal architecture cost something like $2,500M in R&D* to develop, it would take the Tesla/HPC segment about 7 years to accumulate that sort of revenue (less if the growth trend continues). Do you think Nvidia can afford to build an architecture specifically for HPC only? No, of course not.

HPC is still riding piggyback; maybe this will change with the surge of deep learning applications? It looks to be a huge new market.


* I was told this number by Nvidia's architecture chief; is there a better official number?
 
Now, given that the Pascal architecture cost something like $2,500M in R&D* to develop, it would take the Tesla/HPC segment about 7 years to accumulate that sort of revenue (less if the growth trend continues). Do you think Nvidia can afford to build an architecture specifically for HPC only? No, of course not.
Well, according to https://ycharts.com/companies/NVDA/r_and_d_expense that is pretty much the total R&D budget from the release of the 970 and 980 onward. Considering that they also have software development, Tegra development and Volta development, and that feature-wise Pascal is not that different from Maxwell, I'd say he was inflating it just a little bit ;)
 
Well, according to https://ycharts.com/companies/NVDA/r_and_d_expense that is pretty much the total R&D budget from the release of the 970 and 980 onward. Considering that they also have software development, Tegra development and Volta development, and that feature-wise Pascal is not that different from Maxwell, I'd say he was inflating it just a little bit ;)

Looks to me like R&D over the last 3 years averaged around $300M per quarter, or $1,200M per year. So that would be $3,600M over 3 years of development; it seems fairly reasonable that Pascal could have soaked up a good $2,000-2,500M of that.
 