Nvidia Pascal Announcement

If anything, what's strange is that it took GPUs so long to break this clock speed, which is relatively pedestrian by today's process standards.
Except that the clock does not mean anything at all in isolation. You've also got to consider pipeline length and setup, and along with that how many transistors have been wasted on providing broader and hence faster pipeline stages. I haven't done the math, but transistor count per CU should still be lower than what Prescott had, at least if you disregard the transistors spent on the ever-growing caches.
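To put toy numbers on that trade-off, here is a rough sketch in Python of the classic pipelining relation (clock ≈ 1 / (logic delay per stage + per-stage overhead)); all the delay values are made up for illustration, nothing measured from a real GPU or from Prescott:

```python
# Toy model of the classic pipelining trade-off: splitting a fixed amount of logic
# into more stages raises the achievable clock, but each extra stage pays a fixed
# latch/setup overhead (and costs transistors for the extra pipeline registers).
# All delay values are made up for illustration, not measurements of any real chip.

TOTAL_LOGIC_DELAY_NS = 10.0   # hypothetical total logic depth of the work to be done
STAGE_OVERHEAD_NS = 0.15      # hypothetical per-stage latch setup / clock-skew overhead

for stages in (5, 10, 20, 30):
    cycle_ns = TOTAL_LOGIC_DELAY_NS / stages + STAGE_OVERHEAD_NS
    print(f"{stages:2d} stages -> {cycle_ns:.3f} ns cycle, ~{1.0 / cycle_ns:.2f} GHz")
```

The diminishing returns as the stage count climbs are the point: chasing clock with a deeper pipeline costs transistors and only helps up to where the per-stage overhead starts to dominate.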
 
Can anyone reliably say whether the double-precision units on the Quadro (thinking more of the 6000/Titan-class models) were the same or a higher number of cores compared to the equivalent consumer GPU?
I assume this question may be a bit more complex, as Kepler was the more DP-focused design compared to Maxwell.

The reason for the question is just to clarify whether the extra DP cores exist on the consumer model but are simply disabled.
Thanks
 
Pretty sure they were laser cut ;)

They have been doing that since the G80, laser cutting functional parts so you can't mod your GeForce into a professional card. You can sometimes get away with soft mods, but not all the features will work.
 
Nvidia doesn't advertise Maxwell-based professional SKUs with DP performance numbers, only SP. That's pretty suggestive that there's nothing disabled in the consumer models.
 
Nvidia doesn't advertise Maxwell-based professional SKUs with DP performance numbers, only SP. That's pretty suggestive that there's nothing disabled in the consumer models.
You would also need to consider Kepler, as that was the last truly DP-focused product.
Unfortunately, I don't think the lack of reported DP figures is a reliable clarification that they are the same - sorry if I'm misunderstanding, but it seems you're suggesting the Quadro and consumer parts are identical just because no DP numbers are advertised.

So do they disable the cores, or is it a design without them (the context being the comparable die shared by the Tesla/Quadro x6000/consumer Titan models)?
As I mentioned, the trend probably follows Kepler rather than Maxwell, since they did state DP figures for those Quadros.
So it may be an academic question, as no trend can be drawn given the divergence between the generations, and it's of no use for working out where things are going with Pascal.
Thanks
 
I don't think there are any disabled DP ALUs in Maxwell. The transistor budget was shifted to other, more "pedestrian" uses compared to Kepler, like doubled ROPs and L2, more SMs, and larger on-chip memory pools (LDS, GPRs, TMU cache, etc.). The new DX12 features also require some dedicated logic. The few DP ALUs Maxwell does have are there only to provide the bare operational minimum to run and debug code and to cover some API requirements (D3D11 with WDDM 1.2).
 
GK110 had 960 dedicated (i.e. separate) FP64 units. They were locked in GeForce cards except for the Titan models.

All other Keplers had their FP64 units reduced at the ASIC level, not the product level. Same for Maxwell, but physically reduced further, to 1/32 of the number of FP32 ALUs. So GM200 had 96 FP64 units.
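As a quick sanity check of those counts, here's a small Python sketch; the full-die FP32 ALU counts (2880 for GK110, 3072 for GM200) are my own assumption about the specs, not something stated above:

```python
# Quick sanity check of the FP64 unit counts quoted above, assuming full-die FP32
# ALU counts of 2880 for GK110 (15 SMX x 192) and 3072 for GM200 (24 SM x 128).

chips = {
    "GK110 (Kepler)":  (2880, 1 / 3),    # 64 dedicated FP64 units per 192-ALU SMX
    "GM200 (Maxwell)": (3072, 1 / 32),   # 4 FP64 units per 128-ALU SM
}

for name, (fp32, ratio) in chips.items():
    fp64 = round(fp32 * ratio)
    print(f"{name}: {fp32} FP32 ALUs -> {fp64} FP64 units (1/{round(1 / ratio)} rate)")
```

That reproduces the 960 FP64 units for GK110 and 96 for GM200 mentioned above.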
 
http://videocardz.com/60265/nvidia-geforce-gtx-1070-3dmark-firestrike-benchmarks
[Image: GeForce-GTX-1070-3DMark-FireStrike-Performance.png]
 
Nice, guess we've got yet another 970-980 situation if that graph is true (and the actual game performance is comparable to it). I would recommend buying the 1070 if you want to jump to Pascal, or just wait for the Ti; 980s dropped in price/value dramatically after the Titan X/980 Ti were announced, while the 970 remained relatively stable.
 
Nice, guess we've got yet another 970-980 situation if that graph is true (and the actual game performance is comparable to it). I would recommend buying the 1070 if you want to jump to Pascal, or just wait for the Ti; 980s dropped in price/value dramatically after the Titan X/980 Ti were announced, while the 970 remained relatively stable.

I'm going to wait and see how the OC versions compare. I can't really afford to wait much longer for a new GPU thanks to Oculus and the fact that my current GPU is starting to fail, so waiting for the Ti (which I couldn't afford anyway) or the subsequent knock-on effect on the 1080's price isn't an option. I would hope that a decent factory OC 1070 variant could match or possibly even exceed stock 1080 performance though. If it can, I'll be sorely tempted. But then it all depends on how much faster the OC 1080s are and how much they cost in comparison. In either case, I'm prepared to be a bit disappointed with my purchase when the 1080 Ti lands!
 
I do not disagree. But the number of variables makes the comparison to a Pentium 4 pretty pointless to begin with...

There's too large a difference between GPUs and CPUs in their basic conception. The same goes for SoCs: OK, a small smartphone SoC runs at 2 GHz, but what's the GPU clock speed inside it? 450 MHz? 800 MHz? (For those, the question is probably tied more to the power budget of the entire SoC, and it has to be balanced against the cache, ROP and TMU configuration and the rest of the processing power anyway.)

But seriously, how do you compare an SoC at 2.3 GHz that doesn't even hit the 1 TFLOPS FP32 barrier against a GPU at 8 TFLOPS?
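To put rough numbers on that, the usual peak-FLOPS estimate is ALUs x 2 (one FMA = two flops) x clock. A small Python sketch; the SoC figures below are purely hypothetical, while the GTX 1080 values are its advertised shader count and boost clock:

```python
# Peak FP32 throughput is roughly ALUs * 2 (one FMA = two flops) * clock, which is
# why a wide GPU at ~1.7 GHz dwarfs a 2+ GHz SoC whose GPU block has far fewer ALUs
# running at a far lower clock.

def peak_tflops(alus: int, clock_ghz: float) -> float:
    """Theoretical FP32 peak in TFLOPS, assuming one FMA per ALU per cycle."""
    return alus * 2 * clock_ghz / 1000.0

# GTX 1080: advertised 2560 shader ALUs at ~1.73 GHz boost.
print(f"GTX 1080       : ~{peak_tflops(2560, 1.733):.1f} TFLOPS")

# Hypothetical smartphone SoC GPU block: 256 ALUs at 0.8 GHz (illustrative only).
print(f"Phone SoC (GPU): ~{peak_tflops(256, 0.8):.2f} TFLOPS")
```

The big GPU wins by roughly 20x on width alone, even though its clock is lower than the SoC's CPU cores.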
 
As far as I know it's more efficient to have a GPU do "a lot of work" over a "long" amount of time than the opposite. Maybe with new materials like graphene we could see GPUs running at 3 or 5 GHz, but then again by that time CPUs could be running at 10 GHz+.
 
Well graphene has leakage issues, so they have to overcome that first. I don't think we will see a large change in frequency just because of the use of materials other than silicon.
 
I know about the issues, it was just an example :p. IBM has a chip (can't remember what they use it for) already running at 10 GHz using graphene. I also remember an experiment at MIT where they were making a chip running at 1 THz. There are also other new "futuristic materials" besides graphene, but it seems we will be stuck with silicon until they can't shrink it any more (are we really going to have 1-atom transistors?).
 
I was able to find news about IBM building a frequency mixer circuit on graphene that ran at 10 GHz. It consisted of one transistor between two inductors, so it needs 16-32 billion more transistors before it gets to the likely budgets for big sub-10nm GPUs.

Transistors even now can switch much faster in isolation, and more so in a lab setting. That inevitably slows down once manufacturing at scale and integration into complex units and interconnects come into play. Wireless circuits, whose reason for existing is to twitch and not think much about it, already use alternate materials or push their transistors to speeds way beyond current compute devices.

We would also need better connections between the transistors. As we've seen with 14/16nm, there's been less trouble this time shrinking and improving the transistors than the thin pieces of metal between them, which have displayed poor to negative scaling for a while now.

That is a big reason why Nvidia's presentation on data movement and locality used distance travelled in mm as the primary basis of comparison for power cost versus computation. Current transistors trade away some switching speed in favor of better control and lower leakage, and the loss is masked by other barriers like the interconnect not getting any better.
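As a toy illustration of that distance argument (the energy constants below are placeholder ballpark values I've assumed, not numbers from the presentation):

```python
# Toy model of the "energy scales with distance travelled" argument. The constants
# are placeholder ballpark values for illustration only, not figures taken from
# Nvidia's presentation.

FMA_ENERGY_PJ = 20.0          # assumed energy of one double-precision FMA
WIRE_PJ_PER_BIT_PER_MM = 0.1  # assumed on-chip wire energy per bit per mm
OPERAND_BITS = 64

for distance_mm in (0.1, 1.0, 10.0):
    move_pj = WIRE_PJ_PER_BIT_PER_MM * OPERAND_BITS * distance_mm
    print(f"move 64 bits {distance_mm:>4} mm: {move_pj:6.1f} pJ "
          f"(~{move_pj / FMA_ENERGY_PJ:.2f}x one FMA)")
```

With numbers anywhere in that neighbourhood, hauling an operand across the die costs more than computing with it, which is exactly why distance in mm was the axis of comparison.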
 
We also need to find a way to improve the chip manufacturing process, because if I recall correctly, at 5nm transistors will be only about 18 atoms across. We are reaching the physical limits of silicon, and unless we want to end up needing 1-atom transistors we really need to find ways to overcome these issues; right now the frequency battle seems to be the more viable answer.
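For what it's worth, the 18-atom figure roughly checks out against silicon's atomic spacing; a quick back-of-the-envelope check (the spacings below are generic textbook values, not a statement about any particular process):

```python
# Back-of-the-envelope check of the "~18 atoms at 5 nm" figure. Silicon's
# nearest-neighbour bond length is ~0.235 nm; another common rough spacing is
# ~0.27 nm (half the 0.543 nm lattice constant), so the answer lands around
# 18-21 atoms either way.

FEATURE_NM = 5.0

for label, spacing_nm in (("bond length", 0.235), ("half lattice constant", 0.272)):
    print(f"{label:>21}: ~{FEATURE_NM / spacing_nm:.0f} atoms across {FEATURE_NM} nm")
```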

Btw, does anyone know what the current efficiency of CPUs and GPUs is?
 