What does a report from 2012 have to do with Tegra K1? I'm kinda confused here...
extrajudicial said:
> It's not just "clock gating" and how do you explain the fact that Kepler and Maxwell have the EXACT same power consumption on compute loads?

Do they? And what is the performance?
> Your explanation is that "the architecture is different" ... Ok!

No. My explanation was: all known architectural changes to Maxwell are exactly the kind of thing you'd do if the low-hanging clock-gating fruit has already been consumed: reducing expensive operations, reducing data movement.
> From the CUDA thread, it's clear that they added some kind of register reuse cache that reduced register fetches from the register banks. Banks are pretty large, so that should result in quite an optimization. And if the register reuse cache is much closer to the ALUs, they will lose less power moving the operands around as well. And then there's reduced HW scheduling and the reduced crossbar not allowing operands to execute everywhere.

Can you provide a link to the CUDA thread you're referring to? I don't remember seeing one recently.
> Can you provide a link to the CUDA thread you're referring to? I don't remember seeing one recently.

I can't find where we talked about it here (ok, on my phone and I didn't really look), but it's all about this: https://code.google.com/p/maxas/source/browse/
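To put some rough intuition behind the register reuse cache point, here's a toy model. To be clear: the per-read energy costs below are made-up assumptions purely for illustration, not measured Maxwell figures; only the general principle (a small cache next to the ALUs is cheaper to read than a large register bank) is what's being claimed.

```python
# Toy energy model for operand fetch. Assumption (illustrative only):
# reading from a large register bank costs several times more energy
# than re-reading the same operand from a small reuse cache next to
# the ALUs.

E_BANK_PJ = 6.0   # assumed pJ per register-bank read (made-up number)
E_REUSE_PJ = 1.0  # assumed pJ per reuse-cache read (made-up number)

def operand_energy_pj(reads: int, reuse_fraction: float) -> float:
    """Total fetch energy when a fraction of reads hit the reuse cache."""
    reused = reads * reuse_fraction
    return (reads - reused) * E_BANK_PJ + reused * E_REUSE_PJ

no_reuse = operand_energy_pj(1_000_000, 0.0)    # Kepler-style: bank every time
with_reuse = operand_energy_pj(1_000_000, 0.4)  # assume 40% reuse-cache hits
print(f"saved: {100 * (1 - with_reuse / no_reuse):.0f}%")  # ~33% in this model
```

The actual savings obviously depend on how often operands can be reused back to back, which is the assembler's job to maximize; the maxas link above documents the reuse flags that expose this.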
Also, even if compute saw less improvement in perf/W, so, uhh, watt? I'd call that an interesting datapoint, worthy of discussion about how such a workload could trigger different HW paths. Not something sinister.
But without having seen any power numbers for pure compute workloads, that's all academic anyway.
I'm not saying that the architecture isn't more efficient, but its efficiency is only really easy to evaluate when it's under non-compute loads. When Maxwell is forced to compute, it's pulling easily 300W. Still 20-30W less than a 680/770, but that's not much considering the 120W difference in TDP.
Nvidia has done some really amazing work with clockspeed and power gating. And their work brings a legit benefit to GPU efficiency when it renders games. As soon as GM204 switches to compute, it becomes just as big a hog as Kepler.
I'm just wondering how much mobile GPU loads will represent games vs. compute.
Wow, you are a very distasteful person to have to educate, but that's OK.
www.anandtech.com/show/8526/nvidia-geforce-gtx-980-review/21
Scroll down to the third benchmark (Load Power Consumption - FurMark)
The 2014 GTX 980 drew 294W to the 680's 314W. That's two years of Nvidia's best work at efficiency. Twenty watts.
I would appreciate an apology for your rudeness.
The difference in TDP between the GTX 980 and GTX 680 is only 30 watts, not 120. That said, I'm not sure what the use of comparing TDP is; you're better off comparing actual power consumption.
Now let's see: FurMark is pretty much a power virus, which means it's hard to find a workload with higher power consumption. Power consumption in Crysis 3, for example, would be more realistic for games at least, but that also taxes the CPU more. In both, a GTX 980 consumes roughly the same amount of power as a GTX 680 or GTX 770, or less, while at the same time performing 50% to 200% better in compute workloads (from the AnandTech compute benches). Now, can you explain to me how that does not indicate a major improvement in perf/W? From what I'm seeing here, saying that it's twice as efficient as Kepler wouldn't be too unrealistic.
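To spell that arithmetic out, here's a quick sketch using only the numbers already cited in this thread (294W vs. 314W under FurMark, and 50% to 200% better compute performance from the AnandTech benches):

```python
# Back-of-the-envelope perf/W ratio, GTX 980 vs. GTX 680, from the
# FurMark power numbers and compute results cited above.

P_980 = 294.0  # watts under FurMark
P_680 = 314.0  # watts under FurMark

for speedup in (1.5, 3.0):  # 50% and 200% better performance
    # perf/W ratio = (perf_980 / P_980) / (perf_680 / P_680)
    ratio = speedup * P_680 / P_980
    print(f"{speedup:.1f}x the performance -> {ratio:.2f}x the perf/W")

# Output: 1.60x and 3.20x -- consistent with "roughly twice as efficient".
```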
That said, while I believe my reasoning is pretty sound, we haven't seen any actual power consumption figures for compute workloads on GM204. That's also why I'm careful about making any definite statements regarding compute power efficiency. But on the other hand, isn't FurMark mostly a compute workload? Or doesn't it, for example, tax the memory subsystem enough or something?
In the end, you're still coming back to your clockspeed and power gating argument, which to me seems to indicate that you're not really listening to what others have to say.
extrajudicial said:
> Load up any compute task intensive enough to fully load the GPUs and Maxwell's efficiency advantage drops to <10%. That doesn't mean it won't be generally much more efficient; it will be, because few workloads will fully load the GPU the way intensive compute does.

You are still confused. Power consumption is only half of the efficiency equation. A GTX 980 may be using 90% of the power of a GTX 770 in some cases, but it is also providing much, much better performance.
I'm on mobile and am having a hard time linking all the reviews. The consensus is that yes, it's very fast but very buggy. Lots of crashing and updates.
extrajudicial said:
> I'm not saying that the architecture isn't more efficient, but its efficiency is only really easy to evaluate when it's under non-compute loads. When Maxwell is forced to compute, it's pulling easily 300W.

That's a gaming load! I think we can all agree that Crysis 3 is a game, right?
This isn't tough to test on your own. Load up any compute task intensive enough to fully load the GPUs and Maxwell's efficiency advantage drops to <10%. That doesn't mean it won't be generally much more efficient; it will be, because few workloads will fully load the GPU the way intensive compute does.
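For anyone who wants to actually run that test, here's a minimal logging sketch. It assumes nvidia-smi is on your PATH and that the driver exposes power.draw for your board (some GeForce cards just report N/A):

```python
# Poll nvidia-smi once per second while a workload runs in another
# window, then report average and peak reported board power.
import subprocess
import time

def read_power_watts() -> float:
    out = subprocess.check_output(
        ["nvidia-smi", "--query-gpu=power.draw",
         "--format=csv,noheader,nounits"],
        text=True,
    )
    return float(out.strip().splitlines()[0])  # first GPU only

samples = []
for _ in range(60):  # log for one minute
    samples.append(read_power_watts())
    time.sleep(1)

print(f"avg: {sum(samples)/len(samples):.1f} W  peak: {max(samples):.1f} W")
```

Run it once alongside a gaming load and once alongside an intensive compute load, and compare the two.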
Maxwell's efficiency advantage is very big under gaming loads on x86, but ARM has different competition. PowerVR has demonstrated they can match and beat Nvidia GPUs (Tegra 3/4) with their own designs, and that's because ARM is a different beast.
extrajudicial said:
> I am discounting Maxwell's efficiency overall.

For no credible reason (you have yet to provide one, anyway).

extrajudicial said:
> I'm just saying there is nothing magic about the architecture that makes it more efficient.

Indeed, it is not magic; it is quality engineering.

extrajudicial said:
> It is 100% power gating and extremely fine clockspeed management.

You have provided zero actual evidence that this is the case.

extrajudicial said:
> But it won't work for compute.

You have also provided zero evidence that Maxwell is less efficient for compute workloads.

extrajudicial said:
> I'm just saying there is nothing magic about the architecture that makes it more efficient. It is 100% power gating and extremely fine clockspeed management.

I have given you three very concrete examples of architecture changes that undeniably are more power efficient: the register reuse cache, reduced HW scheduling, and the reduced operand crossbar.