NVIDIA Maxwell Speculation Thread

I get what you are saying, and I don't disagree at all. But given Maxwell's efficiency, and assuming that efficiency holds as performance scales up with bigger chips, there is an awful lot of headroom left to work with, even within the die size and transistor density constraints of 28nm. Case in point: GM107 is obviously headroom-capped on purpose by Nvidia. They could easily have fitted it with 7 GHz VRAM and shipped 1275 MHz boost-clocked chips, matching or surpassing GTX 660 performance. (And I believe they may still do that in a future 800-series rebadge of GM107.)
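To put rough numbers on that speculation, here's a quick sketch of the theoretical uplift. It assumes the reference GTX 750 Ti specs (640 CUDA cores, 1085 MHz boost, 128-bit GDDR5 at 5.4 Gbps), which aren't stated in the post; the 1275 MHz and 7 GHz figures are the speculation above. These are peak numbers only, not game performance.

```python
# Hypothetical "uncapped" GTX 750 Ti (GM107) vs. the shipping card.
# Assumed reference specs (not from the post): 640 CUDA cores, 1085 MHz boost,
# 128-bit bus with 5.4 Gbps GDDR5. 1275 MHz / 7 Gbps are the poster's guesses.

def fp32_gflops(cores: int, clock_mhz: float) -> float:
    """Peak FP32 throughput: one FMA (2 FLOPs) per core per clock."""
    return cores * clock_mhz * 2 / 1000.0

def bandwidth_gbs(bus_bits: int, data_rate_gbps: float) -> float:
    """Peak memory bandwidth: bus width in bytes times per-pin data rate."""
    return bus_bits / 8 * data_rate_gbps

stock = (fp32_gflops(640, 1085), bandwidth_gbs(128, 5.4))
uncapped = (fp32_gflops(640, 1275), bandwidth_gbs(128, 7.0))

for label, s, u in zip(("FP32 GFLOPs", "Bandwidth GB/s"), stock, uncapped):
    print(f"{label}: {s:.1f} -> {u:.1f} (+{(u / s - 1) * 100:.0f}%)")
```

With these assumptions that works out to roughly +18% peak FP32 and +30% bandwidth over the stock card.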

Roughly +50% for the top dog isn't exactly an awful lot of headroom in my book, but that's just me.

[strike]On another note, considering that table above, it's as close as it can be for GM200 and complete nonsense for GK110. A GTX 780 Ti gives 5.04 TFLOPs FP32, which at a 250W TDP is exactly 20.16 GFLOPs/W (and no, not at the turbo frequency). If you went up to the hypothetical 25 GFLOPs/W for GM200, again at a 250W TDP, you'd get 6.25 TFLOPs, i.e. 24% more TFLOPs.[/strike] ***Edit: those numbers are for correlation. Add another 35% higher efficiency for Maxwell and you're at +64% in the best-case scenario. Now think of the average in existing applications, and how much custom vendor SKUs could really push the power envelope beyond that, also considering that there's a good chance transistor density has increased in GM200 vs. GK110.
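The struck-through arithmetic can be reproduced as a small sketch. It assumes the GTX 780 Ti's 2880 CUDA cores at the 875 MHz base clock (not stated in the post) and counts one FMA as 2 FLOPs; the 25 GFLOPs/W GM200 figure is the hypothetical from the table being discussed.

```python
# GTX 780 Ti (GK110) FP32 throughput and perf/W at base clock.
# Assumed specs (not from the post): 2880 CUDA cores, 875 MHz base, 250 W TDP.
CORES_780TI = 2880
BASE_CLOCK_GHZ = 0.875
TDP_W = 250

gk110_gflops = CORES_780TI * BASE_CLOCK_GHZ * 2  # 2 FLOPs per FMA per clock
gk110_gflops_per_w = gk110_gflops / TDP_W

# Hypothetical GM200 at 25 GFLOPs/W in the same 250 W envelope.
gm200_gflops = 25 * TDP_W
gain = gm200_gflops / gk110_gflops - 1

print(f"GK110: {gk110_gflops / 1000:.2f} TFLOPs, {gk110_gflops_per_w:.2f} GFLOPs/W")
print(f"GM200 (hypothetical): {gm200_gflops / 1000:.2f} TFLOPs, +{gain:.0%}")
```

This prints 5.04 TFLOPs at 20.16 GFLOPs/W for GK110 and 6.25 TFLOPs (+24%) for the hypothetical GM200, matching the numbers above.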
 

If GM200 maintains the same perf/watt leap that GM107 has, and it's only 50% faster than GK110, then it must have correspondingly lower power consumption to keep that perf/watt. It's such a weird situation: a significantly more efficient architecture on the same node, yet significantly limited by what can be built on that process.
 

Actually, GM107's perf/watt is only around 50% (or a little under) better than GK110's, so being 50% faster than GK110 would fit within the same power consumption. TBH, "only 50% faster" seems like a pipe dream on the same process. Even if consumption goes down, we don't know how the architecture would handle clocks high enough to make the difference, and with GK110 already as big as it is, there's not much room to go bigger either.
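The two posts above hinge on one relation: the performance ratio equals the perf/W ratio times the power ratio. A minimal sketch, treating all three as simple ratios (an idealization; real chips don't scale this cleanly). The 1.6 value below is purely illustrative.

```python
# perf_new / perf_old = (ppw_new / ppw_old) * (power_new / power_old)

def relative_perf(ppw_ratio: float, power_ratio: float = 1.0) -> float:
    """Performance ratio for a given perf/W ratio and power ratio."""
    return ppw_ratio * power_ratio

def required_power_ratio(perf_ratio: float, ppw_ratio: float) -> float:
    """Power ratio needed to reach perf_ratio at a given perf/W ratio."""
    return perf_ratio / ppw_ratio

# ~50% better perf/W at the same power budget -> ~1.5x the performance.
print(relative_perf(1.5))  # 1.5
# 50% faster with an illustrative 60% perf/W gain needs only ~94% of the power.
print(f"{required_power_ratio(1.5, 1.6):.0%}")
```

So whether a "50% faster" GM200 needs less power than GK110 depends entirely on whether its perf/W gain is above or below 1.5x.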
 

Not sure where you are getting your info or why you would make that comparison. TPU shows the 750 Ti being 60% more efficient, but comparing Kepler's HPC die with GM107, instead of with the chip GM107 succeeds, makes very little sense. That's like comparing GK107 to GF110. Again, it doesn't make sense.
 
Yeah that's a pretty meaningless comparison. However TPU shows the 750 Ti with 55% better perf/w than the 650 Ti.

Also impressive is the 20% advantage over the 650 Ti with almost identical bandwidth and FLOPs, so architectural efficiency is a big part of it too.
 

Which would put the reticle limit for TSMC and probably GlobalFoundries at approximately 858mm².

I know everyone can compute that on their own but when I Google "TSMC reticle limit" I currently get a Beyond3D post written by me and containing incorrect information, so I'd like to change that! :D
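For anyone who does want to compute it, here's a sketch. It assumes the standard 26 mm x 33 mm maximum scanner exposure field, which the post doesn't state (only the resulting area). It also shows why area alone isn't the whole story: a die has to fit the field's dimensions, not just its area.

```python
# Reticle limit from the maximum scanner exposure field.
# Assumption: the standard 26 mm x 33 mm field; the post gives only the area.
FIELD_SHORT_MM = 26.0
FIELD_LONG_MM = 33.0

reticle_limit_mm2 = FIELD_SHORT_MM * FIELD_LONG_MM

def fits_in_field(width_mm: float, height_mm: float) -> bool:
    """True if a rectangular die fits in one exposure field (either orientation)."""
    short, long = sorted((width_mm, height_mm))
    return short <= FIELD_SHORT_MM and long <= FIELD_LONG_MM

print(reticle_limit_mm2)      # 858.0
print(fits_in_field(24, 24))  # True  (576 mm²)
print(fits_in_field(29, 29))  # False (841 mm² < 858 mm², but 29 mm > 26 mm)
```

The 29 x 29 mm case is hypothetical; it just illustrates that a square die near the limit won't fit even though its area does.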
 

Even that comparison only makes sense because they share the same x50 Ti product name. The GTX 650 Ti was made from GK106. GM107 specifically replaced GK107 in the chip product stack, just as GM200 will replace GK110. Therefore the most useful comparisons are between GM107 and GK107.
 
 


Actually, it had nothing to do with the name. I made the comparison because both the 650 Ti and 750 Ti are rocking 86 GB/s of bandwidth and 1.4 TFLOPs, so it's as close to apples-to-apples as you can get. Names are especially irrelevant when talking about architecture; that's purely a marketing decision.

http://techreport.com/review/26050/nvidia-geforce-gtx-750-ti-maxwell-graphics-processor/3
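Those peak figures fall out of the spec sheets. A sketch, assuming the reference specs below (not given in the post): GTX 650 Ti with 768 cores at 928 MHz, GTX 750 Ti with 640 cores at 1085 MHz boost, both on a 128-bit bus with 5.4 Gbps GDDR5. It then normalizes the ~20% performance lead by peak FP32, as one rough way to express "architectural efficiency".

```python
from dataclasses import dataclass

# Assumed reference specs (not stated in the post).
@dataclass
class Card:
    name: str
    cores: int
    clock_mhz: float
    bus_bits: int
    data_rate_gbps: float

    def fp32_tflops(self) -> float:
        # 2 FLOPs (one FMA) per core per clock
        return self.cores * self.clock_mhz * 2 / 1e6

    def bandwidth_gbs(self) -> float:
        # bus width in bytes times per-pin data rate
        return self.bus_bits / 8 * self.data_rate_gbps

cards = [
    Card("GTX 650 Ti", 768, 928, 128, 5.4),
    Card("GTX 750 Ti", 640, 1085, 128, 5.4),
]
for c in cards:
    print(f"{c.name}: {c.fp32_tflops():.2f} TFLOPs, {c.bandwidth_gbs():.1f} GB/s")

# With the poster's ~20% performance lead, performance per peak TFLOP rises
# slightly more, because the 750 Ti's peak FP32 is a bit lower.
old, new = cards
per_flop_gain = 1.20 * old.fp32_tflops() / new.fp32_tflops() - 1
print(f"Perf per peak TFLOP: +{per_flop_gain:.0%}")
```

Under these assumptions both cards come out at 86.4 GB/s, with 1.43 vs. 1.39 TFLOPs, and the per-TFLOP gain is about +23%.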
 

Then again, is it a fair comparison of chip power efficiency when one card uses a partly disabled chip and the other doesn't? Who knows how much more efficient the 650 Ti might have been had its chip been built for its own needs, rather than for its bigger brothers'?
 

Exactly. GK107 and GM107 both have the same number of ROPs and memory controllers built in, and both chips are the bottom-dwelling iterations of their respective families (not counting GK108, which was created primarily as a Tegra test-bed platform).
 
I guess you meant GK208... I wouldn't say it was mostly meant as a Tegra test-bed platform, though most likely GK208/GK20A/GMxxx share quite a few of the reasons why they are more area- and power-efficient than GK1xx (well, one factor in area efficiency is not wasting area on too many TMUs...).
Also, you apparently forgot about GM108, which actually seems to be quite popular in notebooks (now that is the rock-bottom chip, with nothing but 64-bit DDR3; the performance it gets out of that no-bandwidth memory is actually quite impressive, though it's still sort of pointless).
GK107 was IMHO not a very efficient member of the GK1xx family because it was unbalanced (for the GDDR5 versions at least), really lacking that 3rd SMX. So a bit of the ground GM107 gains over GK107 simply comes from "fixing" that (I'm pretty sure a 3-SMX GK107 would have been quite a bit more area- and power-efficient).
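One rough way to quantify "unbalanced" is peak FP32 FLOPs per byte of DRAM bandwidth. This is a sketch under assumed specs not given in the thread: GTX 650 (GK107) with 2 SMX x 192 cores at 1058 MHz and 128-bit GDDR5 at 5.0 Gbps, GTX 750 Ti as 640 cores at 1085 MHz with 5.4 Gbps. The 3-SMX GK107 is purely hypothetical, and the ratio says nothing about power or area on its own.

```python
# Compute-to-bandwidth balance: peak FP32 FLOPs per byte of DRAM bandwidth.
# Assumed specs (not from the thread); the 3-SMX GK107 is hypothetical.

def flops_per_byte(cores: int, clock_mhz: float, bus_bits: int,
                   data_rate_gbps: float) -> float:
    gflops = cores * clock_mhz * 2 / 1000  # one FMA = 2 FLOPs per core per clock
    gbs = bus_bits / 8 * data_rate_gbps    # peak bandwidth in GB/s
    return gflops / gbs

configs = {
    "GK107, 2 SMX (GTX 650)": (2 * 192, 1058, 128, 5.0),
    "GK107, 3 SMX (hypothetical)": (3 * 192, 1058, 128, 5.0),
    "GM107 (GTX 750 Ti)": (640, 1085, 128, 5.4),
}
for name, spec in configs.items():
    print(f"{name}: {flops_per_byte(*spec):.1f} FLOPs/byte")
```

With these numbers the shipping GK107 sits around 10 FLOPs/byte, while the hypothetical 3-SMX version (about 15) lands much closer to GM107 (about 16), which is consistent with the "lacking that 3rd SMX" argument.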
 


Yep, that's true, which is why I referred to architectural efficiency when mentioning the 20% perf bump: that's 20% faster with the same theoretical maximums. In any case, there will never be a perfect comparison, as there are too many variables. The name of the part is definitely not relevant, though.



Good point. Balance is another important factor when trying to compare different chips/SKUs.

Btw - wth is GM108!? Haven't heard about that one before.
 
The future-nvidia-gpu speculative parts in the paper don't seem to contain any insider info, just conjecture based on the GTC slideware...

This. Anything else would be job-suicide for an employee of any involved company whose secrets get out this way.
 
It's a 3 SMM 64-bit Maxwell chip which I think showed up around the same time as GM107.
Notebookcheck has listed quite a few notebooks with it (the GT 840M specifically). For a chip that has been available for quite a while, it is indeed still quite a mystery. Die size, transistor count? Take your guess (mine would be very slightly above GK208 for size). Does it even support GDDR5? No idea.
 