NVIDIA Maxwell Speculation Thread

If KiSUAN is correct

Maybe in a parallel universe he would be correct, but in this universe I doubt it.

Since GM107 is Maxwell, would it be surprising that it has a different CC/SMX count than Kepler?

GTX 750 Ti appears to be a Kepler.M derivative (which makes sense given the 7xx naming and the 28nm fab. process, and would imply 192 CUDA cores per SMX). Since it would be designed "mobile first", it embodies the spirit of Maxwell (even if ARM CPU cores are not yet built-in).
 
Since GM107 is Maxwell, would it be surprising that it has a different CC/SMX count than Kepler?

Yes, it would be surprising. While Nvidia basically cut the core complexity in half with Kepler (over Fermi), all of Kepler's SKUs (except GK110) exactly doubled up (quadrupled on paper) over their Fermi predecessors.
 
GTX 750 Ti appears to be a Kepler.M derivative (which makes sense given the 7xx naming and the 28nm fab. process, and would imply 192 CUDA cores per SMX). Since it would be designed "mobile first", it embodies the spirit of Maxwell (even if ARM CPU cores are not yet built-in).

Not really sure why you keep saying Kepler.M, unless you have read concrete evidence of that. If there aren't ARM cores in this chip, then I doubt there will be ARM cores in any Maxwell chip beyond the flagship die.

More than likely (or at least this is what I believe to be the case), the Maxwell architecture was fully designed and ready for production, but 20nm either isn't ready or isn't cost-effective (likely both). So Nvidia built the least powerful Maxwell chip on 28nm to work out any potential production bugs AND to have a new (fairly powerful) low-power chip to go into laptops, instead of sitting idle for 6+ additional months with no new products.
 
Not really sure why you keep saying Kepler.M, unless you have read concrete evidence of that.

Think logically about it. The purported GTX 750 Ti has exactly 5x more CUDA cores, 5x more pixel fillrate [ROP throughput], and 5x more mem. bandwidth compared to the Kepler.M GPU in Tegra K1. That cannot be mere coincidence, can it? These two GPUs are using the same 28nm HPM fab. process, are both very energy efficient, and would satisfy Maxwell's requirement of being designed "mobile first". Last but not least, this is a GTX 7xx GPU, not a GTX 8xx GPU. The writing is on the wall, so to speak.
 
I am worried about Kepler with compute capability 3.7.
(GTX 750 Ti?)

I was also able to confirm CC 5.0 for Maxwell.

[Attached image: xni4i1.jpg]
 
compute capability 5.0 vs 3.7

I have seen references in this forum to both 3.7 and 5.0 but have been unable to find any articles (via Google search) on what each offers and what 5.0 has that 3.7 doesn't.

If anyone has links please post them. Thanks.
 
Think logically about it. The purported GTX 750 Ti has exactly 5x more CUDA cores, 5x more pixel fillrate [ROP throughput], and 5x more mem. bandwidth compared to the Kepler.M GPU in Tegra K1. That cannot be mere coincidence, can it? These two GPUs are using the same 28nm HPM fab. process, are both very energy efficient, and would satisfy Maxwell's requirement of being designed "mobile first". Last but not least, this is a GTX 7xx GPU, not a GTX 8xx GPU. The writing is on the wall, so to speak.
The K1 GPU is GK20A. If the 750 Ti GPU were a variant of that part, then wouldn't the name logically be of the form GK2xx instead of the GM107 that it is supposedly called?

I am worried about Kepler with compute capability 3.7.
(GTX 750 Ti?)
Fermi has 2.0 parts and Tesla/G80 has 1.0 parts, which are right below the corresponding names in the nvcuda.dll (although it isn't true with Kepler) so perhaps Maxwell has 3.7 parts as well as 5.0. If so then I would guess that the GM1xx parts are 3.7 while the GM2xx parts are 5.0.

Code:
__CUDA_ARCH__=500
Maxwell
__CUDA_ARCH__=370
__CUDA_ARCH__=350
__CUDA_ARCH__=300
Kepler
__CUDA_ARCH__=210
Fermi
__CUDA_ARCH__=200
__CUDA_ARCH__=130
__CUDA_ARCH__=120
__CUDA_ARCH__=110
Tesla
__CUDA_ARCH__=100
(Snippet copied from here. I'm guessing "Tesla" refers to G80 and GT200….)
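
For anyone trying to read the list above, here is a minimal Python sketch that decodes each `__CUDA_ARCH__` value into a compute capability. The macro encodes major * 100 + minor * 10. The family table is an assumption on my part (family decided by the major version alone), not something the snippet itself proves; under that assumption 370 would group with Kepler rather than Maxwell, which is only one reading of the list.

```python
# Values copied from the snippet above.
ARCH_VALUES = [500, 370, 350, 300, 210, 200, 130, 120, 110, 100]

# Assumption: the major version alone identifies the architecture family.
FAMILY_BY_MAJOR = {1: "Tesla", 2: "Fermi", 3: "Kepler", 5: "Maxwell"}


def decode_arch(value):
    """Split a __CUDA_ARCH__ value into (major, minor)."""
    major, rest = divmod(value, 100)
    minor = rest // 10
    return major, minor


def describe(value):
    """Return a string such as '3.7 (Kepler)' for a __CUDA_ARCH__ value."""
    major, minor = decode_arch(value)
    family = FAMILY_BY_MAJOR.get(major, "unknown")
    return f"{major}.{minor} ({family})"


if __name__ == "__main__":
    for value in ARCH_VALUES:
        print(value, "->", describe(value))
```

If the GM1xx parts really did report 3.7, as guessed above, the family table would need an exception for that value.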
 
Think logically about it. The purported GTX 750 Ti has exactly 5x more CUDA cores, 5x more pixel fillrate [ROP throughput], and 5x more mem. bandwidth compared to the Kepler.M GPU in Tegra K1. That cannot be mere coincidence, can it? These two GPUs are using the same 28nm HPM fab. process, are both very energy efficient, and would satisfy Maxwell's requirement of being designed "mobile first". Last but not least, this is a GTX 7xx GPU, not a GTX 8xx GPU. The writing is on the wall, so to speak.

The core count being 5x more is, to me, insignificant data in correlation between Kepler.M and GM107. Where are you getting the 5x figures for ROP and memory bandwidth? TK1 has 4 ROPs, GK107 and GM107 both have 16 ROPs. TK1 has a 64-bit memory bus, GK107 and GM107 both have a 128-bit bus.

It's a GTX 7xx GPU because that logically makes the most sense. Nvidia never brought a GTX 750 labeled card to market, obviously because they held out for this chip to take that slot. Also, Nvidia more than likely does not have any other new GPUs coming for at least 4-5 months (likely 6 or more). Branding a 28nm low-end Maxwell chip as a GTX 800 card makes no sense given its performance, especially when it would be the only Maxwell chip out for some time, and given that there is very little headroom left in the existing Kepler lineup to increase clocks and rebrand as 800 series parts.

I think GM107 will be through and through Maxwell. I don't think there was enough left in Kepler to squeeze out 65-75% more performance vs. GK107 at very similar TDPs on the same node size (even if it's an improved version of the same basic node). That much of a perf/watt increase requires a ground-up redesign, not a strip-to-the-bone Tegra GPU approach.
 
GK20A (aka project laguna) is capable of 1 Tri/2 clocks (or 0.5 Tri/clock) and 1 Z/clock. If you think that desktop Keplers have bounced back to the G80 era in terms of pure geometry throughput you might want to think again.

Other than that, since when has it been established that GM107 or other Maxwell cores are manufactured on TSMC 28HPM?
 
The K1 GPU is GK20A. If the 750 Ti GPU were a variant of that part, then wouldn't the name logically be of the form GK2xx instead of the GM107 that it is supposedly called?

GK2xx (in the guise of GK208) has already been on the market for many many months now, and is not a "mobile first" design, so it would hardly make sense to call the new design GK207.
 
Now bounce back and start to think if you actually have the Maxwell unit configuration right so far ROFL :LOL:
 
Where are you getting the 5x figures for ROP and memory bandwidth? TK1 has 4 ROPs, GK107 and GM107 both have 16 ROPs. TK1 has a 64-bit memory bus, GK107 and GM107 both have a 128-bit bus.

Look carefully at the TK1 GPU specs and compare to the purported 750 Ti specs. The GPU clock operating frequencies and memory speeds are ~ 25% higher on the latter, which gives 5x more pixel fillrate and 5x more mem. bandwidth. The TK1 GPU has up to 952MHz clock operating frequency and up to 17 GB/s mem. bandwidth.
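
As a rough check of the 5x arithmetic, here is a minimal Python sketch using only figures quoted in this thread: 4 vs. 16 ROPs, a 64-bit vs. 128-bit bus, up to 952MHz and 17 GB/s on TK1, and clocks about 25% higher on the 750 Ti. The function and variable names are my own, and the 750 Ti numbers are rumored, so treat the output as illustrative only.

```python
# Figures quoted in the thread; the 750 Ti side is rumored, not confirmed.
TK1_ROPS = 4
TK1_CLOCK_MHZ = 952.0
TK1_BANDWIDTH_GBS = 17.0
TK1_BUS_BITS = 64
GM107_ROPS = 16
GM107_BUS_BITS = 128
CLOCK_UPLIFT = 1.25  # "~25% higher" clocks claimed for the 750 Ti


def pixel_fillrate_gpix(rops, clock_mhz):
    """Peak pixel fillrate in Gpixels/s, assuming one pixel per ROP per clock."""
    return rops * clock_mhz / 1000.0


def data_rate_gtps(bandwidth_gbs, bus_bits):
    """Effective memory data rate (GT/s) implied by a bandwidth and bus width."""
    return bandwidth_gbs * 8 / bus_bits


tk1_fill = pixel_fillrate_gpix(TK1_ROPS, TK1_CLOCK_MHZ)
gm107_fill = pixel_fillrate_gpix(GM107_ROPS, TK1_CLOCK_MHZ * CLOCK_UPLIFT)
fill_ratio = gm107_fill / tk1_fill  # 4x the ROPs times 1.25x the clock

tk1_rate = data_rate_gtps(TK1_BANDWIDTH_GBS, TK1_BUS_BITS)
# Data rate a 128-bit bus would need to deliver 5x the TK1 bandwidth.
gm107_rate_for_5x = data_rate_gtps(5 * TK1_BANDWIDTH_GBS, GM107_BUS_BITS)
rate_ratio = gm107_rate_for_5x / tk1_rate

if __name__ == "__main__":
    print(f"fillrate ratio: {fill_ratio:.2f}x")
    print(f"memory data rate needed for 5x bandwidth: {rate_ratio:.2f}x")
```

The fillrate side works out from 4x the ROPs and 1.25x the clock. On the memory side, a bus that is only twice as wide needs about 2.5x the effective data rate to reach 5x the bandwidth, so a 25% memory speed bump alone would not get there.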

I think GM107 will be through and through Maxwell. I don't think there was enough left in Kepler to squeeze out 65-75% more performance vs. GK107 at very similar TDPs on the same node size (even if it's an improved version of the same basic node). That much of a perf/watt increase requires a ground-up redesign, not a strip-to-the-bone Tegra GPU approach.

"Through and through" Maxwell would be unified virtual memory, improved IQ, built-in ARM cores, and 20nm or better fab. process, and 750 Ti is most definitely not that. The TK1 GPU is most certainly not stripped to the bone either, but rather is a true Kepler GPU in any way/shape/form (even if TMU count per SMX and ROP count per 32-bit mem. channel are halved), and unquestionably moves the bar forward with respect to perf. per watt vs. prior Kepler GPUs. GM107 obviously uses experiential, process, and architectural improvements over the last two years to significantly improve on GK107 (and make no mistake, using a mobile first approach is a significant architectural difference here).
 
Look carefully at the TK1 GPU specs and compare to the purported 750 Ti specs. The GPU clock operating frequencies and memory speeds are ~ 25% higher on the latter, which gives 5x more pixel fillrate and 5x more mem. bandwidth. The TK1 GPU has up to 965MHz clock operating frequency and up to 17 GB/s mem. bandwidth.

GK20A has a maximum frequency of 951MHz in Tegra K1.

"Through and through" Maxwell would be unified virtual memory, improved IQ, built-in ARM cores, and 20nm or better fab. process, and 750 Ti is most definitely not that. The TK1 GPU is most certainly not stripped to the bone either, but rather is a true Kepler GPU in any way/shape/form (even if TMU count per SMX and ROP count per 32-bit mem. channel are halved), and unquestionably moves the bar forward with respect to perf. per watt vs. prior Kepler GPUs. GM107 obviously uses experiential, process, and architectural improvements over the last two years to significantly improve on GK107 (and make no mistake, using a mobile first approach is a significant architectural difference here).
No, it is not, at least not 100%; again, according to hardware.fr, the GK20A is capable of only 1 Tri every 2 clocks (which is by far not a bad thing for a ULP GPU), and that's one of the spots where you simply don't need to reach as high in a ULP design. I'd be VERY surprised if Damien Triolet made such a mistake.

So you might want to remove that "way/shape/form" marketing rubbish and bounce back to planet earth.

Edit: and back to the actual topic, food for thought: http://www.xtremesystems.org/forums...ting-spotted&p=5225652&viewfull=1#post5225652
 
GK20A has a maximum frequency of 951MHz in Tegra K1.

Yes, I realized that before refreshing the page and reading your post, and edited already.

No, it is not, at least not 100%; again, according to hardware.fr

I never said it was "100%" (as if that wasn't obvious enough with the halved TMU/ROP throughput). But at the end of the day, this is still clearly a true Kepler GPU that is not "stripped to the bone". The CUDA core count per SMX is maintained. The pixel shader perf. and tessellation perf. per SMX and per clock are maintained. The API and compute feature set is maintained.

It is very obvious that a mobile first desktop GPU derived from a mobile GPU such as Kepler.M in TK1 would not need to make all the same sacrifices in throughput that the ULP GPU would need to make. In fact, the rumored TMU throughput of 750 Ti is 11x greater than the GPU in TK1 (even though several other metrics are only 5x greater in comparison).

Look, this isn't rocket science. All I am suggesting is that the design of 750 Ti was influenced heavily by the GPU in TK1 (as opposed to some other mythical ULP NVIDIA Tegra GPU that does not yet exist). We will find out soon enough whether or not that is really the case.
 
Look, this isn't rocket science. All I am suggesting is that the design of 750 Ti was influenced heavily by the GPU in TK1 (as opposed to some other mythical ULP NVIDIA Tegra GPU that does not yet exist). We will find out soon enough whether or not that is really the case.

No objection there; however, they would have been forced down that route whether they were present in the ULP market or not. Otherwise you cannot at least double your efficiency per watt going from Kepler to Maxwell.

Now, if there's any merit to that latest GM107 data that surfaced, it might mean more compact/smaller ALUs while at the same time having a higher SP<->TMU ratio than with Kepler. Both make absolute sense; otherwise you'd end up with a gazillion redundant TMUs in a high-end design, and making your ALUs more compact obviously shortens the data paths.

If GM107 is in the end 4*SIMD32 + 2 quad TMUs/SMX, then the answer on the supposed influence is yes and no, but it's rather a piece of cake to speculate what Tegra "M1" (if they even call it that) will look like.
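
To put numbers on the 4*SIMD32 + 2 quad TMUs/SMX rumor, here is a minimal Python sketch comparing it with a Kepler SMX. The 192 cores per SMX figure is from earlier in the thread; the 16 TMUs per desktop Kepler SMX is my own assumption, as is reading "5x Tegra K1" as 5 x 192 = 960 cores. The 640 total is the other core count that has been floated for GM107. None of this is confirmed.

```python
KEPLER_CORES_PER_SMX = 192
KEPLER_TMUS_PER_SMX = 16  # assumption: desktop Kepler SMX, not stated in the thread

# Rumored GM107 layout: 4 SIMD32 blocks and 2 quad TMUs per SM.
RUMORED_SIMD_BLOCKS = 4
RUMORED_SIMD_WIDTH = 32
RUMORED_QUAD_TMUS = 2

rumored_cores_per_sm = RUMORED_SIMD_BLOCKS * RUMORED_SIMD_WIDTH
rumored_tmus_per_sm = RUMORED_QUAD_TMUS * 4  # a quad TMU counts as 4 TMUs

# SP:TMU ratio per SM for each layout.
kepler_sp_per_tmu = KEPLER_CORES_PER_SMX / KEPLER_TMUS_PER_SMX
rumored_sp_per_tmu = rumored_cores_per_sm / rumored_tmus_per_sm


def sm_count(total_cores, cores_per_sm):
    """Number of SMs implied by a total core count; None if it does not divide evenly."""
    if total_cores % cores_per_sm:
        return None
    return total_cores // cores_per_sm


# Two competing totals: 960 (5 x 192, Kepler-style) and 640 (Maxwell-style rumor).
sms_if_960_kepler_style = sm_count(960, KEPLER_CORES_PER_SMX)
sms_if_640_rumored_style = sm_count(640, rumored_cores_per_sm)

if __name__ == "__main__":
    print("rumored cores per SM:", rumored_cores_per_sm)
    print("SP per TMU (Kepler vs rumored):", kepler_sp_per_tmu, rumored_sp_per_tmu)
    print("SM count, 960 cores Kepler-style:", sms_if_960_kepler_style)
    print("SM count, 640 cores rumored-style:", sms_if_640_rumored_style)
```

Under these assumptions both readings come out to five SMs, and the rumored layout raises the SP:TMU ratio from 12:1 to 16:1, which fits the "higher SP<->TMU ratio" remark above.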
 
[Attached image: yCzagVn.jpg]


First time we get to see the reference PCB without the cooler.

I think he's mocking the guy that claims 640. Honestly, it doesn't really make much difference; once the final product slides are released, we'll know the CUDA core count.
 