NVIDIA Maxwell Speculation Thread

First time we get to see the reference PCB without the cooler.

I think he's mocking the guy who claims 640. It doesn't really make much difference honestly; once the final product slides are released, we'll know the CUDA core count.

960 vs 640SPs don't make much difference?

If it is, can we stop the babbling about GM107 being a Kepler derivative? Is the change in the number of CUDA cores per SMX enough to call it a Maxwell now?

Exactly why I consider a 640 config heaven sent; now we'll of course see theories in the wild that they cut out SPs last minute :p
 
If it is, can we stop the babbling about GM107 being a Kepler derivative? Is the change in the number of CUDA cores per SMX enough to call it a Maxwell now?
Probably not, after all, the GF100 and GF104 have different numbers of CCs per SM.

[Clarification: I think the GM107 is a legitimate Maxwell-architecture chip. I don't think those who believe the GM107 is a Kepler refresh will change their minds even if the GM107 ends up having 640 CCs, for the reason above.]

Also, PedantOne in the XS thread has posted some purported pictures including one showing Hynix H5GC4H24MFR-T2C memory. According to this data sheet the memory could be rated 6.0 Gbps or 5.0 Gbps, which doesn't really tell us much (but is good to know).
 
So let me get this straight: someone else had to challenge the bloke before he checked the supposedly relevant documentation and found out that there are all of a sudden 33% fewer SPs in the real chip? *raises eyebrow*
 
960 vs 640SPs don't make much difference?



Exactly why I consider a 640 config heaven sent; now we'll of course see theories in the wild that they cut out SPs last minute :p

LOL!

:runaway:

Of course 960 vs 640 SPs makes a difference... Given the alleged performance of these 640 SPs (faster than the 768-SP GTX 650 Ti), they've increased performance per SP. Although not by much? We still need to know exactly how much power it consumes, though...
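As a rough sanity check on that claim, here's the per-SP throughput implied by the rumored numbers. Both the 640-SP config and "roughly comparable clocks" are assumptions from the leaks, not confirmed specs:

```python
# Rough per-SP comparison, assuming the leaked 640-SP GM107 config and
# roughly equal clocks vs. the 768-SP GTX 650 Ti (both assumptions).
gtx650ti_sps = 768
gm107_sps = 640  # rumored

# Minimum per-SP speedup needed just to tie the 650 Ti at equal clocks:
min_speedup = gtx650ti_sps / gm107_sps
print(f"Each Maxwell SP must do at least {min_speedup:.2f}x "
      f"a Kepler SP's work to match the 650 Ti.")
```

So even just matching the 650 Ti would mean ~20% more work per SP; beating it means more still.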
 
Look carefully at the TK1 GPU specs and compare to the purported 750 Ti specs. The GPU clock operating frequencies and memory speeds are ~ 25% higher on the latter, which gives 5x more pixel fillrate and 5x more mem. bandwidth. The TK1 GPU has up to 952MHz clock operating frequency and up to 17 GB/s mem. bandwidth.

Could have sworn that I've read from multiple reputable sites that TK1's GPU is "up to 1 GHz." And once you get past the memory bus, which is 1/2 of GM107's and not 1/5, memory bandwidth is entirely down to the power constraints of the RAM. I think you're trying to create correlations to match up product features when there really is no real correlation.

ams said:
Through and through" Maxwell would be unified virtual memory, improved IQ, built-in ARM cores, and 20nm or better fab. process, and 750 Ti is most definitely not that.

Nvidia bifurcated Kepler more so than they did Fermi. GK104 and its derivatives were more graphics-focused than GF104 and its derivatives, and were stripped of more HPC-oriented functionality. Consequently, GK104 and GK106 saw a die-size reduction over their predecessors. GK110 brought new, exclusive compute features with it, and as a result grew in die size over its predecessor. From what I'm deducing, Nvidia is continuing this strategy. I have no idea if Nvidia plans to implement ARM cores in anything other than the flagship Maxwell die, but I am fairly confident they will continue to bifurcate their product line, as they started doing with Fermi and did even more so with Kepler.

ams said:
The TK1 GPU is most certainly not stripped to the bone either, but rather is a true Kepler GPU in every way/shape/form (even if TMU count per SMX and ROP count per 32-bit mem. channel are halved), and unquestionably moves the bar forward with respect to perf. per watt vs. prior Kepler GPUs. GM107 obviously uses the experiential, process, and architectural improvements of the last two years to significantly improve on GK107 (and make no mistake, taking a mobile-first approach is a significant architectural difference here).

Okay it's not stripped to the bone, but it's definitely stripped down.
Anand Lal Shimpi says:
NVIDIA did some work to make Kepler suitable for low power, but it's my understanding that the underlying architecture isn't vastly different from what we have in notebooks and desktops today. Mobile Kepler retains all of the graphics features of its bigger counterparts, although I'm guessing things like FP64 CUDA cores are gone.

I'm not saying you are definitely wrong, I just don't think you are right. :p I'm standing by my theory that Nvidia was ready with Maxwell significantly before TSMC was able to deliver 20nm at reasonable costs, so Nvidia
 
True, but then again all Keplers had this number constant throughout the family, including GK208A. Why would they change it just for one chip?
GM107 and GM108 are no Keplers. If they have compute capability 5.0, there are probably some radical changes in the architecture (4.x is missing).
 
Also, PedantOne in the XS thread has posted some purported pictures including one showing Hynix H5GC4H24MFR-T2C memory. According to this data sheet the memory could be rated 6.0 Gbps or 5.0 Gbps, which doesn't really tell us much (but is good to know).

It is the same memory, but it runs at different speeds depending on the supply voltage:

1.5 V = 6.0 Gbps
1.35 V = 5.0 Gbps

Factory-overclocked boards will probably run at 1.5 V.
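For what that implies in raw bandwidth, here's the arithmetic, assuming the rumored 128-bit GM107 memory bus (an assumption from the leaks, not a confirmed spec):

```python
# Memory bandwidth for the Hynix H5GC4H24MFR-T2C at its two rated speeds,
# assuming the rumored 128-bit GM107 memory bus.
bus_width_bits = 128  # assumed bus width
for volts, gbps_per_pin in [(1.5, 6.0), (1.35, 5.0)]:
    # Total bandwidth = per-pin data rate * bus width, converted bits -> bytes
    bandwidth_gbs = gbps_per_pin * bus_width_bits / 8
    print(f"{volts} V @ {gbps_per_pin} Gbps/pin -> {bandwidth_gbs:.0f} GB/s")
```

That would put the two voltage bins at 96 GB/s and 80 GB/s respectively.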
 
Nvidia bifurcated Kepler more so than they did Fermi. GK104 and its derivatives were more graphics-focused than GF104 and its derivatives, and were stripped of more HPC-oriented functionality. Consequently, GK104 and GK106 saw a die-size reduction over their predecessors. GK110 brought new, exclusive compute features with it, and as a result grew in die size over its predecessor. From what I'm deducing, Nvidia is continuing this strategy. I have no idea if Nvidia plans to implement ARM cores in anything other than the flagship Maxwell die, but I am fairly confident they will continue to bifurcate their product line, as they started doing with Fermi and did even more so with Kepler.
The "roadmap feature" of Kepler, dynamic parallelism, didn't even show up until GK110. I wouldn't be surprised if GM107/GM108 have "significantly" fewer features than the GM200 (the "through and through" Maxwell?) or even the other GM20x chips, but that doesn't make the GM10x chips "just" a Kepler refresh.
 
LOL!

:runaway:

Of course 960 vs 640 SPs makes a difference... Given the alleged performance of these 640 SPs (faster than the 768-SP GTX 650 Ti), they've increased performance per SP. Although not by much? We still need to know exactly how much power it consumes, though...

We all know that what really counts is what comes out at the other end and that any unit != any unit.

It's just strange that those "in the know" didn't notice until now that something is different.
 
Could have sworn that I've read from multiple reputable sites that TK1's GPU is "up to 1 GHz." And once you get past the memory bus, which is 1/2 of GM107's and not 1/5, memory bandwidth is entirely down to the power constraints of the RAM. I think you're trying to create correlations to match up product features when there really is no real correlation.

The math is what it is. Based on the leaked rumors, 750 Ti has 5x more CUDA cores, 5x more pixel fill rate [ROP throughput], and 5x more mem. bandwidth compared to the TK1 GPU, period. That is undeniable. Unit counts obviously don't need to increase by 5x across the board to achieve this, nor would it be area efficient to do so. Now, if you think that the leaked specs are wrong, you are entitled to your opinion, but that is all we have to analyze at the moment.

FWIW, the TK1 GPU has been specified at up to 365 GFLOPS throughput which would imply up to 951-952MHz GPU clock operating frequency.
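That clock estimate falls straight out of the usual peak-FLOPS formula, assuming TK1's 192 Kepler CUDA cores each issue one FMA (2 FLOPs) per clock:

```python
# Back out the GPU clock implied by the quoted 365 GFLOPS figure,
# assuming 192 CUDA cores, each doing one FMA (2 FLOPs) per clock.
gflops = 365
cuda_cores = 192
flops_per_core_per_clock = 2  # fused multiply-add counts as 2 FLOPs

# GFLOPS = cores * FLOPs/clock * clock(GHz); solve for the clock in MHz
clock_mhz = gflops * 1000 / (cuda_cores * flops_per_core_per_clock)
print(f"Implied clock: {clock_mhz:.0f} MHz")  # -> 951 MHz
```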

Okay it's not stripped to the bone, but it's definitely stripped down.

Most GPUs are "stripped down" in one form or another. On a fundamental level, the TK1 GPU is a Kepler GPU, and halving the TMU count per SMX and the ROP count per 32-bit mem. channel doesn't change that. For its power envelope, the TK1 GPU would be just as well balanced as the purported GTX 750 Ti.

I'm not saying you are definitely wrong, I just don't think you are right. :p I'm standing by my theory that Nvidia was ready with Maxwell significantly before TSMC was able to deliver 20nm at reasonable costs

My comments are related to the purported leaked specs only. If the leaked specs are incorrect, then obviously anything goes.
 
Well, looks like there is mounting evidence that the SMX is considerably different, configured with 128 CCs instead of 192. I am going with Maxwell. ;)
 
Didn't AnandTech have a rumor that part of Maxwell was that the DP units could also be used for SP instructions?

http://forums.anandtech.com/showthread.php?t=2346062
The SMX structure changes slightly. NV did some optimization so that the DP ALUs can now also be used for SP; they support all SP instructions and can be used in parallel. That means an SMX now appears to have 256 ALUs. Technically, that reduces Maxwell's DP rate to 1:4.

Maybe they removed a third of the SPs, so there are 128 normal SP ALUs and 64 DP ALUs which can be used as SP ALUs? They could also choose to bifurcate and not include DP at all. Does the rumored performance align with 640 ALUs?
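For what it's worth, the 1:4 figure in the quoted rumor falls out of simple arithmetic; the 192 SP + 64 DP split per SMX is purely speculative, taken from that post:

```python
# Speculative SMX math from the quoted rumor: 192 SP ALUs plus 64 DP ALUs
# that can also issue SP instructions, giving 256 SP-capable ALUs per SMX.
sp_alus = 192
dp_alus = 64
total_sp_capable = sp_alus + dp_alus   # ALUs usable for SP work
dp_to_sp_ratio = total_sp_capable // dp_alus  # DP throughput is 1:N of SP
print(f"{total_sp_capable} SP-capable ALUs, DP rate 1:{dp_to_sp_ratio}")
```

Both the rumor's 256-ALU count and the 1:4 DP rate are self-consistent, at least.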
 
Since Fermi, all GPUs have had 64-bit units, just enough to debug and run 64-bit code. Since Nvidia wants to sell a lot of Teslas, they made it so you can run and debug 64-bit code on GeForce, with capped performance of course.
 
Well, looks like there is mounting evidence that the SMX is considerably different, configured with 128 CCs instead of 192. I am going with Maxwell. ;)

If that is truly the case, then I would be really surprised, because I didn't expect any heavily rearchitected Maxwell GPUs to be announced until GTC 2014 at the end of March. Note that each rearchitected CUDA core in the 750 Ti would need to be capable of much more work (at least 50% more) than a Kepler CUDA core for that GPU to be a worthy successor to the 650 Ti.
 
Since Fermi, all GPUs have had 64-bit units, just enough to debug and run 64-bit code. Since Nvidia wants to sell a lot of Teslas, they made it so you can run and debug 64-bit code on GeForce, with capped performance of course.

Makes sense. What if the difference between Tesla and commodity parts is Denver? If the area difference between 64 DP and 64 SP ALUs per SMX is small enough, there'd be no reason to castrate 64-bit performance. Aren't they at a competitive disadvantage to AMD as things currently stand (wrt DP)?

Note that each rearchitected CUDA core for 750 Ti would need to be capable of much more work (at least 50% more) than a Kepler CUDA core...

50% more? Huh. ;^/
 
Nvidia doesn't really care: Quadro and Tesla get uncapped 64-bit performance since they have high margins, GeForce gets capped 64-bit performance, TITAN being an exception.

This of course only affects parallel 64-bit compute performance. I don't think the Denver core does much for 64-bit compute either, since CPUs are best at serial workloads and GPUs at parallel ones.
 