NVIDIA Maxwell Speculation Thread

High-end Maxwell, if it's on 20nm, will probably be Q4 2014.

If this is real, then they will need to refresh Kepler for the second time while waiting for 20 nm...

why refresh kepler if maxwell is 28nm?

BTW, I like how nicely the article begins:

Despite my personal obsession with graphene, a miracle material that is to my disappointment going to take almost a decade to make it to mainstream chips

So, only Intel is relatively safe while all the others are in deep...
Is the difference between Intel's 22 nm and TSMC's 28 nm noticeable, does it give a big advantage?
 

I'm not sure the Fudzilla article is all that accurate, especially about Samsung...

Samsung has already shown 20nm and below (14-16nm 3D FinFET) as production ready. Will it be usable for mass production in 2014? Not sure. But we are not even there yet.

Samsung, in collaboration with Synopsys, already taped out a validation SoC in 14nm 3D FinFET in November-December 2012.
 
why refresh kepler if maxwell is 28nm?
If this is real, then they will need to refresh Kepler for the second time while waiting for 20 nm...
It might be that 28 nm Maxwell doesn't span the entire product lineup, especially if there is no 28 nm "big" Maxwell (which is assumed to be the case).

There are some statements (not from NVIDIA, and I'm not sure how reliable they are) that the Quadro K6000 actually uses a "GK180" chip instead of a GK110 chip. Whatever it uses, the K6000's specs are clearly above those of any (other) GK110 part. It has 1.47x (5196/3524) and 1.32x (5196/3935) the GFLOPS and 1.38x (288/208) and 1.15x (288/250) the bandwidth of the Tesla K20 and K20X respectively. That's a bigger increase than from the GF100 Tesla M2050 to the GF110 M2090, which was 1.29x the GFLOPS and 1.20x the bandwidth (or from the GTX 480 to the GTX 580). Also note that the Quadro 6000 and the Tesla M2050 have almost identical specs.
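For anyone who wants to check the arithmetic, here is a minimal Python sketch that recomputes the ratios from the figures quoted in the post. The dictionary layout and the `ratios` helper are just illustrative names, not anything from NVIDIA.

```python
# Spec figures exactly as quoted in the post above (GFLOPS, GB/s).
k6000 = {"gflops": 5196, "bandwidth": 288}
teslas = {
    "K20": {"gflops": 3524, "bandwidth": 208},
    "K20X": {"gflops": 3935, "bandwidth": 250},
}


def ratios(new, old):
    """Return (GFLOPS ratio, bandwidth ratio) of `new` over `old`, rounded to 2 places."""
    return (
        round(new["gflops"] / old["gflops"], 2),
        round(new["bandwidth"] / old["bandwidth"], 2),
    )


# Hypothetical helper table: K6000 relative to each Tesla part.
k6000_vs = {name: ratios(k6000, spec) for name, spec in teslas.items()}

for name, (flops_x, bw_x) in k6000_vs.items():
    print(f"K6000 vs {name}: {flops_x}x GFLOPS, {bw_x}x bandwidth")
```

Rounded to two places, the GFLOPS ratio against the K20 comes out at 1.47 and the other three ratios at 1.32, 1.38 and 1.15.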

Barring supply issues and assuming that a "big" Maxwell won't be here until late 2014 at the earliest, I don't see why NVIDIA wouldn't use whatever chip that goes in the K6000 for more products. If 28 nm "consumer" Maxwell chips are also DP- and compute-limited, I can see a Titan refresh coexisting with Maxwell, like this hypothetical scenario below for the GeForce lines.

Key: [Process] [Real/rumored/speculated codename] ([range of second digits for GeForce parts, existing or my estimates, the "T" is Titan or an equivalent part]).
Code:
2012                  2013                  2014                  2015
                      28nm GK110 (T,8)      28nm "GK180" (T,8?)   20nm "GM110"
28nm GK104 (8-6)      28nm GK104 (7-6)      28nm "GM104" (8?-6)   20nm "GM114"
28nm GK106 (6-5)      28nm GK106 (6-4)      28nm Maxwell (6-5)    20nm Maxwell
28nm GK107 (5-3)      28nm GK107 (5-3)      28nm Maxwell (5-3)    20nm Maxwell
Now there seem to have been some rumors going around that at least one of the Maxwell chips has been canceled. If the canceled chip or chips were planned for early 2014, then NVIDIA could move each GK10x/GK110 chip "down" one row (they've already gone partway with the 700 series) for a "worst-case scenario" of
Code:
2014                  OR   2014
28nm "GK180" (T,8?)        
28nm  GK110  (8?-7)        28nm "GK180" (T,8-7)
28nm  GK104  (6-5)         28nm  GK104  (6-5)
28nm  GK106  (5-3)         28nm  GK106  (5-3)
and intersections of these scenarios giving other possibilities for 2014.
 
GK180 makes no sense to me at all. If they did another high-end 28nm chip, it should be GK2xx IMHO. GK208 also has only half the TMUs per SMX, which might be quite a worthwhile change for such a card (pro cards don't need lots of TMUs, and even for gaming it doesn't seem to hurt much); on its own it could make room for another SMX or so without increasing die size.
I have to say, though, that the "speculation" that the K6000 isn't using GK110 is utterly ridiculous, so there are really zero hints (that I know of...) that another Kepler chip is in the works.
 
So, only Intel is relatively safe while all the others are in deep...
Is the difference between Intel's 22 nm and TSMC's 28 nm noticeable, does it give a big advantage?
The difference is pretty massive at low voltage. Performance-wise, Intel blows everything out of the water.
 
There are some statements (not from NVIDIA, and I'm not sure how reliable they are) that the Quadro K6000 actually uses a "GK180" chip instead of a GK110 chip. Whatever it uses, the K6000's specs are clearly above those of any (other) GK110 part. It has 1.47x (5196/3524) and 1.32x (5196/3935) the GFLOPS and 1.38x (288/208) and 1.15x (288/250) the bandwidth of the Tesla K20 and K20X respectively.

The specifications of the Quadro K6000 basically paint it as a Geforce Titan with none of the SMXs fused off.
With the price Nvidia will charge for the K6000 they can afford to very aggressively bin chips for it, with those that don't meet the grade ending up in the Titan and GTX 780.

The Tesla K20 & K20X have more conservative performance figures because their GK110 chips are binned and clocked for maximum performance/$ in HPC applications rather than performance alone.


Edit:
I was surprised to see the Quadro K6000 has an official TDP of only 225W.
Perhaps your point is valid, and it uses a significant respin of the GK110 die. Or perhaps it is still just down to binning and Nvidia do not plan to sell them in significant volumes.
 
Your first assumption was correct. The K6000 is a GK110.
The reason behind the 225W spec is partly that Quadro has no need for boost (902 MHz static AFAIK for the K6000 vs. 993+ MHz with boost for the Titan). Binning would likely take care of the rest, which would also have to account for the 12GB of GDDR5 rather than the 6GB used on the Titan.
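As a sanity check on that 902 MHz figure, here is a rough Python sketch of the usual peak-FLOPS formula. It assumes a fully enabled GK110 has 2880 CUDA cores (15 SMX x 192), which is not stated in this thread, and counts one fused multiply-add (2 FLOPs) per core per clock. The function and constant names are made up for illustration.

```python
# Assumption (not from the thread): a fully enabled GK110 has 2880 CUDA cores
# (15 SMX x 192), and each core retires one FMA (2 FLOPs) per clock.
FULL_GK110_CORES = 2880
FLOPS_PER_CORE_PER_CLOCK = 2


def peak_sp_gflops(cores, clock_mhz):
    """Theoretical single-precision peak in GFLOPS for a core count and clock."""
    return cores * FLOPS_PER_CORE_PER_CLOCK * clock_mhz / 1000.0


# 902 MHz is the static K6000 clock quoted in the post above.
k6000_peak = peak_sp_gflops(FULL_GK110_CORES, 902)
print(f"K6000 at 902 MHz: {k6000_peak:.0f} GFLOPS")
```

Under those assumptions the result lands at roughly 5196 GFLOPS, the same K6000 figure quoted earlier in the thread. That fits a full GK110 at a fixed 902 MHz.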
 
came across some maxwell plans from nvidia. quite interesting stuff. there will be two lines, maybe the second one aligns with the finfet stuff, don't know. anyway, that's what i gathered:

the smx structure changes slightly. nv did some optimization so that the dp alus can now also be used for sp: they now support all sp instructions and can be used in parallel. it means an smx now looks to have 256 alus. technically, that reduces maxwell's dp rate to 1:4, but in reality it just boosts the sp performance in comparison to kepler. nv found out how to gate off the unused parts of the dp alu to keep the power down when doing sp stuff.

but the real changes are in the cache area. that will boost the efficiency big time.
first off, the registers are doubled per smx. more threads using a lot of registers can now run in parallel and better hide the latencies. and the caches got increased as well. the L1 cache also used as shared memory is now 128kb (doubled) and can be split between cache and shared memory in steps of 32/96, 64/64, or 96/32. maxwell keeps the 16 tmus per smx.

the gpcs consist of usually 3 smx, but got changed quite a bit. there is still that geometry engine and stuff, but each gpc now includes 768kb of l2 cache, backing the r/w-L1 as well as the read only texture L1 in the smxs and also serve as instruction cache for the smx. all this gets topped off with a much larger l3 cache than in kepler. now to some numbers for the first line.

gm100:
8 gpc (8 triangles per clock), 24 smx, 384 tmus, 6144 alu, 8mb l3 (and there are also 8 l2s in the gpcs!), 64 rops, 512 bit interface, up to 8 gb @ 6+ ghz
target frequency for gf 930mhz, boost 1GHz
target frequency for tesla 850mhz, gives 2.61 dp tflops, double that of kepler, comes with 16gb

gm104:
5 gpc, 15 smx, 240 tmu, 3840 alu, 4mb l3, 40 rops, 320 bit interface (7 ghz), 2.5gb for cheap models, probably a lot of asymmetric 3gb or (symmetric again) 5gb models, target 1+ ghz, can do dp only with 1:16 rate

gm106:
3 gpc, 9 smx, 144 tmu, 2304 alu, 4mb l3, 24 rops, 192 bit interface, 7ghz, 3gb ram

gm108:
2 gpc, 4 smx, 64 tmu, 1024 alu, 2mb l3, 16 rops, 128bit interface, 2 gb ram
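The listed numbers can at least be checked for internal consistency. The Python sketch below uses only the rumored figures from the post (256 ALUs and 16 TMUs per SMX, a 1:4 DP rate on gm100, the listed bus widths and memory data rates) plus the usual assumption of one FMA (2 FLOPs) per ALU per clock. It says nothing about whether the specs are real, and all names in it are illustrative.

```python
# All figures are the rumored numbers from the post, not confirmed specs.
ALUS_PER_SMX = 256  # post: an smx now looks to have 256 alus
TMUS_PER_SMX = 16   # post: maxwell keeps the 16 tmus per smx

# name: (smx count, listed ALUs, listed TMUs)
chips = {
    "gm100": (24, 6144, 384),
    "gm104": (15, 3840, 240),
    "gm106": (9, 2304, 144),
    "gm108": (4, 1024, 64),
}

# name: (bus width in bits, data rate in Gbps per pin); gm100 is listed as "6+".
memory = {
    "gm100": (512, 6.0),
    "gm104": (320, 7.0),
    "gm106": (192, 7.0),
}


def counts_match(smx, alus, tmus):
    """True if the listed ALU and TMU totals follow from the per-SMX figures."""
    return alus == smx * ALUS_PER_SMX and tmus == smx * TMUS_PER_SMX


def bandwidth_gb_s(bus_bits, gbps_per_pin):
    """Peak memory bandwidth in GB/s."""
    return bus_bits * gbps_per_pin / 8


def dp_tflops(alus, clock_mhz, dp_ratio):
    """Peak DP TFLOPS, assuming 2 FLOPs (one FMA) per ALU per clock."""
    return alus * 2 * clock_mhz * dp_ratio / 1e6


consistent = {name: counts_match(*spec) for name, spec in chips.items()}
bandwidths = {name: bandwidth_gb_s(*m) for name, m in memory.items()}
gm100_dp = dp_tflops(6144, 850, 1 / 4)  # Tesla target clock from the post

print(consistent)
print(bandwidths)
print(f"gm100 Tesla DP peak: {gm100_dp:.2f} TFLOPS")
```

All four chips pass the ALU/TMU check, and the gm100 Tesla figure reproduces the 2.61 DP TFLOPS quoted in the post. So the numbers are at least self-consistent, which is easy enough to arrange in a made-up list too.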

but the really interesting part is the refresh, which is probably waiting for tsmc's finfets. then 64 bit arm cores developed by nv get integrated on the same die. they can coherently access the common l3 cache. the big thing is that they will be used by the graphics driver to offload some heavy lifting from the system cpu. basically most of the driver will be running on the gpu itself! nvidia expects this will give them at least the same speed up as amd will get from mantle, but without using a new api, just straight dx11 or opengl code! and it will also help with the new cuda version for maxwell, where one can access both gpu as well as cpu cores seamlessly.

the specs are planned to stay almost the same for gm110/114/116, just the 110 gets a full 8 ARM v8 cores and a doubled l3 (16mb!) compared to the gm100. the finfets may also allow a further speed boost. the 8 arm core version is actually called gm110soc, so maybe nv will start to market them as standalone processors for hpc. the consumer version is likely cut down to 4 arm cores, the same as gm114 will get (which also gets a doubled l3 of 8mb). the gm116 will only get 2 cpu cores on die. i have not seen a gm118 mentioned.

http://pastebin.com/jm93g3YG

A post by someone claiming to know the specs of upcoming Maxwell chips. Seems plausible enough to be true. What do you all think?

Thanks Ailuros. You Germans seem to be having all the fun.
 
The one thing I noticed with these specs was that, taken at face value, they don't leave any room for potential 28 nm Maxwell parts (due to the GM100). 28 nm versions of GM106, GM108, and even GM104 seem reasonable though, assuming that 28 nm Maxwells exist and that the listed specs are plausible on 20 nm (which of course they may not be). The CUDA core counts seem quite high, although I suppose the merging of DP units makes up for some of the difference (the GK110 has 3840 SP + DP units).
 
http://pastebin.com/jm93g3YG

A post by someone claiming to know the specs of upcoming Maxwell chips. Seems plausible enough to be true. What do you all think?
Well, if the first wave comes on 20nm by the end of 2014, it may be feasible. But I'm not sure 20nm will be ready in 2014 for a chip as big as GM100. On the other hand, I've heard about big efficiency (perf/watt) improvements with Maxwell, so maybe the small models will reach the market on 28nm.

other solution is to have GM104 and smaller models launch first (28 or 20nm), then later, when yields will be better, big daddy will come.

One thing is sure: NVDA cannot wait too long to launch this new generation. It looks like Kepler will get its second refresh soon, but it will have a hard time competing with the new AMD stuff...

Finally, I find it very strange that, up to now, we had nearly nothing on Maxwell, and suddenly this guy is already giving insights about the refresh!
 
Not sure if the SMX structure change of using the DP unit as a 4th SP unit works out that well; it would work for gm100 but not the others. Well, it could, but then the DP unit for the others would need to be quite different (that is, it would need to be full-speed SP or 1/4 speed DP, which is certainly not unreasonable but would be different to gm100). Using the same number of TMUs as gk1xx looks a bit suspicious too, since gk208 only has half that, though if there are more SP ALUs per SMX too I could possibly believe that.
gm104 would essentially be a "restructured" gk110 and would have better performance than the latter (with slightly fewer transistors if you ignore caches and don't count the "logically unnecessary" transistors), except obviously for double precision.
Certainly doesn't sound too unreasonable. Doesn't really indicate it's true though, I could come up with something looking half-way reasonable too :).
 
other solution is to have GM104 and smaller models launch first (28 or 20nm), then later, when yields will be better, big daddy will come.
gm104 with those specs doesn't really fit on 28nm. Well it would certainly fit but it would be nearly gk110 sized, and I'm not convinced that's a viable option for an "ordinary" gaming chip.
 
Why do the Tesla K40 FLOPS numbers seem much lower than the Quadro K6000's (if they were, say, over 5.0 SP TFLOPS then I think they would have just said that instead of saying over 4.0)? Is it because the K40 has boost? I also wonder if the peak FLOPS numbers for the K40 are with or without boost.
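One way to frame the question is to ask what clock each headline number would need. The Python sketch below assumes the K40 has all 2880 CUDA cores enabled and counts 2 FLOPs per core per clock. Neither figure comes from this thread, so treat the output as a rough guide only; the names are made up for illustration.

```python
# Assumption (not from the thread): the K40 has 2880 CUDA cores enabled,
# and each core retires one FMA (2 FLOPs) per clock.
ASSUMED_CORES = 2880


def min_clock_mhz(target_tflops, cores=ASSUMED_CORES):
    """Clock in MHz needed to reach `target_tflops` of peak SP throughput."""
    return target_tflops * 1e6 / (2 * cores)


clock_for_4 = min_clock_mhz(4.0)
clock_for_5 = min_clock_mhz(5.0)

print(f"4.0 SP TFLOPS needs about {clock_for_4:.0f} MHz")
print(f"5.0 SP TFLOPS needs about {clock_for_5:.0f} MHz")
```

Under those assumptions, "over 4.0" only needs a clock a bit under 700 MHz, while "over 5.0" needs roughly 870 MHz. So a base-clock figure and a boost-clock figure could plausibly land on either side of 5.0.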
 
Quadro doesn't do DP (to my knowledge), Tesla does. Activating the dedicated units for DP leads to higher energy consumption, hence lower FLOPs at the same TDP. Tesla parts are lower clocked than Quadro parts and Titan also clocks lower once you activate DP.
 
The Quadro K6000 has the same DP rate as the Tesla K20X.
 