NVIDIA Maxwell Speculation Thread

And finally...some news on GM200. Seems like there's been a lot of smoke and mirrors stuff going on. What I have again heard is that it is still on 28nm as planned..and should be taping out late this month/early next month.

Just so I understand GM200 would be the highest end part? and would it even be slated for a consumer GPU at first?

With GM107 out, I'm assuming there's an equivalent tier to what GK104 was, coming out before GM200 (hopefully in a GTX 870).

I'm looking forward to upgrading my GTX 670, but so far there's nothing on the market worth upgrading to (for a reasonable price)
 
And finally...some news on GM200. Seems like there's been a lot of smoke and mirrors stuff going on. What I have again heard is that it is still on 28nm as planned..and should be taping out late this month/early next month.
You sure you're not mixing that with GK210?
 
Interesting article on EE Times - http://www.eetimes.com/author.asp?section_id=36&doc_id=1322399

Some data which is very relevant to the discussion above:-

  1. Cost of a 28nm wafer - $4,500-$5,000
  2. Cost of a 20nm wafer - $6,000
  3. Cost of a 16/14nm Finfet wafer - $7,270
There's a lot of other good info..worth a read. Some more key points mentioned were:-

  1. TSMC's 20nm capacity is expected to be 60,000 Wafers per month in Q4.
  2. A number of fabless companies will tape out their 16/14 FinFET product designs in the third quarter of 2014 with high-volume production planned for the second or third quarter of 2015.


And finally...some news on GM200. Seems like there's been a lot of smoke and mirrors stuff going on. What I have again heard is that it is still on 28nm as planned..and should be taping out late this month/early next month.

With 20nm offering "up to 1.9x transistor density" vs. 28nm, those costs actually make it look really beneficial to transition away from 28nm asap. Then again, yields might be terrible at first... but at similar yields, a company that can effectively shrink down their chip and get excellent transistor density scaling would be getting 75%+ more chips per wafer.
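A rough back-of-envelope sketch in Python, using the wafer prices from the EE Times article above and assuming die area (and thus chips per wafer) scales with the full "up to 1.9x" density figure, shows why yields are the deciding factor:

```python
# Back-of-envelope: cost per chip when shrinking a fixed transistor
# budget from 28nm to 20nm. Wafer prices from the EE Times figures
# above; the 1.9x density factor is TSMC's "up to" marketing number.
WAFER_COST_28 = 4750.0   # midpoint of the $4,500-$5,000 range
WAFER_COST_20 = 6000.0
DENSITY_GAIN = 1.9       # transistors/mm2, 20nm vs 28nm

def relative_cost_per_chip(yield_ratio=1.0):
    """Cost of a 20nm chip relative to the same chip on 28nm,
    assuming chips/wafer scales with density and a given
    20nm-vs-28nm yield ratio."""
    chips_gain = DENSITY_GAIN * yield_ratio   # more (good) dies per wafer
    return (WAFER_COST_20 / chips_gain) / WAFER_COST_28

# At equal yields the shrink is a clear per-chip win...
print(round(relative_cost_per_chip(1.0), 2))   # ~0.66
# ...but if early 20nm yields are only half of mature 28nm, it isn't.
print(round(relative_cost_per_chip(0.5), 2))   # ~1.33
```

(The numbers are only illustrative; real yield and scaling data are not public.)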
 
I think the real issue is that when 20nm becomes as cheap as 28nm per working transistor, 16nm may be readily available. At that point, why bother?
 
With 20nm offering "up to 1.9x transistor density" vs. 28nm, those costs actually make it look really beneficial to transition away from 28nm asap. Then again, yields might be terrible at first... but at similar yields, a company that can effectively shrink down their chip and get excellent transistor density scaling would be getting 75%+ more chips per wafer.
Don't forget the higher costs for the masks.
 
So now it absolutely will be on 28nm? Did Nvidia find that the density gain wasn't enough to justify 20nm SoC?

Apparently it was always intended to be on 28nm..the 20nm rumours were just that..rumours. If so..this would be a very interesting chip IMHO..they are bound by the reticle limit and probably would not be able to increase die size much beyond GK110. Given that they're on the same process, a GM200 v/s GK110 comparison would tell us exactly how good the architectural improvements are.
Just so I understand GM200 would be the highest end part? and would it even be slated for a consumer GPU at first?

With GM107 out, I'm assuming there's an equivalent tier to what GK104 was, coming out before GM200 (hopefully in a GTX 870).

I'm looking forward to upgrading my GTX 670, but so far there's nothing on the market worth upgrading to (for a reasonable price)

Yes..GM200 would be the highest end part until Pascal. No idea but this time around there would be plenty of 28nm capacity available so they shouldn't be short of chips I would think.

Yes..GM204..please read the last few pages of this thread.
You sure you're not mixing that with GK210?

What is GK210?
With 20nm offering "up to 1.9x transistor density" vs. 28nm, those costs actually make it look really beneficial to transition away from 28nm asap. Then again, yields might be terrible at first....

Only beneficial if you absolutely need higher density..not if you want lower cost... Yes..given initial yields..cost per transistor would be higher in the beginning. This slide by NV should give you a fair idea -

[Attached NVIDIA slide]


but at similar yields, a company that can effectively shrink down their chip and get excellent transistor density scaling would be getting 75%+ more chips per wafer.

Assuming they stick to the same transistor count of course. Traditionally..transistor count goes up every generation and the die size remains around the same. Eg..GF104 had 1.95B transistors on a 330mm2 die whereas GK104 had 3.5B transistors on a 294 mm2 die.

I think what you're overlooking is that cost/mm2 goes up with every node.
 
Apparently it was always intended to be on 28nm..the 20nm rumours were just that..rumours. If so..this would be a very interesting chip IMHO..they are bound by the reticle limit and probably would not be able to increase die size much beyond GK110. Given that they're on the same process, a GM200 v/s GK110 comparison would tell us exactly how good the architectural improvements are.

Unless they went for some sort of insane transistor density, there's not much more than roughly 4x the GM107 transistor count that you can theoretically fit into ~550mm2.
 
Apparently it was always intended to be on 28nm..the 20nm rumours were just that..rumours. If so..this would be a very interesting chip IMHO..they are bound by the reticle limit and probably would not be able to increase die size much beyond GK110. Given that they're on the same process, a GM200 v/s GK110 comparison would tell us exactly how good the architectural improvements are.
I'm guessing the memory bus width will still be 384 bits wide on the GM200 since it is on 28nm.

Yes..GM200 would be the highest end part until Pascal. No idea but this time around there would be plenty of 28nm capacity available so they shouldn't be short of chips I would think.
I'm guessing this means that Big Pascal (GP100?), Performance Pascal (GP104?) will be around 2H 2016 on TSMC's 16nmFF+.

Only beneficial if you absolutely need higher density..not if you want lower cost... Yes..given initial yields..cost per transistor would be higher in the beginning. This slide by NV should give you a fair idea -

Assuming they stick to the same transistor count of course. Traditionally..transistor count goes up every generation and the die size remains around the same. Eg..GF104 had 1.95B transistors on a 330mm2 die whereas GK104 had 3.5B transistors on a 294 mm2 die.

I think what you're overlooking is that cost/mm2 goes up with every node.
Is it possible to increase transistor density without process changes?
 
Unless they went for some sort of insane transistor density, there's not much more than roughly 4x the GM107 transistor count that you can theoretically fit into ~550mm2.

Yep I get 4.35x if I extrapolate the numbers from GK107-GK110 to GM107-GM200 (Assuming the same 551 mm2 die).

Numbers: GK107 - 11.02 million/mm2, GK110 - 12.89 million/mm2, i.e. an increase of 16.9%. GM107 is 12.63 million/mm2, so scaling by 16.9%, GM200 would be 14.78 million/mm2. At 551 mm2, it comes up to about 8.14B transistors.

Regarding performance, assuming that GM107 performs 78% better than GK107 (source:Techpowerup - http://www.techpowerup.com/reviews/NVIDIA/GeForce_GTX_750_Ti/25.html), I get roughly 40% higher performance for GM200 over GK110.

I know scaling is not always linear but in the case of GK107 to GK110 it seems to be extremely linear. From the same review, GK110's performance is 4.66 times that of GK107..and we know that the die size is 4.67 times that of GK107. Very simplistic analysis I know..but gives us a ballpark.
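The extrapolation above can be checked in a few lines of Python; the only extra input is the 148 mm2 GM107 die size, and the linear perf-per-die-area scaling is the same simplifying assumption the post makes:

```python
# Redo the extrapolation above: scale GM107's density by the
# GK107->GK110 density gain, then fill a GK110-sized (551 mm2) die.
gk107_density, gk110_density, gm107_density = 11.02, 12.89, 12.63  # Mtrans/mm2
die_mm2 = 551

gm200_density = gm107_density * (gk110_density / gk107_density)
gm200_transistors = gm200_density * die_mm2        # in millions
print(round(gm200_transistors / 1000, 2))          # ~8.14 (billion)

# Performance, assuming perf scales linearly with die size within a
# family (as it roughly did from GK107 to GK110 per the TPU numbers):
gm107_vs_gk107 = 1.78            # GM107 = 1.78x GK107 (Techpowerup)
gk110_vs_gk107 = 4.66            # from the same review
gm200_vs_gm107 = die_mm2 / 148   # GM200 die / GM107 die
gm200_vs_gk110 = gm200_vs_gm107 * gm107_vs_gk107 / gk110_vs_gk107
print(round(gm200_vs_gk110, 2))  # ~1.42, i.e. roughly 40% over GK110
```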
I'm guessing the memory bus width will still be 384 bits wide on the GM200 since it is on 28nm.
Should be I suppose. Maxwell is also more bandwidth efficient. However, 512-bit would allow them to put more memory on the cards..which matters in the professional/HPC segment. AMD's Firepro W9100 with 16 GB comes to mind.
I'm guessing this means that Big Pascal (GP100?), Performance Pascal (GP104?) will be around 2H 2016 on TSMC's 16nmFF+.
Why on 16FF+? Should be out on 16FF in 2H 2015 I hope.
Is it possible to increase transistor density without process changes?
Yes..as mentioned..look at GM107. While there is a 25% increase in die size from GK107 to GM107, the transistor count increased by 44%. As a result, density has increased substantially from 11.02M/mm2 to 12.63M/mm2. However, this would also partly be attributed to the huge increase in L2 cache from 256 KB to 2 MB.
 
Yep I get 4.35x if I extrapolate the numbers from GK107-GK110 to GM107-GM200 (Assuming the same 551 mm2 die).

Numbers: GK107 - 11.02 million/mm2, GK110 - 12.89 million/mm2, i.e. an increase of 16.9%. GM107 is 12.63 million/mm2, so scaling by 16.9%, GM200 would be 14.78 million/mm2. At 551 mm2, it comes up to about 8.14B transistors.

Regarding performance, assuming that GM107 performs 78% better than GK107 (source:Techpowerup - http://www.techpowerup.com/reviews/NVIDIA/GeForce_GTX_750_Ti/25.html), I get roughly 40% higher performance for GM200 over GK110.

I know scaling is not always linear but in the case of GK107 to GK110 it seems to be extremely linear. From the same review, GK110's performance is 4.66 times that of GK107..and we know that the die size is 4.67 times that of GK107. Very simplistic analysis I know..but gives us a ballpark.

I estimated more in the 50% region but in the grander scheme of things it doesn't make much difference any more. I guess as long as they can double their DP GFLOP/W goal for HPC it won't be a problem. However if Intel meets its own 14-15 GFLOPs/W projections for Knights Landing with the first hw revision it might get tricky.

Dunno what their plans are but I'd dedicate GM200 just to HPC, Quadros and thousand buck Titan successors and serve the enthusiast part of the market with GM204 mGPU. True and absolute GK110 successor = Pascal top dog.

Should be I suppose. Maxwell is also more bandwidth efficient. However, 512-bit would allow them to put more memory on the cards..which matters in the professional/HPC segment. AMD's Firepro W9100 with 16 GB comes to mind.

There's nothing to stop them if they want to have 4/8 GB framebuffers with a 384bit bus in theory.
 
If GM2xx isn't 20nm then this would be the first time TSMC achieved >20% revenue from a process without either NVIDIA or AMD. It's definitely possible but the implications about TSMC's other customers are massive.
 
Why on 16FF+? Should be out on 16FF in 2H 2015 I hope.
Because I just think that by around 2016, 16FF+ will be easily available to Nvidia, and I doubt Nvidia would want to push higher-end Pascals a year or less after Maxwell has been out (though I'd say the higher-end Maxwells will last for less time than the higher-end Keplers did).

Though does anyone have a clue why Nvidia has called these higher end Maxwells "Second Generation"? It implies there is going to be more architectural improvements.

If GM2xx isn't 20nm then this would be the first time TSMC achieved >20% revenue from a process without either NVIDIA or AMD. It's definitely possible but the implications about TSMC's other customers are massive.
Well with one of the customers being Apple, it certainly helps achieving >20% revenue from a process without either NVIDIA or AMD.
 
If GM2xx isn't 20nm then this would be the first time TSMC achieved >20% revenue from a process without either NVIDIA or AMD. It's definitely possible but the implications about TSMC's other customers are massive.

Interesting observation.
 
I'm guessing the memory bus width will still be 384 bits wide on the GM200 since it is on 28nm.
This being primarily a HPC part, I guess it'll depend on how long they expect it to lead their portfolio. Data sets in that space are constantly growing exponentially and memory size is always one order of magnitude short.
 
Because I just think that by around 2016, 16FF+ will be easily available to Nvidia, and I doubt Nvidia would want to push higher-end Pascals a year or less after Maxwell has been out (though I'd say the higher-end Maxwells will last for less time than the higher-end Keplers did).

Though does anyone have a clue why Nvidia has called these higher end Maxwells "Second Generation"? It implies there is going to be more architectural improvements.


Well with one of the customers being Apple, it certainly helps achieving >20% revenue from a process without either NVIDIA or AMD.

It's not just about Nvidia.

Intel plans to release Knights Landing next year, with 3D-stacked RAM and ~3 TFLOPs.

By then, if Nvidia cannot deliver anything competitive, they will lose a lot of market share in the HPC market.

I think the sudden appearance of Pascal in NV's plans implies it is the counter to Intel's next-generation accelerators, so regardless of when high-end Maxwell will be out, I think Pascal will be out in 2015.
 
I estimated more in the 50% region but in the grander scheme of things it doesn't make much difference any more. I guess as long as they can double their DP GFLOP/W goal for HPC it won't be a problem. However if Intel meets its own 14-15 GFLOPs/W projections for Knights Landing with the first hw revision it might get tricky.

Dunno what their plans are but I'd dedicate GM200 just to HPC, Quadros and thousand buck Titan successors and serve the enthusiast part of the market with GM204 mGPU. True and absolute GK110 successor = Pascal top dog.

Given that they're staying on the same process..anything in that range is impressive enough IMHO. You think they'd move to 1/2 DP rate like AMD has?

I cant see GM204 completely replacing GK110 for gaming..not unless they increased the die size substantially.
There's nothing to stop them if they want to have 4/8 GB framebuffers with a 384bit bus in theory.
Of course..but if they want to hit 16 GB like the AMD part I mentioned..they'll have to go wider right?
If GM2xx isn't 20nm then this would be the first time TSMC achieved >20% revenue from a process without either NVIDIA or AMD. It's definitely possible but the implications about TSMC's other customers are massive.

Arun, when do you think they will hit 20% revenue from 20nm? The article I linked to upthread mentions 60,000 WPM by Q4. Also couple of other points to add:-

a) Nvidia's Erista is on 20nm
b) We haven't heard any news yet regarding AMD's plans for the node though there are rumours they may be shifting some production to GF (At least for 28nm)
c) As mentioned by others already, lots of rumours of Apple buying 20nm capacity
d) Qualcomm's 20nm modems should be ramping soon and SoCs by Q4. They aren't really a small player nowadays
Because I just think by say around 2016 16FF+ will be avaliable to Nvidia easily and I dobut Nvidia would want to push higher end Pascals after a year or less Maxwell has been out (though I say that the higher end Maxwells will last shorter than the higher end Keplers).

If 16FF is mature by 2H15, they will definitely have a part ready by then. Not necessarily high end though. With Kepler and Maxwell they've set a trend of coming out with the lower end parts first..and I expect them to follow that with the 16nm transition.
Though does anyone have a clue why Nvidia has called these higher end Maxwells "Second Generation"? It implies there is going to be more architectural improvements.
I don't believe Nvidia has called them anything yet..
 
If Maxwell ends up being very good at coin mining, will we see a supply constraint like we saw with AMD stuff earlier this year?
 
It would have to be significantly better than AMD's constrained cards, since they've returned to normal pricing.
Additionally, enough people would need to be keeping the GPU mining demand significantly elevated, with multiple data points saying that time has passed.
 
Given that they're staying on the same process..anything in that range is impressive enough IMHO. You think they'd move to 1/2 DP rate like AMD has?

Silly backwards speculative math could lead you to something like that, albeit I doubt they'll abandon the dedicated FP64 SPs approach that soon. If you'd go for 25 SMMs in theory with 64 FP64 SPs/SMM and a 0.85GHz frequency, you get roughly 12 DP GFLOPs/W with a 225W TDP.
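The arithmetic behind that figure, for anyone who wants to plug in their own SMM counts or clocks (a sketch of the speculation above, not a claim about the real chip):

```python
# Speculative GM200 DP throughput: 25 SMMs x 64 FP64 units,
# 2 FLOPs per unit per clock (FMA), 0.85 GHz, 225 W TDP.
smms = 25
fp64_per_smm = 64
clock_ghz = 0.85
tdp_w = 225

dp_gflops = smms * fp64_per_smm * 2 * clock_ghz   # 2720 DP GFLOPs
print(round(dp_gflops / tdp_w, 2))                 # ~12.09 DP GFLOPs/W
```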

I cant see GM204 completely replacing GK110 for gaming..not unless they increased the die size substantially.

Then they've most likely created an equally boring upgrade to the GK104, which would amount to a similar 40-50% increase in performance.

Of course..but if they want to hit 16 GB like the AMD part I mentioned..they'll have to go wider right?

My point was/is that they can go for even framebuffer GB amounts even with an "uneven" bus width; they've done it before. Technically at least there's nothing that speaks against it afaik, except probably that the driver guys would want to pull their own hair out.

If you can in theory have 8 GB with a 384-bit bus, what would then stop you from going to 16 GB, again in theory? :?:
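For reference, the standard capacity math being argued over: GDDR5 chips have a 32-bit interface, and "clamshell" mode hangs two chips off one 32-bit channel to double capacity without widening the bus. A quick sketch (the chip densities are just illustrative values):

```python
# GDDR5 capacity for a given bus width: one 32-bit chip per channel,
# or two in clamshell mode.
def capacity_gb(bus_width_bits, chip_gbit, clamshell=False):
    chips = bus_width_bits // 32 * (2 if clamshell else 1)
    return chips * chip_gbit / 8   # Gbit -> GB

print(capacity_gb(384, 4))                  # 6.0 GB with 4Gb chips
print(capacity_gb(384, 4, clamshell=True))  # 12.0 GB
print(capacity_gb(512, 4, clamshell=True))  # 16.0 GB (W9100-style)
```

So a 384-bit bus tops out at 12 GB with 4Gb parts, which is why 16 GB points at either a wider bus or denser chips.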
 