NVIDIA Maxwell Speculation Thread

:oops: Wow, we are already in the speculation zone for what will come in 2015. :LOL:

Given that we know almost nothing about what's coming in a quarter... that's a nice try. ;)
 
Apparently not all of them will.

And this report is under "Analysis."

If this report is true, then I'm guessing the Maxwell lineup may be something like the G80, G92, and GT200.

"Maxwell Prime": 28 nm, Q1 2014, 400-450 mm^2, 384-bit bus, game performance ≥ GeForce version of Tesla K20/K20X.
"Maxwell Lite": 20 nm, Q4 2014 or whenever 20 nm realistically shows up, 250-300 mm^2, ≥256-bit bus, performance ≈ Maxwell Prime depending on bandwidth bottlenecks?
"Big Maxwell": 20 nm, 2015, 500-550 mm^2, 512-bit bus, focusing on compute.

I don't know what the V.I. from S|A is smoking, but TSMC will have 20nm in full production in 2013, so there's no way Maxwell is going to be either on 28nm or released in 2015.

TSMC Lays Process Technology Roadmap Out.
http://www.xbitlabs.com/news/other/...SMC_Laids_Process_Technology_Roadmap_Out.html

TSMC says it's ready for 20-nm designs
http://www.eetimes.com/design/eda-design/4398190/20-nm-open-for-design-says-TSMC

20nm Technology
http://www.tsmc.com/english/dedicatedFoundry/technology/20nm.htm

TSMC sketches finFET roadmap
http://www.techdesignforums.com/blog/2012/10/16/tsmc-finfet-roadmap/
 
I think performance at 20/22 nm without FDSOI or FinFETs will be a dud (i.e., not competitive with just sticking to 28 nm). TSMC seems to be the only one trying it.
 
Not to parse PR text too finely, but the statement about TSMC's process is an OR list, not an AND.

It's 30% faster, and 1.9x as dense, and 25% less power.
It's 30% faster, or 1.9x as dense, or 25% less power.

For GPU chips that tend to push things on performance, size, and power, the former list sounds more promising than the latter.
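To put rough numbers on why the AND reading matters (a back-of-the-envelope figure of merit of my own, not a metric TSMC quotes): if all three improvements held simultaneously, a combined perf x density / power factor would come out to roughly

$1.30 \times 1.9 \times (1/0.75) \approx 3.3\times$,

whereas the OR reading lets you pick only one axis, topping out around 1.9x (density alone), and usually less once the three are traded off against each other.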
 
SRAM can use larger cells than the process minimum for performance or reliability reasons.
Logic has tended to scale less than SRAM with each node, and more complex or higher speed logic has shrunk more slowly.
Less regular structures may need extra measures to manufacture, and wires in more complex logic can restrict density improvement.
When increasing drive strength with FinFETs, extra fins, and thus extra area, can be used to provide more current.

For leakage control or manufacturability, parts of the chip can use physically larger transistors that leak less and resist variation better. Power gates are physically large relative to the rest.
Even with ideal scaling, the leakage and power efficiency issues will require allocating more of the larger transistor budget to more aggressive power control measures.

The smaller gates are physically less able to control leakage. Without a materials change or a fundamental change in gate structure, such as the HKMG or FinFET transitions Intel regularly pushes out, shrinking today's tech by a node means headaches.
Intel has been pretty good about getting these difficult transitions done before the problems they solve smack it in the face.
The foundries tend to delay these more fundamental changes by a node, so they get smacked in the face every other node.

One thing I'm curious about, with the foundries' plans to hop to FinFETs with a 14nm transistor but a 20nm metal layer, is how this compares with Intel, which has historically been somewhat less aggressive in scaling wire density.
 

The design isn't even done yet, and that site is so self-assured that it even created a picture with a supposed SKU name. As a side note, it is true AFAIK that NVIDIA contracted TSMC for their 20nm. I just have a weird gut feeling that the supposed "news" that has been plagiarized from one website to the next, originating from a Korean site, isn't really mentioned per se in the original source.
 
They've had names and "specs" for ten 700 series cards on their site for at least half a year now. So I'm not surprised. :LOL:

Why do I have the feeling that this upcoming generation was ready a loooong time ago, but NV and AMD are artificially delaying the launch until Q1-Q2 2013 for reasons of their own... What is so special about this generation that they need so much time to cook it?

And following this line of thought, I wouldn't be surprised if some people have known the specs for quite a while.
 
Why do I have the feeling that this upcoming generation was ready a loooong time ago, but NV and AMD are artificially delaying the launch until Q1-Q2 2013 for reasons of their own... What is so special about this generation that they need so much time to cook it?
4 or 5 quarters between updates is artificially long?
 
Given that the next generation is on the same process, it's a distinct possibility.
I think AMD and Nvidia are very close to each other this generation, with no major obvious flaws. New silicon on the same process can only give incremental changes, IMO. I wonder if it's not better waiting for 20nm for the next big push? Maybe just do some clock speed bump and call it a day.
 
Why do I have the feeling that this upcoming generation was ready a loooong time ago, but NV and AMD are artificially delaying the launch until Q1-Q2 2013 for reasons of their own... What is so special about this generation that they need so much time to cook it?

And following this line of thought, I wouldn't be surprised if some people have known the specs for quite a while.

Nvidia probably wants to lead the 700 series with GK110, but they're likely allocating all of their current production there until special orders and the HPC supply chains are adequately saturated. You're probably also partially right: there isn't a pressing need on Nvidia's end to refresh their current GPUs, given that they're getting more $$$/mm^2 from them than they were off Fermi dies.
 
I think AMD and Nvidia are very close to each other this generation, with no major obvious flaws. New silicon on the same process can only give incremental changes, IMO. I wonder if it's not better waiting for 20nm for the next big push? Maybe just do some clock speed bump and call it a day.

Possible guesses, predicated on Maxwell turning out to be the chip that puts a CPU on-die:

Wanting to get the hybrid architecture out in time for an HPC deal?
Trying to move to self-hosting or mostly self-hosting Teslas?
I would figure that Nvidia has a decent idea of how its design would work, but it might help to put the design through its paces sooner rather than later.
There is doubt about the timeliness of 20nm.

This wouldn't require that the consumer boards get Maxwell.
 
I believe that if a coprocessor ARM CPU is used in the GPU, the benefit could only come from the software and instructions used. To me this looks like a really short-term solution when I look at the ambitions of the other players in this area.
 
If Maxwell has a timeline that is not served by the possible timing of 20nm, an initial wave on 28nm could be a way to reduce the risk of further schedule slips.
Having ARM cores implemented could possibly remove one of the notable shortcomings Tesla has against Larrabee, which has CPU cores that are at least capable of running CPU-side code on the board.
Possibly, at some point the accelerator board in some eventual future product would just be the motherboard, and Nvidia could remove a competitor's silicon from being a required element of all its computing products.
CUDA software running on top of the driver layer shouldn't care what CPU is running the driver.

It's a possible direction Nvidia could be taking, at least.
If the host were moved on-die, the Tesla board would--with some work akin to a server version of its Tegra SoC design--look a bit like an ARM shared-nothing dense server board.
What doesn't seem to be worked out yet is a cache-coherent interconnect, and the RAM pool a GPU would have is too tiny without the much larger-capacity DIMM pool the host x86 chip currently provides.
A memory standard like HMC might solve the capacity problem without sacrificing the bandwidth Tesla currently relies on.
The other thing is the need for an interconnect, since that is something Nvidia has thus far relied on from the platform hosted by that inconvenient neighbor it has in the compute node.
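As a rough illustration of that point about the driver layer, here is a minimal, hypothetical CUDA sketch (my own, not an NVIDIA sample; the kernel and buffer names are made up): nothing in the host-side code below depends on the host CPU's instruction set, so in principle it recompiles for an ARM host as readily as for an x86 one, as long as a driver and toolchain exist for that platform.

Code:
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel, made up for illustration.
__global__ void saxpy(int n, float a, const float *x, float *y)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        y[i] = a * x[i] + y[i];
}

int main()
{
    const int n = 1 << 20;
    const size_t bytes = n * sizeof(float);

    // Host-side buffers: plain C++, nothing ISA-specific.
    float *hx = new float[n];
    float *hy = new float[n];
    for (int i = 0; i < n; ++i) { hx[i] = 1.0f; hy[i] = 2.0f; }

    // From here on, everything goes through the CUDA runtime/driver;
    // the CPU architecture underneath is invisible at this level.
    float *dx = 0, *dy = 0;
    cudaMalloc((void **)&dx, bytes);
    cudaMalloc((void **)&dy, bytes);
    cudaMemcpy(dx, hx, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(dy, hy, bytes, cudaMemcpyHostToDevice);

    saxpy<<<(n + 255) / 256, 256>>>(n, 2.0f, dx, dy);

    cudaMemcpy(hy, dy, bytes, cudaMemcpyDeviceToHost);
    printf("y[0] = %f\n", hy[0]);   // expect 4.0

    cudaFree(dx);
    cudaFree(dy);
    delete[] hx;
    delete[] hy;
    return 0;
}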
 
If Maxwell has a timeline that is not served by the possible timing of 20nm, an initial wave on 28nm could be a way to reduce the risk of further schedule slips.
Having ARM cores implemented could possibly remove one of the notable shortcomings Tesla has against Larrabee, which has CPU cores that are at least capable of running CPU-side code on the board.
Possibly, at some point the accelerator board in some eventual future product would just be the motherboard, and Nvidia could remove a competitor's silicon from being a required element of all its computing products.
CUDA software running on top of the driver layer shouldn't care what CPU is running the driver.

It's a possible direction Nvidia could be taking, at least.
If the host were moved on-die, the Tesla board would--with some work akin to a server version of its Tegra SoC design--look a bit like an ARM shared-nothing dense server board.
What doesn't seem to be worked out yet is a cache-coherent interconnect, and the RAM pool a GPU would have is too tiny without the much larger-capacity DIMM pool the host x86 chip currently provides.
A memory standard like HMC might solve the capacity problem without sacrificing the bandwidth Tesla currently relies on.
The other thing is the need for an interconnect, since that is something Nvidia has thus far relied on from the platform hosted by that inconvenient neighbor it has in the compute node.

With the instruction set ARM has right now, a complete rewrite would be needed... you would not be able to use it as the primary "self-hosting" processor without losing a lot of efficiency versus using an x86 processor for that work. So the benefit is really limited relative to what it costs in development and in resources used on the GPU. Of course Nvidia has the option to rewrite its software and libraries to use it for certain purposes, but is that really efficient outside of marketing?

The main problem with CUDA is efficiency: much of the time you win by computing on the GPU is then lost, because you need hardware to re-encode the data to be able to use the language and libraries, and then to decode and re-encode the results again to check them (CUDA is really not safe in terms of reliability, especially when you need a lot of precision in the results, so you basically need a lot of hardware and CPU time just to check them). And I'm afraid this will get worse with an ARM processor in that role: basically you can't reuse the base of software/instructions/code you normally use, it would have to be re-encoded, which pushes the CUDA API/library problem even further by forcing yet another manipulation of the data.

With this direction, AMD's HSA approach looks more... safe (maybe not the word I wanted, or not the right word; maybe I should say more thoughtful).
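To make the round trip being described concrete, here is a minimal, purely illustrative CUDA sketch (my own, not from any NVIDIA sample; the kernel is made up): the GPU does the work, and then the host CPU spends its own time re-checking every element, which is the kind of overhead being complained about.

Code:
#include <cmath>
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel: just scales a buffer on the GPU.
__global__ void scale(int n, float a, float *x)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        x[i] *= a;
}

int main()
{
    const int n = 1 << 20;
    const size_t bytes = n * sizeof(float);

    float *h = new float[n];
    for (int i = 0; i < n; ++i) h[i] = (float)i;

    float *d = 0;
    cudaMalloc((void **)&d, bytes);
    cudaMemcpy(d, h, bytes, cudaMemcpyHostToDevice);

    // GPU does the "fast" part...
    scale<<<(n + 255) / 256, 256>>>(n, 3.0f, d);
    cudaMemcpy(h, d, bytes, cudaMemcpyDeviceToHost);

    // ...and the host CPU then burns time re-computing and comparing every
    // result; this verification pass is where the time "won" on the GPU
    // gets spent again.
    int mismatches = 0;
    for (int i = 0; i < n; ++i)
        if (std::fabs(h[i] - 3.0f * (float)i) > 1e-3f)
            ++mismatches;
    printf("mismatches: %d\n", mismatches);

    cudaFree(d);
    delete[] h;
    return 0;
}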
 