Nvidia Turing Speculation thread [2018]

TSMC accelerating 7nm process production plan to meet demand
June 8, 2018
HiSilicon, MediaTek, Xilinx and Nvidia have all disclosed their adoption of TSMC's 7nm process in their next-generation chip products. Meanwhile, IC design service providers Global Unichip and Alchip have started providing solutions to help their clients speed up 7nm chip development.

TSMC moved its 7nm process dubbed N7 to volume production in the second quarter of 2018. The foundry expects sales generated from the node to account for over 20% of its revenue in the fourth quarter of 2018 and 10% for the full year. In addition to mobile devices, the target applications of the N7 include server CPUs, network processors, gaming, GPUs, FPGAs, cryptocurrency, automotive and AI.

An improved version of TSMC's N7 technology, dubbed N7 Plus, will adopt EUV lithography technology and become available in 2019, according to the foundry.
https://www.digitimes.com/news/a20180608PD211.html
 
They started volume production early Q2 already, how is this "mass production" different from that?
It takes about 3 months to produce a single wafer, so mass production follows volume production and is the point where the process is complete and finished silicon wafers are available to customers.
Considering how long it takes to go from a blank silicon wafer to a fully packaged and tested processor, and considering the sheer volumes of A12 chips that Apple will need to support its launch, it makes perfect sense that TSMC is starting to produce the chips now, in April.

If we assume production began on April 1, then the first chips should be rolling off TSMC's manufacturing lines roughly three months later, in early July. If Apple's contract manufacturers get the first batches of A12 chips in early July, then it's reasonable to expect fully finished iPhones to start coming out in late July or early August. With enough lead time, Apple should be able to produce millions of these phones to support a launch in September.
https://www.fool.com/investing/2018/04/25/apple-incs-a12-chip-is-likely-in-mass-production-r.aspx
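Just to make that arithmetic explicit, here is a minimal sketch of the timeline, assuming the April 1 wafer start from the article; the cycle-time, assembly and ramp durations are rough placeholders of mine, not reported figures:

```python
from datetime import date, timedelta

# Assumed April 1 wafer start (from the article above); durations are rough placeholders.
wafer_start   = date(2018, 4, 1)
cycle_time    = timedelta(weeks=13)              # ~3 months from blank wafer to packaged, tested chip
assembly_lead = timedelta(weeks=4)               # placeholder: chip delivery to finished phones
ramp_to_stock = timedelta(weeks=6)               # placeholder: building enough launch volume

chips_ready   = wafer_start + cycle_time         # early July
phones_ready  = chips_ready + assembly_lead      # late July / early August
launch_window = phones_ready + ramp_to_stock     # comfortably ahead of a September event

print(chips_ready, phones_ready, launch_window)  # 2018-07-01 2018-07-29 2018-09-09
```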
 
September release likely:
In the meantime, we have unofficially learned from some board partners that Nvidia has already started training the relevant employees in the development departments. And since Nvidia does not give training courses for sales staff any priority over these, the timeline based on the tables listed below is likely to be even tighter. If you follow the 3-month rule, the first board partner cards should appear on the market in late August or early September. However, some of the partners are now expecting a shift of at least two weeks, so September seems rather plausible.

More in link.

https://translate.google.com/translate?sl=de&tl=en&js=y&prev=_t&hl=de&ie=UTF-8&u=https://www.tomshw.de/2018/06/12/nvidia-geforce-gtx-2080-1180-oder-was-wir-ueber-turing-nicht-wissen-update/&edit-text=&act=url
 
Lines up with the rumors of a Gamescom announcement. I just hope it delivers.

2+ years since Pascal has certainly raised my expectations.
 
I still don't get it. Either they're on 12nm launching in September or so, and the [don't expect it for a long time] thing was a really weird dodge, because I don't know what they'd be dodging.

Or they're on 7nm and launching next year for the mass consumer market. Or, well, OK. If AMD can put out a pro only 7nm chip from TSMC this year then so can Nvidia. Charging mid five figures or more offsets the low volume and high chip production costs you'd face making large dies on a new process. Consumer chips of the architecture could come next year. That wouldn't explain the comment, but hey, even modern tech CEOs, coached as they are, can make minor flubs.
 
If AMD can put out a pro only 7nm chip from TSMC this year then so can Nvidia.

I don't believe Nvidia will release 7nm chips this year, but as far as that sentence goes, based on recent history, I'd say that "if AMD can put out a pro only 7nm chip from TSMC this year then Nvidia can do better with almost any chip."

What die size would a potential 7nm "Turing" chip need to be to slightly beat a 1080 Ti? TSMC claims:

Compared to its 10nm FinFET process, TSMC's 7nm FinFET features 1.6X logic density, ~20% speed improvement, and ~40% power reduction.

and

With a more aggressive geometric shrinkage, this process offers 2X logic density than its 16nm predecessor, along with ~15% faster speed and ~35% less power consumption.
EDIT: ^^This is about the 10nm process

I mean, I don't know how accurate those claims really are, but the gap with 16FF+ is immense. Even being conservative, it looks like a 180 mm^2 die would suffice. Taking those claims at face value, it would be less than 140 mm^2!
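As a back-of-the-envelope sketch of where those figures land (my own arithmetic on the quoted claims, not anything official): compounding the 2x (16nm to 10nm) and 1.6x (10nm to 7nm) density claims gives roughly 3.2x versus 16FF+, and GP102 is a 471 mm^2 die on that node.

```python
# Back-of-the-envelope die-size estimate from the TSMC density claims quoted above.
gp102_area_16ff  = 471    # mm^2, full GP102 on 16FF+ (the 1080 Ti uses a cut-down config)
density_16_to_10 = 2.0    # TSMC claim: 10nm vs 16nm logic density
density_10_to_7  = 1.6    # TSMC claim: 7nm vs 10nm logic density

ideal   = density_16_to_10 * density_10_to_7   # ~3.2x, logic only
derated = 2.6                                  # assumed: I/O, PHYs and analog shrink far less than logic

print(gp102_area_16ff / ideal)    # ~147 mm^2 if the whole die scaled like logic
print(gp102_area_16ff / derated)  # ~181 mm^2 with the conservative factor

# One way to get under 140 mm^2 at face value: scale only the 1080 Ti's enabled slice
# of GP102 (3584 of 3840 ALUs): 471 * 3584/3840 / 3.2 ~= 137 mm^2.
```

The exact figure depends heavily on how much of the die is logic versus memory controllers and analog, which is why the conservative number is the safer one.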

And as I said above, I'd say that if AMD can put out a 300+ mm^2 chip in enough quantities for any market, Nvidia could certainly handle a 170-200 mm^2 chip in enough quantities for a GTX 1180 (paper) release, if that's what they really wanted to do.
 
I don't believe Nvidia will release 7nm chips this year, but as far as that sentence goes, based on recent history, I'd say that "if AMD can put out a pro only 7nm chip from TSMC this year then Nvidia can do better with almost any chip."

What die size would a potential 7nm "Turing" chip need to be to slightly beat a 1080 Ti? [...] Even being conservative, it looks like a 180 mm^2 die would suffice. Taking those claims at face value, it would be less than 140 mm^2!
EETimes ran an article back in May on TSMC's 7nm multi-patterning and 7nm+ EUV processes compared to its 16FF+ (rather than the mature "12nm" 16FF process).
https://www.eetimes.com/document.asp?doc_id=1333244
EETimes on TSMC 7nm said:
The node delivers 35% more speed or uses 65% less power and sports a 3x gain in routed gate density compared to the 16FF+ generation two steps before it. By contrast, the N7+ node with EUV will only deliver 20% more density, 10% less power, and apparently no speed gains — and those advances require use of new standard cells.
The non-EUV (DUV multi-patterning) 7nm process carries a higher cost than TSMC's 7nm+ with EUV, which would influence how it gets used.

For contrast, GlobalFoundries quotes 40% more speed or 60% less power for 7LP, with density a bit lower than TSMC's; the GF product brief goes back to late 2017 but is still dated 2018, so it should be reasonably current.
https://www.globalfoundries.com/sites/default/files/product-briefs/pb-7lp.pdf
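To line the EETimes figures up with the earlier estimate, a quick sketch of the implied shrink (the density factors are the quoted marketing numbers; the multiplication for N7+ is my own):

```python
# Density claims vs TSMC 16FF+, per the EETimes piece quoted above (marketing figures).
n7_density  = 3.0               # routed gate density gain of N7 over 16FF+
n7p_density = n7_density * 1.2  # N7+ adds ~20% density over N7 -> ~3.6x vs 16FF+

gp102_area = 471                # mm^2 on 16FF+
print(gp102_area / n7_density)   # ~157 mm^2 for a GP102-sized design on N7
print(gp102_area / n7p_density)  # ~131 mm^2 on N7+ (with the new standard cells)
```

So the routed-density figure lands a bit above the face-value estimate earlier in the thread, but in the same ballpark.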
 
Re: the leaks about a "new boost clock algorithm" for Turing:

It seems to me as though Nvidia's boost clock algorithm is already quite mature, and as such there is minimal clock headroom left for overclocking beyond what the boost algorithm can already achieve. If Nvidia wants to "pull an AMD" and push Turing toward the outer edge of the thermal/power envelope they will only gain perhaps 5% over what is already achievable. That leaves process improvements to make up the rest of the expected performance gains over Pascal, a notion I find dubious given the nature of the "12nm" (i.e. 16nm+) process involved.

All this makes me wonder, perhaps even hope (naively), that the leaks as they have been interpreted thus far don't quite tell the whole story. If Turing is to gain any sizable amount of performance over Pascal on average (in gaming workloads), given the same number of ALUs and only minor architectural changes, there will need to be some major clock speed improvements to achieve this. Given what I've already mentioned about the likelihood of this prospect, to me this only leaves one possible solution:

The return of clock domains.

Discuss.
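To put rough numbers on that argument, here is a minimal sketch under the assumptions in this post: the same 3584 ALUs as a 1080 Ti, real-world Pascal boost around 1.9 GHz, and maybe 5% more from a pushed power envelope; the 30% target is just an illustrative figure, not a leak.

```python
# With the ALU count fixed, FP32 throughput scales (to first order) with clock alone:
#   TFLOPS = ALUs * 2 ops/clock (FMA) * clock_GHz / 1000
alus          = 3584   # rumored "GT104", same as an enabled 1080 Ti
pascal_boost  = 1.9    # GHz, typical real-world GP102/GP104 boost clock
envelope_gain = 1.05   # assumed ~5% from pushing the thermal/power envelope
target_gain   = 1.30   # illustrative: a 30% average gain over Pascal

print(alus * 2 * pascal_boost / 1000)  # ~13.6 TFLOPS: where Pascal already sits
print(pascal_boost * envelope_gain)    # ~2.0 GHz: what the envelope alone buys
print(pascal_boost * target_gain)      # ~2.5 GHz: what a 30% gain needs from clocks alone
```

Getting ~2.5 GHz out of what is essentially a tuned 16nm process is a stretch, which is the gap that clock domains (or some other trick) would have to fill.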
 
All this makes me wonder, perhaps even hope (naively), that the leaks as they have been interpreted thus far don't quite tell the whole story. If Turing is to gain any sizable amount of performance over Pascal on average (in gaming workloads), given the same number of ALUs and only minor architectural changes, there will need to be some major clock speed improvements to achieve this. Given what I've already mentioned about the likelihood of this prospect, to me this only leaves one possible solution:

The return of clock domains.

Discuss.

Where did you see that they will maintain the same number of ALUs? I doubt we will see the return of double-pumped ALUs, given Nvidia's focus on efficiency since Kepler. Fermi was the end of the road for Nvidia's Netburst.
 
Every rumor/leak that we have seen up until this point has indicated "GT104" will contain 3584 ALUs - i.e. the same number of enabled ALUs found in *most* GP102 SKUs. It is upon this assumption that I base my premise. If this turns out to be incorrect, then my premise will of course be untrue.

This is not to say that there won't be more ALUs in larger SKUs e.g. "GT102/100", but that could be a year or more away.
 
Every rumor/leak that we have seen up until this point has indicated "GT104" will contain 3584 ALUs - i.e. the same number of enabled ALUs found in *most* GP102 SKUs.

This is not to say that there won't be more ALUs in larger SKUs e.g. "GT102/100", but that could be a year or more away.

Oh, I see what you mean. Well, GP102 clocks around 1.5 GHz, some 200 to 300 MHz lower than GP104. With the 12nm shrink they can probably clock a bit higher (2 GHz?), and together with some architectural changes, plus GDDR6, that might be enough for a 30% increase over GP102? I really do not think we will see double-pumped ALUs.
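A quick check of that guess using spec-sheet boost clocks (the 2 GHz clock is this post's hypothetical, not a leak):

```python
# Rough FP32 throughput at quoted boost clocks (FMA counted as 2 ops/clock).
def tflops(alus, clock_ghz):
    return alus * 2 * clock_ghz / 1000

gp102_1080ti = tflops(3584, 1.58)  # 1080 Ti at its ~1.58 GHz spec boost -> ~11.3 TFLOPS
gt104_guess  = tflops(3584, 2.00)  # hypothetical "GT104" at 2 GHz       -> ~14.3 TFLOPS

print(gt104_guess / gp102_1080ti - 1)  # ~0.27, i.e. ~27% from clocks alone
```

So on paper the clock bump covers most of a 30% target, though as noted below, spec clocks understate what Pascal cards actually sustain.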
 
Oh, I see what you mean. Well, GP102 clocks around 1.5 GHz, some 200 to 300 MHz lower than GP104. With the 12nm shrink they can probably clock a bit higher (2 GHz?), and together with some architectural changes, plus GDDR6, that might be enough for a 30% increase over GP102? I really do not think we will see double-pumped ALUs.

Specified "OEM" boost clocks do not match reality, in my experience. I've owned both GP104 and GP102-based graphics cards and they both hit the same boost clocks ~1900MHz without overclocking. Now, that is with power maxed out and under waterblocks but there was no manual overclocking involved. When overclocking is involved, the cap seems to be around 2.1GHz for Pascal GPUs regardless of SKU.
 
Every rumor/leak that we have seen up until this point has indicated "GT104" will contain 3584 ALUs - i.e. the same number of enabled ALUs found in *most* GP102 SKUs. It is upon this assumption that I base my premise. If this turns out to be incorrect, then my premise will of course be untrue.

This is not to say that there won't be more ALUs in larger SKUs e.g. "GT102/100", but that could be a year or more away.

This is to be expected.

  • Recently, the 100/102-type parts have been roughly 1.5x of the 104-type parts (ALUs, memory bus, etc).
  • The 100-type parts have had as many ALUs as the 102-type parts (at least for Pascal).
And then we can apply that knowledge:

  • GV100 has 40% more ALUs than GP100/GP102.
  • If the 1.5x ratio holds, then we'd expect "GT104" to have 40% more ALUs than GP104.
Sure enough, a 40% bump to GP104's 2560 ALUs is exactly 3584 ALUs.

So I think it's very reasonable to expect 3584 ALUs in "GT104".
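The same reasoning as a minimal sketch, using the known full-die ALU counts (everything about "GT104" here is extrapolation, not a leak):

```python
# Extrapolating the rumored "GT104" ALU count from known Pascal/Volta parts.
gp100_alus = 3840   # GP100 / GP102 full die
gp104_alus = 2560   # GP104 full die
gv100_alus = 5376   # GV100 full die

volta_gain       = gv100_alus / gp100_alus   # 1.4: Volta's big die grew 40% over Pascal's
gt104_from_ratio = gv100_alus / 1.5          # 3584: if the 104 part stays ~2/3 of the big die
gt104_from_gain  = gp104_alus * volta_gain   # 3584: equivalently, GP104 plus the same 40% bump

print(volta_gain, gt104_from_ratio, gt104_from_gain)  # 1.4 3584.0 3584.0
```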
 
This is to be expected.

  • Recently, the 100/102-type parts have been roughly 1.5x of the 104-type parts (ALUs, memory bus, etc).
  • The 100-type parts have had as many ALUs as the 102-type parts (at least for Pascal).
And then we can apply that knowledge:

  • GV100 has 40% more ALUs than GP100/GP102.
  • If the 1.5x ratio holds, then we'd expect "GT104" to have 40% more ALUs than GP104.
Sure enough, a 40% bump to GP104's 2560 ALUs is exactly 3584 ALUs.

So I think it's very reasonable to expect 3584 ALUs in "GT104".

I would be shocked if anything other than this were to occur when GT104-based SKUs launch in the next 1-3 months.

Just for the sake of clarification, my theory pertains to the method by which GT104 may surpass GP102 in raw arithmetic performance, which would require something more exotic than minor process and boost algorithm optimizations would allow, in my opinion.
 