TSMC Execution [2024]

TSMC can't really adopt High-NA EUV lithography scanner systems because some of their customers rely on bigger dies/higher reticle limits for their HW designs ... (especially Nvidia)
 
So is TSMC's N1.6 kind of just like N2+? The gains claimed seem far more like an evolved node (à la N4 vs N5) rather than a full new generation.
It does feel a bit like 16nm vs 20nm… In many ways, A16 feels like N2P+backside and not much else, just like 16nm was mostly just FinFET. It’s quite disappointing N2P lost backside power delivery, that must be a very welcome surprise for Intel. I wonder how A16 compares to the *original* N2P with backside power delivery…

The density scaling is sufficiently bad that it seems to vindicate Intel’s High-NA strategy to some extent as necessary for density, although TSMC’s argument will be that doesn’t necessarily make the transistors/$ any better or significantly improve perf/power.

Still, I expect TSMC to be extremely competitive with Intel in practice for N2 (non-P) vs 18A. Beyond that, process nodes aren’t just about high-level features and buzzwords, and TSMC has consistently delivered where it matters.
 
So is TSMC's N1.6 kind of just like N2+?
It's officially "A16" now and it gets back side power delivery which was planned for N2 previously so not really N2+.


The gains claimed seem far more like an evolved node (à la N4 vs N5) rather than a full new generation.
Welcome to the end of silicon scaling. SRAM doesn't really scale below N5 and now it's logic's turn.
All these "angstrom" processes will probably be lackluster in terms of physical scaling. Still, they will bring power and cost improvements.
 
This post’s accuracy is not substantiated. Exercise discretion and wait for verified sources before accepting its claims.
It's officially "A16" now and it gets back side power delivery which was planned for N2 previously so not really N2+.
It's still an N2 derivative.
Welcome to the end of silicon scaling. SRAM doesn't really scale below N5 and now it's logic's turn.
Well no, TSM just fucked up their execution for the first time in eons (well, there's N3b but it's a workable node even if denser SRAM cells are MIA) and/or had to shitcan the initial simple as bricks BSPDN in response to i18a/i16a or w/ever the cost derivative of i18a is called.
 
It's officially "A16" now and it gets back side power delivery which was planned for N2 previously so not really N2+.
But N2 was supposed to get backside power delivery already, as you point out. Basically, it seems like N1.6 is not really a 'full effort' node, and more just what they originally wanted N2 to be in the first place, hence the smaller gains. Especially given that this is supposed to come just one year after N2's arrival.

Basically, analogous to something like N7+, which was EUV and perhaps not design compatible with N7, but still based around N7 and not a full node jump.

Welcome to the end of silicon scaling. SRAM doesn't really scale below N5 and now it's logic's turn.
All these "angstrom" processes will probably be lackluster in terms of physical scaling. Still, they will bring power and cost improvements.
I seriously doubt that the really small gains here are indicative of what all 'full node jumps' will look like going forward. Though of course, the more iterative jumps they do, the smaller the gaps between each one necessarily become, but that's kind of why I'm thinking if we compared N2 to whatever the next real ground-up node process will be, the jump will be a decent bit bigger than from N2 to N1.6.
 
TSMC can't really adopt High-NA EUV lithography scanner systems because some of their customers rely on bigger dies/higher reticle limits for their HW designs ... (especially Nvidia)
For now, yes. But this is also why many expected chiplet/tile designs to take over soon enough, out of necessity. I mean, from what I understand, High NA is straight up mandatory for proper area scaling beyond a certain point. Nvidia is already playing with 'gluing' GPUs together, so it's not like they're gonna get caught out here.
 
This post’s accuracy is not substantiated. Exercise discretion and wait for verified sources before accepting its claims.
TSMC can't really adopt High-NA EUV lithography scanner systems because some of their customers rely on bigger dies/higher reticle limits for their HW designs ... (especially Nvidia)
no, they're not betting on hNA for other reasons.
for now, anyway.
 
For now, yes. But this is also why many expected chiplet/tile designs to take over soon enough, out of necessity. I mean, from what I understand, High NA is straight up mandatory for proper area scaling beyond a certain point. Nvidia is already playing with 'gluing' GPUs together, so it's not like they're gonna get caught out here.
They're gluing together two >800mm^2 dies and a lot of players don't want to bet on implementing highly complex interconnects as the future since it remains to be seen how well they'll scale in performance for multiple dies ...

Can you imagine how scandalous it would be in the graphics programming world if the only way to scale to higher graphics performance on PC were explicit multi-GPU programming, compared to the simplicity of programming for a console's monolithic physical design?
 
I don't. We already see this between N5 and N3 right now, it will only get worse from here.
Obviously gains are decreasing, but not to such an extreme extent as to what N2 to N1.6 is showing. 1.1x logic density increase? You really think that's literally the max jump that anybody will be capable of going forward? Come on now.
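
To put 1.1x in perspective, here is a quick back-of-the-envelope sketch. The function name is made up for illustration, and it assumes every step delivers the same density multiplier, which real nodes don't:

```python
import math

def steps_to_reach(target_gain: float, gain_per_step: float) -> float:
    """Number of equal multiplicative steps needed to compound to target_gain."""
    return math.log(target_gain) / math.log(gain_per_step)

# How many 1.1x logic density steps would it take to double density?
steps = steps_to_reach(2.0, 1.1)
print(f"1.1x steps needed for 2x density: {steps:.2f}")
```

Seven such steps still fall short of 2x, so a 1.1x step is nowhere near what used to count as a full node jump.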

They're gluing together two >800mm^2 dies and a lot of players don't want to bet on implementing highly complex interconnects as the future since it remains to be seen how well they'll scale in performance for multiple dies ...

Can you imagine how scandalous it would be in the graphics programming world if the only way to scale to higher graphics performance on PC were explicit multi-GPU programming, compared to the simplicity of programming for a console's monolithic physical design?
Such interconnects are not going to be so exotic in the future and can even be largely offered by the foundry rather than anything the chip designers need to completely develop themselves. The 'scaling' and everything are not insurmountable issues whatsoever, either. And the companies too destitute or small to want to invest in more complex packaging solutions aren't the types who need 500mm²+ chips in the first place.

I have no idea why you're talking about multi-GPU programming, either. That is not what is happening here.
 
Yep, IO scaling hit a wall first and now SRAM scaling hit a wall at N3. Logic still scales but that's it.
praise be the almighty CFET, our last bastion of the (future) SRAM scaling. amen.
but not to such an extreme extent as to what N2 to N1.6 is showing. 1.1x logic density increase?
Yeah A16 is basically a nodelet.
Not the first time TSM did a bolt on feature for existing BEOL.
 
Such interconnects are not going to be so exotic in the future and can even be largely offered by the foundry rather than anything the chip designers need to completely develop themselves. The 'scaling' and everything are not insurmountable issues whatsoever, either. And the companies too destitute or small to want to invest in more complex packaging solutions aren't the types who need 500mm²+ chips in the first place.

I have no idea why you're talking about multi-GPU programming, either. That is not what is happening here.
Either way you slice it (chiplets or no), having bigger monolithic physical designs available is still an advantage in terms of simplicity, performance, scaling, and programming perspective which is potentially why some manufacturers are keen to retain these advantages. You can go farther if you combine bigger designs w/ chiplets than just doing chiplets alone ...
 
Either way you slice it (chiplets or no), having bigger monolithic physical designs available is still an advantage in terms of simplicity, performance, scaling, and programming perspective which is potentially why some manufacturers are keen to retain these advantages. You can go farther if you combine bigger designs w/ chiplets than just doing chiplets alone ...
No, it is not a 'programming' advantage. Again, I don't know where you're coming up with that.

Large scaling of many GPUs for single workloads isn't even new. You seem to have a very outdated and very gamer-centric notion of how this works.

You're also gonna be wrong about the idea that monolithic has some significant advantages. Both AMD and Nvidia are doing tiled/chiplet designs for their most high end products. And we're only getting started.
 
Large scaling of many GPUs for single workloads isn't even new. You seem to have a very outdated and very gamer-centric notion of how this works.
That's only been proven in practice for compute so far ...
You're also gonna be wrong about the idea that monolithic has some significant advantages. Both AMD and Nvidia are doing tiled/chiplet designs for their most high end products. And we're only getting started.
Large monolithic designs have very real advantages in real-time rendering, as seen time and time again with Nvidia consistently winning the performance crown with their big dies. You also don't have to worry about interconnect bottlenecks from a programming standpoint, hence the advantage in its simplicity, so there are potential benefits in other workloads too ...
 
Yeah A16 is basically a nodelet.
For posterity's sake, I would like to commemorate this as the first of many times I personally get confused between the TSMC A16 process and the Apple A16 SoC...🎉:( (doesn't help I worked on one of them obviously)

Anyway,
That's only been proven in practice for compute so far ...
Apple M2 Ultra would like to say hello... and if you think you can afford more than 2x ~400mm2 dies that would be reticle limited on a hypothetical High-NA future process with significantly higher wafer prices than TSMC N4, I have good news for you: you can't ;) Unless you think the RTX 4090 is cheap, that is.
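
For reference, the reticle arithmetic behind this, as a rough sketch. The field sizes are the commonly quoted 26 mm x 33 mm for current EUV and 26 mm x 16.5 mm for High-NA (the anamorphic optics halve the field in one direction). The 1600 mm² target is just the "two >800mm^2 dies" figure from earlier in the thread, and the names below are made up for illustration:

```python
import math

# Commonly quoted maximum exposure fields, in mm (width, height).
FULL_FIELD_MM = (26.0, 33.0)     # current 0.33 NA EUV
HIGH_NA_FIELD_MM = (26.0, 16.5)  # 0.55 NA EUV, half field

def field_area(field: tuple[float, float]) -> float:
    """Maximum single-exposure die area in mm^2."""
    width, height = field
    return width * height

def dies_needed(total_area_mm2: float, field: tuple[float, float]) -> int:
    """Minimum number of reticle-limited dies to reach a total silicon area."""
    return math.ceil(total_area_mm2 / field_area(field))

total = 1600.0  # assumption: two ~800 mm^2 dies, as discussed in the thread
print(field_area(FULL_FIELD_MM), dies_needed(total, FULL_FIELD_MM))
print(field_area(HIGH_NA_FIELD_MM), dies_needed(total, HIGH_NA_FIELD_MM))
```

So the same total silicon that takes two dies today would take at least four on a half-field process, before counting any area lost to the extra die-to-die interfaces.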
 
Apple M2 Ultra would like to say hello... and if you think you can afford more than 2x ~400mm2 dies that would be reticle limited on a hypothetical High-NA future process with significantly higher wafer prices than TSMC N4, I have good news for you: you can't ;) Unless you think the RTX 4090 is cheap, that is.
From what little information I've seen out there, the performance scaling in games is mediocre. Results can be worse with their game porting toolkit. I've also seen a number of games that were sensitive to L2 cache performance, and given that D3D12 Work Graphs may seemingly be the future of GPU-driven rendering, how exactly do you think performance will scale for multiple compute dies given the propensity for the API to potentially improve data locality and cache reuse?
 
That's only been proven in practice for compute so far ...

Large monolithic designs have very real advantages in real-time rendering, as seen time and time again with Nvidia consistently winning the performance crown with their big dies. You also don't have to worry about interconnect bottlenecks from a programming standpoint, hence the advantage in its simplicity, so there are potential benefits in other workloads too ...
HPC Ampere and Hopper are not "monolithic" dies. They have a separate L2 cache connected through an interconnect. Nvidia can connect two 400mm^2 dies with their 5TB/s/dir interconnect and full L2 cache speed.
 