> So is TSMC's N1.6 kind of just like N2+? The gains claimed seem far more like an evolved node (a la N4 vs N5) rather than a full new generation.

It does feel a bit like 16nm vs 20nm… In many ways, A16 feels like N2P plus backside power delivery and not much else, just like 16nm was mostly just 20nm with FinFETs. It's quite disappointing that N2P lost backside power delivery; that must be a very welcome surprise for Intel. I wonder how A16 compares to the *original* N2P with backside power delivery…
It's officially "A16" now and it gets back side power delivery which was planned for N2 previously so not really N2+.So is TSMC's N1.6 kind of just like N2+?
> The gains claimed seem far more like an evolved node (a la N4 vs N5) rather than a full new generation.

Welcome to the end of silicon scaling. SRAM doesn't really scale below N5, and now it's logic's turn.
> It's officially "A16" now, and it gets backside power delivery, which was previously planned for N2, so not really N2+.

It's still an N2 derivative.
> Welcome to the end of silicon scaling. SRAM doesn't really scale below N5, and now it's logic's turn.

Well no, TSMC just fucked up their execution for the first time in eons (well, there's N3B, but it's a workable node even if the denser SRAM cells are MIA), and/or had to shitcan the initial simple-as-bricks BSPDN in response to Intel 18A/16A, or whatever the cost derivative of 18A is called.
> It's officially "A16" now, and it gets backside power delivery, which was previously planned for N2, so not really N2+.

But N2 was supposed to get backside power delivery already, as you point out. Basically, it seems like N1.6 is not really a 'full effort' node, and more just what they originally wanted N2 to be in the first place, hence the smaller gains. Especially given that it's supposed to arrive just one year after N2.
> Welcome to the end of silicon scaling. SRAM doesn't really scale below N5, and now it's logic's turn.

I seriously doubt that the really small gains here are indicative of what all 'full node jumps' will look like going forward. Of course, the more iterative the jumps, the smaller the gap between each one necessarily becomes, which is exactly why I think that if we compared N2 to whatever the next real ground-up node turns out to be, the jump would be a fair bit bigger than from N2 to N1.6.
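To put rough numbers on the "more iterative jumps, smaller gaps" point, here is a quick illustrative calculation (the 1.6x full-node density gain below is an assumed historical figure, not a TSMC number): if one full-node gain is delivered in k equal steps, each step only shows the k-th root of it.

    # Illustrative only: splitting one assumed "full node" density gain
    # into k equal multiplicative steps.
    full_node_gain = 1.6  # assumed historical full-node density gain

    for k in (1, 2, 3):
        per_step = full_node_gain ** (1 / k)
        print(f"{k} step(s): {per_step:.2f}x density per step")

    # Output:
    # 1 step(s): 1.60x density per step
    # 2 step(s): 1.26x density per step
    # 3 step(s): 1.17x density per step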
All these "angstrom" processes will probably be lackluster in terms of physical scaling. Still they will bring power and costs improvements.
> TSMC can't really adopt High-NA EUV lithography scanner systems because some of their customers rely on bigger dies/higher reticle limits for their HW designs ... (especially Nvidia)

For now, yes. But this is also why many expected chiplet/tile designs to take over soon enough, out of necessity. From what I understand, High-NA is straight up mandatory for proper area scaling beyond a certain point. Nvidia is already playing with 'gluing' GPUs together, so it's not like they're going to get caught out here.
> TSMC can't really adopt High-NA EUV lithography scanner systems because some of their customers rely on bigger dies/higher reticle limits for their HW designs ... (especially Nvidia)

No, they're not betting on High-NA, but for other reasons.
> For now, yes. But this is also why many expected chiplet/tile designs to take over soon enough, out of necessity. From what I understand, High-NA is straight up mandatory for proper area scaling beyond a certain point. Nvidia is already playing with 'gluing' GPUs together, so it's not like they're going to get caught out here.

They're gluing together two >800mm^2 dies, and a lot of players don't want to bet on highly complex interconnects as the future, since it remains to be seen how well they'll scale in performance across multiple dies ...
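For context on why reticle limits keep coming up in this subthread: current EUV scanners expose a field of roughly 26 mm x 33 mm, and High-NA tools halve the field height, which is exactly why a >800mm^2 die stops fitting. A quick sanity check (the field sizes are the published scanner specs):

    # Reticle field areas; High-NA halves the exposure field height.
    low_na_field  = 26 * 33    # mm^2 -> 858, fits one ~800 mm^2 die
    high_na_field = 26 * 16.5  # mm^2 -> 429, a >800 mm^2 die no longer fits

    print(low_na_field, high_na_field)  # 858 429.0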
> I seriously doubt that the really small gains here are indicative of what all 'full node jumps' will look like going forward.

I don't. We already see this between N5 and N3 right now, and it will only get worse from here.
> I don't. We already see this between N5 and N3 right now, and it will only get worse from here.

Yep, I/O scaling hit a wall first, and now SRAM scaling has hit a wall at N3. Logic still scales, but that's it.
> I don't. We already see this between N5 and N3 right now, and it will only get worse from here.

Obviously gains are decreasing, but not to the extreme extent that N2 to N1.6 is showing. A 1.1x logic density increase? You really think that's literally the maximum jump anybody will be capable of going forward? Come on now.
> They're gluing together two >800mm^2 dies, and a lot of players don't want to bet on highly complex interconnects as the future, since it remains to be seen how well they'll scale in performance across multiple dies ...

Such interconnects are not going to be so exotic in the future, and they can even be largely offered by the foundry rather than being something the chip designers need to develop entirely themselves. The 'scaling' issues are not insurmountable either. And the companies too small or cash-strapped to invest in more complex packaging aren't the ones who need 500mm²+ chips in the first place.
Can you imagine how scandalous it would be in the graphics programming world if the only way to scale to higher graphics performance on PC were explicit multi-GPU programming, compared to the simplicity of programming against a console's monolithic design?
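For anyone unfamiliar with what "explicit multi-GPU programming" actually adds, here is a minimal sketch of split-frame work distribution; NumPy arrays stand in for GPU memory, since the point is the extra split/dispatch/stitch choreography rather than the shading math:

    # Minimal sketch of split-frame rendering across two "devices".
    # NumPy stands in for GPU memory; the explicit split/copy/merge
    # steps are what multi-GPU programming adds over a single big die.
    import numpy as np

    def shade(tile):
        # Stand-in for per-pixel shading work.
        return np.sqrt(tile) * 0.5

    frame = np.random.rand(1080, 1920)

    # Single "monolithic" device: one call, no partitioning logic.
    reference = shade(frame)

    # Two devices: the programmer must split the frame, dispatch each
    # half, then copy results back and stitch them together -- and in a
    # real renderer also handle work that straddles the seam.
    top, bottom = np.vsplit(frame, 2)
    half0 = shade(top)     # would run on GPU 0
    half1 = shade(bottom)  # would run on GPU 1
    stitched = np.vstack([half0, half1])

    assert np.allclose(reference, stitched)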
> Yep, I/O scaling hit a wall first, and now SRAM scaling has hit a wall at N3. Logic still scales, but that's it.

Praise be the almighty CFET, our last bastion of (future) SRAM scaling. Amen.
> but not to the extreme extent that N2 to N1.6 is showing. A 1.1x logic density increase?

Yeah, A16 is basically a nodelet.
> Such interconnects are not going to be so exotic in the future, and they can even be largely offered by the foundry rather than being something the chip designers need to develop entirely themselves. The 'scaling' issues are not insurmountable either. And the companies too small or cash-strapped to invest in more complex packaging aren't the ones who need 500mm²+ chips in the first place.

Either way you slice it (chiplets or no), having bigger monolithic designs available is still an advantage in terms of simplicity, performance, scaling, and programming, which is potentially why some manufacturers are keen to retain those advantages. You can go further by combining bigger dies with chiplets than by doing chiplets alone ...
I have no idea why you're talking about multi-GPU programming, either. That is not what is happening here.
> Either way you slice it (chiplets or no), having bigger monolithic designs available is still an advantage in terms of simplicity, performance, scaling, and programming, which is potentially why some manufacturers are keen to retain those advantages. You can go further by combining bigger dies with chiplets than by doing chiplets alone ...

No, it is not a 'programming' advantage. Again, I don't know where you're getting that.
> Large scaling of many GPUs for single workloads isn't even new. You seem to have a very outdated and very gamer-centric notion of how this works.

That's only been proven in practice for compute so far ...
> You're also gonna be wrong about the idea that monolithic has some significant advantages. Both AMD and Nvidia are doing tiled/chiplet designs for their most high-end products. And we're only getting started.

Large monolithic designs have very real advantages in real-time rendering, as seen time and time again with Nvidia consistently winning the performance crown with their big dies. You also don't have to worry about interconnect bottlenecks from a programming standpoint, hence the advantage in simplicity, and there are potential benefits elsewhere in other workloads too ...
Always provide some context with a link, please.
> Yeah, A16 is basically a nodelet.

For posterity's sake, I would like to commemorate this as the first of many times I personally will get confused between the TSMC A16 process and the Apple A16 SoC... (it doesn't help that I worked on one of them, obviously)
> That's only been proven in practice for compute so far ...

Apple's M2 Ultra would like to say hello... And if you think you can afford more than 2x ~400mm2 dies that would be reticle-limited on a hypothetical High-NA future process with significantly higher wafer prices than TSMC N4, I have bad news for you: you can't. Unless you think the RTX 4090 is cheap, that is.
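The affordability point can be sketched with the classic exponential defect-yield model; every number below is assumed for illustration (wafer price, defect density, edge loss), so only the shape of the curve matters:

    # Rough die-cost illustration using the classic Poisson yield model:
    # yield = exp(-defect_density * die_area). All numbers are assumed.
    import math

    wafer_area     = math.pi * (300 / 2) ** 2  # mm^2, 300 mm wafer
    defect_density = 0.001                     # defects per mm^2 (assumed)
    wafer_cost     = 20_000                    # USD per wafer (assumed)

    for die_area in (100, 400, 800):  # mm^2
        dies   = int(wafer_area * 0.85 / die_area)  # ~15% edge loss (assumed)
        yield_ = math.exp(-defect_density * die_area)
        cost   = wafer_cost / (dies * yield_)
        print(f"{die_area} mm^2: ~{dies} dies, {yield_:.0%} yield, ~${cost:.0f}/good die")

    # 100 mm^2: ~600 dies, 90% yield, ~$37/good die
    # 400 mm^2: ~150 dies, 67% yield, ~$199/good die
    # 800 mm^2: ~75 dies, 45% yield, ~$594/good die

Cost per good die grows much faster than die area, which is the economic case for keeping chiplets at roughly 400 mm^2 and under on expensive nodes.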
> Apple's M2 Ultra would like to say hello... And if you think you can afford more than 2x ~400mm2 dies that would be reticle-limited on a hypothetical High-NA future process with significantly higher wafer prices than TSMC N4, I have bad news for you: you can't. Unless you think the RTX 4090 is cheap, that is.

From what little information I've seen out there, the performance scaling in games is mediocre, and results can be worse with their Game Porting Toolkit. I've also seen a number of games that were sensitive to L2 cache performance, and given that D3D12 Work Graphs may well be the future of GPU-driven rendering, how exactly do you think performance will scale across multiple compute dies, given the API's propensity to improve data locality and cache reuse?
> That's only been proven in practice for compute so far ...

HPC Ampere and Hopper are not "monolithic" dies in that sense: they have a partitioned L2 cache connected by an interconnect. Nvidia can connect two 400mm^2 dies with their 5 TB/s per direction interconnect at full L2 cache speed.