This is expensive, low-throughput packaging that's not suitable for mainstream parts.
Agreed, I can't see 3D packaging being cost-effective and high enough volume for an AD104-level product in 2024, so not for Blackwell, but eventually it should come down in price.
It's up.
It's actually up.
N3E wins you on power/perf, but it costs more per xtor, which is stinky.
It's a real issue, and N2 is even worse in that respect (to the point of majorly influencing packaging decisions for Venice-dense).
No, you're still focusing on price per transistor, but I claimed something more subtle & important: "logic transistor times iso-power performance". The "logic" excludes I/O and SRAM transistors which are not scaling. And if you take NVIDIA SMs and increase their clock frequency by 15% at iso-power, you only need 100 SMs to achieve the same performance as 115 SMs (excluding latency tolerance).
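The SM trade-off above can be sketched in a few lines. This is a toy model using only the numbers from the post (115 SMs, +15% clock at iso-power), with the simplifying assumption that throughput scales linearly with SM count and clock:

```python
# Toy model: iso-power clock gains vs. adding more SMs.
# Assumption: performance scales linearly with (SM count * clock).
base_sms = 115
clock_gain = 1.15  # +15% clock at iso-power, figure from the post

# SMs needed at the higher clock to match 115 SMs at the old clock:
sms_needed = base_sms / clock_gain
print(round(sms_needed))  # → 100
```

So the same chip-level performance comes from ~13% fewer SMs (and hence less area), which is the sense in which "logic transistor times iso-power performance" matters more than raw price per transistor.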
N3E logic area is supposedly 0.625x of N5 and iso-power perf is +18%, so "(1/0.625)*1.18 ≈ 1.89" = +89% *logic-only* performance for the same area (old numbers from TSMC, I think it might actually be slightly better now for some FinFlex variants, I'm not entirely sure). Even if you believe N3E really is 20K/wafer and 25% more expensive than N5, that's still ~51% higher perf/$ (1.89/1.25 ≈ 1.51, assuming comparable yields). Unfortunately, that's only logic transistors, so without 2.5D/3D packaging to get rid of all the I/O and much of the SRAM, the overall perf/$ isn't good.
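Spelling out that arithmetic, using only the figures quoted above (the 1.25x wafer cost ratio is the assumption being entertained, not a confirmed number):

```python
# N3E vs N5, logic-only perf/area and perf/$ (figures quoted above).
area_scaling = 0.625   # N3E logic area relative to N5
iso_power_perf = 1.18  # +18% perf at iso-power
cost_ratio = 1.25      # assumed N3E wafer cost vs N5 (20K vs ~16K)

perf_per_area = (1 / area_scaling) * iso_power_perf
print(f"{perf_per_area:.2f}x")    # ≈ 1.89x, i.e. +89% logic-only perf per area

perf_per_dollar = perf_per_area / cost_ratio
print(f"{perf_per_dollar:.2f}x")  # ≈ 1.51x logic-only perf per dollar
```

Note that both numbers apply only to the logic portion of the die; un-scaling I/O and SRAM drag the whole-die figure well below this, which is the point about packaging.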
NV has no experience or any real pathfinding in doing very fancy 3D stuff; it's all Intel and AMD (all things MCM in general have been CPU land for decades).
This isn't really their job: it's TSMC's job. AMD and other fabless companies can and do contribute significantly, but fabless customers don't typically lead or own that kind of research AFAIK. So being a fast follower shouldn't be an issue, even without doing as much of that kind of work yourself. That's also partly why Intel seems so far ahead: they are marketing what their fabs/foundries are capable of far in advance of product availability. Finally, NVIDIA and other fabless companies are under no obligation to publish any of their internal testing or research (unless they are required/encouraged to do so under contracts with the DoE/DARPA/etc.)
Both B100 and N100 are very simple, straightforward products (a big reticle-sized die × n on CoWoS-L).
Are you saying multiple identical dies in a 2.5D package à la Apple or AMD MI250X (but working as a single GPU à la MI300X)?
If we're only talking about the AI flagship, and it's 2.5D rather than 3D, then sure, maybe. That sounds like ~1500W of power to me though (compared to Hopper, which is 700W on 4N for a single reticle-sized die), which isn't realistic. So it'd have to be significantly undervolted, with lower clocks than H100. I stumbled upon this article which claims 1000W for B100:
https://asia.nikkei.com/Spotlight/Supply-Chain/AI-boom-drives-demand-for-server-cooling-technology
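The back-of-envelope power reasoning can be sketched with the usual crude dynamic-power model, P ∝ V²·f. The 10% downclock below is purely illustrative (not a leaked spec), just to show the kind of undervolt a 1000W budget would imply:

```python
# Crude dynamic-power model: P ∝ V^2 * f. Illustrative numbers only.
hopper_power = 700.0  # W, single reticle-sized die on 4N (from the post)
dies = 2

naive_power = hopper_power * dies
print(naive_power)  # → 1400.0 W, roughly the ~1500W concern above

# To hit the article's 1000W, each die must drop to ~71% of H100 power.
target_per_die = 1000.0 / dies
power_scale = target_per_die / hopper_power  # ≈ 0.71
# Assuming (illustratively) a 10% downclock, the voltage scale needed:
freq_scale = 0.9
volt_scale = (power_scale / freq_scale) ** 0.5  # from P ∝ V^2 * f
print(f"{volt_scale:.2f}")  # ≈ 0.89, i.e. roughly an 11% undervolt
```

That's a big enough voltage/frequency cut that per-die performance would land visibly below H100's clocks, which is the point about it having to be "significantly undervolted".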
You'll have to forgive me if I don't share your confidence and don't really believe you, but I have been around this circus enough times in the last 20+ years of major GPU architecture releases (e.g. NV30/NV40/G80/Fermi/Volta/etc. and R420/R600/GCN/etc., which were much bigger steps than the ones in-between), and the rumours around those big jumps are nearly always completely wrong, even more so than usual. So I'm not going to take anything too seriously at this stage (and you definitely shouldn't take anything I say too seriously either!)