For example, if people wanted to see more architectural improvements with Blackwell, what do they mean specifically, and what do they hope that would have translated to in the end product?
We’ve typically distilled architectural efficiency into perf/W or perf/mm^2, but both of those metrics are heavily influenced by process tech. I like to think of it in terms of “can we do better with the same number of transistors on the same node?” It’s an academic thought at best, since nobody evolves architectures that way. Until now.
I’d like to think, though, that if Nvidia were forced to design a reticle-limited chip on 4N from scratch, they could do better than GB202. One problem with that idea is that the software side of things likely limits how much actual performance you can wring out of any hardware design. Better software and APIs alongside a complementary clean-sheet hardware design would almost certainly put GB202 to shame. I don’t think we’re anywhere near peak efficiency.