Benetanegia
Regular
but I don't see how they can cut back on the tensor cores with this SM architecture in a way that saves die space.
Probably the same way they replaced the tensor cores with plain FP16 units in the GTX Turing chips vs the RTX ones: they just can. This time around they'll just swap them for simpler tensor cores.
But it looks like load/store throughput and L1 cache has doubled compared to Turing SM, so that should lead to some IPC gains.
IMO they'll most likely shrink it to 128KB, just as Turing got cut down to 96KB from Volta's 128KB. Still a nice improvement, tho not keeping those 192KB will likely make game/shader programmers cry, lol.
Another thing that is likely to go is the massive 40MB (48MB on the full die?) L2 cache. I can see 12MB happening.
Any other thoughts?