Hmm, I'm well out of my depth here, but that actually sounds like an interesting idea. Instead of offset-stacking them, would it make sense to build two kinds of "chips", a compute chip and an interconnect chip, and then stagger/stack them like this?
...........===.........===
======..======..======
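To make the layout a bit more concrete, here's a rough, purely illustrative sketch. It treats the two rows as a 1-D layout and works out which compute chips each interconnect chip would bridge. It assumes the short top-row pieces are the interconnect chips, and it uses made-up, evenly spaced widths rather than the exact columns in the diagram. `runs` and `bridged_pairs` are hypothetical helpers, not any real tool's API.

```python
# Hypothetical 1-D model of the stagger/stack layout: compute dies sit side
# by side on the bottom row, and each small interconnect die on the top row
# overlaps the edges of two neighbours. All widths and positions are
# illustrative assumptions, not real dimensions.
from typing import List, Tuple

Span = Tuple[int, int]  # half-open [start, end) column range


def runs(row: str, ch: str) -> List[Span]:
    """Return the [start, end) column ranges of consecutive `ch` characters."""
    spans = []
    start = None
    for i, c in enumerate(row + " "):  # trailing sentinel closes the last run
        if c == ch and start is None:
            start = i
        elif c != ch and start is not None:
            spans.append((start, i))
            start = None
    return spans


def bridged_pairs(top: str, bottom: str) -> List[List[int]]:
    """For each interconnect die on `top`, list the indices of the compute
    dies on `bottom` whose column range it overlaps."""
    compute = runs(bottom, "=")
    links = []
    for b_start, b_end in runs(top, "="):
        links.append([i for i, (c_start, c_end) in enumerate(compute)
                      if b_start < c_end and c_start < b_end])
    return links


# Idealised version of the diagram: 6-wide compute dies with 2-column gaps,
# and 4-wide interconnect dies straddling each gap.
top    = "     ====    ====     "
bottom = "======  ======  ======"
print(bridged_pairs(top, bottom))  # [[0, 1], [1, 2]]
```

Each interconnect die touches exactly two neighbouring compute dies, so a row of N compute chips needs N-1 bridges to link them into a chain.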
This isn't such a crazy idea (minus the staggering):
Of course, it's not the sort of thing that's likely to show up in volume production any time soon...