From what I've understood lately, this sums up the whole DRAM problem. The major part of the die, the DRAM arrays, is pretty much the same for GDDR5, DDR3, DDR4, WideIO, etc. It's the IO, buffers, interface, and signaling that differ, and that is also the part that is power hungry and inefficient. That part needs to be a standard so that millions of identical chips get produced, competition kicks in, and the cost goes way down. If you change or remove the IO part of GDDR5, it's basically an expensive custom chip.
HBM is similar in performance to what you describe, but it's a better choice because it's a standard. A single layer will be 1024 bits wide, at roughly 256GB/s per chip (1066 DDR), or lower power at roughly 128GB/s per chip (either 1066 SDR or 533 DDR). There's no reason to make a custom memory chip; the best solution is right there in a JEDEC standard.
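For anyone who wants to check the bandwidth figures, here is a rough sketch of the arithmetic: peak bandwidth is bus width times clock times transfers per clock. The function name and parameters are my own, and it assumes "1066" and "533" mean the interface clock in MHz and that 1 GB = 10^9 bytes. With those assumptions the nominal clocks come out a bit above the rounded 256/128 figures, which correspond to about 1000 MHz DDR and 500 MHz DDR.

```python
def peak_bandwidth_gb_s(bus_width_bits, clock_mhz, transfers_per_clock):
    """Theoretical peak bandwidth in GB/s (1 GB = 10**9 bytes).

    transfers_per_clock is 1 for SDR and 2 for DDR.
    This ignores refresh, bank conflicts and protocol overhead.
    """
    bytes_per_transfer = bus_width_bits / 8
    mega_transfers_per_s = clock_mhz * transfers_per_clock
    return bytes_per_transfer * mega_transfers_per_s / 1000


if __name__ == "__main__":
    # 1024-bit interface at the clocks mentioned above (assumed to be MHz)
    print(f"1066 DDR: {peak_bandwidth_gb_s(1024, 1066, 2):.1f} GB/s")  # 272.9
    print(f"1066 SDR: {peak_bandwidth_gb_s(1024, 1066, 1):.1f} GB/s")  # 136.4
    print(f" 533 DDR: {peak_bandwidth_gb_s(1024, 533, 2):.1f} GB/s")   # 136.4
    # Clocks that give the round figures exactly
    print(f"1000 DDR: {peak_bandwidth_gb_s(1024, 1000, 2):.1f} GB/s")  # 256.0
    print(f" 500 DDR: {peak_bandwidth_gb_s(1024, 500, 2):.1f} GB/s")   # 128.0
```

Note that 1066 SDR and 533 DDR give the same number because both move 1066 million transfers per second; the lower-power option just halves the transfer rate on the same 1024-bit bus.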