The cost in pads or pins per chip is one problem.
The other is the added engineering and material cost of handling skew and EMI across that many active signal lines at high clock speeds.
Routing 256 data lines so that temperature variation, PCB and metal quality, electromagnetic interference from other sources and from neighbouring data lines, and any number of other complex effects don't push the memory outside its timing margins is a non-trivial task.
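To put rough numbers on the skew part of the problem, here is a minimal Python sketch that computes the skew introduced by trace length mismatch alone across a wide bus. The propagation delay figure (`PS_PER_MM`, about 6.5 ps/mm) is an assumed ballpark for FR-4 rather than a measured value, and the function names are hypothetical; a real design also has to budget for temperature, crosstalk, and the other effects listed above.

```python
# Assumed ballpark: ~6.5 ps of propagation delay per mm of FR-4 trace.
# Real values depend on the stackup, dielectric, and microstrip vs stripline.
PS_PER_MM = 6.5


def worst_skew_ps(trace_lengths_mm, ps_per_mm=PS_PER_MM):
    """Skew between the fastest and slowest line, from length mismatch only."""
    delays = [length * ps_per_mm for length in trace_lengths_mm]
    return max(delays) - min(delays)


def meets_skew_budget(trace_lengths_mm, budget_ps, ps_per_mm=PS_PER_MM):
    """True if the length-induced skew fits within the given budget (ps)."""
    return worst_skew_ps(trace_lengths_mm, ps_per_mm) <= budget_ps


# Hypothetical bus: 256 lines whose lengths range from 50 mm to 54 mm.
lengths = [50.0 + (i % 5) for i in range(256)]
print(worst_skew_ps(lengths))  # 4 mm of mismatch * 6.5 ps/mm = 26.0
```

Every millimetre of mismatch costs several picoseconds, and at multi-gigatransfer rates a single bit time is only a few hundred picoseconds or less, so each of the 256 lines has to be length-matched, which is part of what drives up layer count and routing effort.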
Staying within those margins would require more PCB layers, more complex routing, and higher-quality, more expensive materials. All of these add cost, and the added manufacturing complexity can also hurt board yields.
Raising manufacturing cost and engineering effort for a product with minuscule margins means the board maker absorbs costs on behalf of the CPU and chipset manufacturers, who will also charge the board maker more for the larger, higher-pin-count chips.