My guess is there's no cache on the IO die. The IF (Infinity Fabric) link can't handle that kind of throughput, because it's sized for the RAM/PCIe interface width and for low power. The more efficient approach would be to double the L3 on the chiplet, where you have the bandwidth and low latency. That also takes advantage of 7nm SRAM scaling and cuts down the traffic going through the IO die.
Area is another issue. If you look at the Zen1 die shot, the CCXs take up ~1/2 of the area, so it's not surprising that the new IO die is also ~1/2 the size of a Zen1 die. There's no room in ~120 mm² for the uncore, an additional IF interface for the second chiplet, and a cache large enough to make a difference.
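A quick back-of-envelope version of the area argument, as a rough sketch only. It assumes a Zen1 ("Zeppelin") die of about 213 mm² and treats the Zen1 uncore and the IO die as roughly comparable in area because both are on similar 14/12nm-class GlobalFoundries processes; neither assumption comes from the post. The ~1/2 CCX fraction and the ~120 mm² IO die size are the figures quoted above.

```python
# Back-of-envelope area budget for the IO die argument.
# ASSUMPTIONS (not from the original post): Zen1 die ~213 mm^2, and areas on
# the 14nm Zen1 die and the 12nm IO die are treated as directly comparable.
ZEN1_DIE_MM2 = 213.0   # assumed Zen1 "Zeppelin" die area
CCX_FRACTION = 0.5     # "the CCXs take up ~1/2 of the area" (from the post)
IO_DIE_MM2 = 120.0     # approximate IO die size quoted in the post


def uncore_area(die_mm2: float, ccx_fraction: float) -> float:
    """Area left for non-core logic once the CCXs are taken out."""
    return die_mm2 * (1.0 - ccx_fraction)


zen1_uncore = uncore_area(ZEN1_DIE_MM2, CCX_FRACTION)
# Whatever is left must cover the extra IF interface *and* any added cache.
headroom = IO_DIE_MM2 - zen1_uncore

print(f"Zen1 uncore estimate: {zen1_uncore:.1f} mm^2")
print(f"Headroom on a {IO_DIE_MM2:.0f} mm^2 IO die: {headroom:.1f} mm^2")
```

Under these assumptions only about 13.5 mm² is left over, and that has to absorb the second chiplet's IF interface before any cache gets a look in.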