Latency is important for GPUs, too. You need to add on-die cache to compensate for any increase in latency.
In the case of XDR, we're talking about 1, maybe 2, core clock cycles, which is only a few percent more than the DDR latency. (*)
This may sound counterintuitive, but it's almost always a bad idea to increase cache size to reduce incremental latency: once a cache is in place, growing it will obviously reduce the miss rate, but you need to grow it significantly for that to have any effect, and even then it does nothing for the fetch latency of data that isn't already in the cache.
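A rough way to see this is the standard average-memory-access-time (AMAT) model. The numbers below are made-up assumptions purely for illustration, not measurements of any real part:

```python
# Hypothetical latencies, for illustration only.
HIT_LATENCY = 4      # cycles for an on-die cache hit
MISS_LATENCY = 200   # cycles to fetch from external DRAM on a miss

def amat(miss_rate):
    """Average memory access time in cycles for a given miss rate."""
    return HIT_LATENCY + miss_rate * MISS_LATENCY

# Doubling a cache typically shaves only a modest fraction off the
# miss rate, so the average barely moves:
for miss_rate in (0.10, 0.08, 0.07):
    print(f"miss rate {miss_rate:.0%}: AMAT = {amat(miss_rate):.0f} cycles")
```

Note that a miss still pays the full 200 cycles regardless of cache size, which is the point above: the cache never reduces the fetch latency of data that isn't in it.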
During the system architecture phase, cache design and sizing are almost always kept separate from latency mitigation, unless your system has single-threaded components that block on reads.
In a multi-threaded system, the way to reduce the impact of latency is to increase the number of outstanding reads. In practice, this usually requires little more than deepening the read data FIFO by the number of additional latency cycles.
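That sizing rule is essentially Little's law (requests in flight = issue rate × latency). A minimal back-of-the-envelope sketch, where the function name and numbers are my own assumptions for illustration:

```python
import math

def extra_fifo_depth(extra_latency_cycles, reads_per_clock=1.0):
    """Extra read-data FIFO entries needed to absorb added latency.

    With `reads_per_clock` reads issued per core clock, each extra
    cycle of latency adds that many more reads in flight, so the
    FIFO must grow by the same amount (rounded up to whole entries)
    to keep the read pipe full without stalling.
    """
    return math.ceil(extra_latency_cycles * reads_per_clock)

# e.g. 2 extra core clocks of latency at one read per clock:
print(extra_fifo_depth(2))        # -> 2 extra entries
# A half-rate read stream with 8 extra cycles:
print(extra_fifo_depth(8, 0.5))   # -> 4 extra entries
```

Which matches the observation above: one or two extra cycles of XDR latency cost only one or two FIFO entries.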
Edit: (*) When I say a few percent, I don't mean as measured at the IO pins of the chip, but the total average latency, measured from the time the read is issued to the time the data arrives at the point of consumption.