arjan makes good points above. Longer bursts provide much higher bus utilization. Unless you only rarely use contiguous data, longer bursts are usually a good idea from a bus efficiency point of view.
For less technical readers, I'd like to point out that there is a problem that kicks in earlier than the point of optimum bus utilization: caches. Bursting in extra data is very cheap in terms of bus cycles in and of itself, but it is expensive in that you end up using a larger portion of your caches to store data nobody wanted. Furthermore, the data that gets displaced has to be written back to memory if it is dirty, which costs additional bus cycles. Thus, the break-even point for memory read burst length is much shorter in real life than what would be apparent from looking at protocol efficiency alone, because longer bursts also lower cache efficiency and increase the demands that cache maintenance places on bus cycles.
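To make that concrete, here is a toy model with entirely made-up numbers (a fixed per-burst overhead, an 8-byte-wide bus, 32 useful bytes per request, half of the evicted data dirty). It is only a sketch of the reasoning, not a model of any real memory controller:

/*
 * Toy model of read-burst efficiency vs. cache pollution.
 * All parameters are illustrative assumptions, not measurements.
 */
#include <stdio.h>

#define OVERHEAD        10.0   /* fixed cycles per burst: arbitration,
                                  address phase, access latency (assumed) */
#define BYTES_PER_CYCLE  8.0   /* bus width in bytes (assumed) */
#define USEFUL_BYTES    32.0   /* bytes the requester actually wanted */
#define DIRTY_FRACTION   0.5   /* share of displaced data that is dirty */

int main(void)
{
    for (double burst = 8.0; burst <= 512.0; burst *= 2.0) {
        double read_cycles = OVERHEAD + burst / BYTES_PER_CYCLE;

        /* Raw protocol efficiency: data cycles over total cycles.
           This rises monotonically with burst length. */
        double raw = (burst / BYTES_PER_CYCLE) / read_cycles;

        /* Unwanted extra data displaces cached lines; the dirty ones
           must be written back, costing further bus cycles. */
        double excess = burst > USEFUL_BYTES ? burst - USEFUL_BYTES : 0.0;
        double wb_cycles = DIRTY_FRACTION * (excess / BYTES_PER_CYCLE);

        /* Effective efficiency: only useful bytes count, and the
           write-back traffic counts against us. */
        double useful = burst < USEFUL_BYTES ? burst : USEFUL_BYTES;
        double eff = (useful / BYTES_PER_CYCLE) / (read_cycles + wb_cycles);

        printf("burst %4.0fB  raw %.2f  effective %.2f\n",
               burst, raw, eff);
    }
    return 0;
}

With these numbers, raw efficiency is still climbing at a 512-byte burst, but the effective figure peaks already at 32 bytes, which is the break-even shift I'm describing.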
The trade-off is clearly application and GPU architecture dependent, so it is obviously impossible for an interested bystander like me to say where the optimum lies. I would predict that it isn't terribly sharp, though; in other words, the finer points only make small differences.
Edit: Small addendum. You can increase cache efficiency by tagging data that you know shouldn't be cached, or by manually locking and releasing data in the cache. That would be workable for driver writers.
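As an aside, the "tag data that shouldn't be cached" idea exists on the CPU side today as non-temporal stores. A minimal sketch using the SSE2 intrinsics follows; fill_nontemporal is my own name for it, and whether a GPU driver has access to an equivalent mechanism depends entirely on the hardware:

/* Sketch of bypassing the cache for data you know won't be reused,
 * via the SSE2 non-temporal store intrinsic. Assumes dst is 16-byte
 * aligned and len is a multiple of 16 bytes. */
#include <emmintrin.h>   /* _mm_loadu_si128, _mm_stream_si128, _mm_sfence */
#include <stddef.h>

void fill_nontemporal(void *dst, const void *src, size_t len)
{
    __m128i *d = (__m128i *)dst;
    const __m128i *s = (const __m128i *)src;

    for (size_t i = 0; i < len / 16; i++) {
        __m128i v = _mm_loadu_si128(&s[i]);   /* ordinary load */
        _mm_stream_si128(&d[i], v);           /* store bypasses the cache,
                                                 so no lines are evicted */
    }
    _mm_sfence();   /* drain write-combining buffers to memory */
}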
Entropy