Pretty sure data granularity is the minimum fetch size needed to reach the full on-paper bandwidth. When you read small chunks at random, the effective access size is the I/O width times the prefetch length.
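For example, a 32-bit GDDR5 channel with its 8n prefetch moves 32 × 8 = 256 bits, i.e. 32 bytes per access; GDDR5X doubles the prefetch to 16n, hence 64 bytes.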
GDDR5: 32 bytes
GDDR5X: 64 bytes
GDDR6: 32 bytes
HBM: 32 bytes
That means in a worst-case benchmark where you read small blocks completely at random, GDDR5X would deliver roughly half its effective bandwidth, while the 32-byte parts (including HBM) take no extra hit. Obviously it doesn't play out like this in the real world, because of the caches.
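As a quick back-of-the-envelope sketch of that ratio (the 100 GB/s peak figure is a made-up placeholder, not any real part):

```python
# Back-of-the-envelope model of the worst-case granularity penalty.
# Peak bandwidth is a placeholder number, only the ratios matter here.

def effective_bandwidth(peak_gbs, granularity_bytes, request_bytes):
    """Useful bandwidth when every read is a cold, random request."""
    # Each request still burns a full granularity-sized burst, so only
    # request_bytes / granularity_bytes of the transferred data is useful.
    efficiency = min(1.0, request_bytes / granularity_bytes)
    return peak_gbs * efficiency

for name, gran in [("GDDR5", 32), ("GDDR5X", 64), ("GDDR6", 32), ("HBM", 32)]:
    # 32-byte random reads: GDDR5X ends up at half, the rest stay at peak.
    print(f"{name}: {effective_bandwidth(100.0, gran, 32):.0f} GB/s useful")
```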