Anarchist4000
Veteran
Likely increased given all the new roles for L2. The L1/register size, as I recall from the Linux drivers, actually shrunk to 1k per wave from 16k. Need to double check that. That would make it likely they're paging the register file from somewhere. They could be deliberately spilling inactive registers to RAM or L2 to increase occupancy with a smaller, highly ported RF. Obviously speculating on that, but it would amplify apparent VGPRs and simplify scheduling. Makes sense for long running shaders at least.
With halved bus width but almost double clocks, access patterns should be favorable for some workloads as well. Wasn't L2 capacity increased as well?
AMD does have color compression and I'd expect diminishing returns on improving that feature, especially if AMD foresees the market moving towards compute, where it isn't used. Compression there is a programmer implementation.
Color compression is not a simple check mark that you either have or don't have. Pascal's color compression is better than Maxwell's, which is better than Kepler's.
Prefetch is consecutive data words. Most of that compression will be from working around masked lanes. Graphics bandwidth comes from this as SIMDs usually read consecutive data, with zero compression being a significant component of overall compression. From there it's a matter of smaller block sizes mapping better to sparse data. Prefetch of one and zero compression would be pointless as you'd just skip it. Then add in the additional channels on separate tasks to make up the bandwidth.
None of those have anything to do with BW reduction.
AFAIK, the prefetch size is 32 bytes for both HBM and GDDR5 (and 64 bytes for GDDR5X?) It's just that you trade clock rate for bus width.
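To put rough numbers on the prefetch and clock-versus-width points, here is a minimal Python sketch. The prefetch depths and channel widths in the table are my own recollection of the memory standards, not something stated in the thread, so treat them as assumptions, and the bus widths and per-pin rates in the usage note are illustrative rather than measured figures for any specific card. The helper names are made up for this example.

```python
# Rough arithmetic sketch; all figures below are assumptions for illustration.

def access_granularity_bytes(prefetch_n, channel_width_bits):
    """Bytes delivered by one burst: prefetch depth times channel width."""
    return prefetch_n * channel_width_bits // 8


def peak_bandwidth_gbs(bus_width_bits, gbps_per_pin):
    """Theoretical peak bandwidth in GB/s: total pins times per-pin data rate."""
    return bus_width_bits * gbps_per_pin / 8


# (prefetch depth, channel width in bits); assumed values, not from the thread.
MEMORY_TYPES = {
    "GDDR5": (8, 32),
    "GDDR5X": (16, 32),
    "HBM": (2, 128),
}

if __name__ == "__main__":
    for name, (prefetch_n, width_bits) in MEMORY_TYPES.items():
        print(name, access_granularity_bytes(prefetch_n, width_bits), "bytes per burst")

    # Halving the bus width while doubling the per-pin rate keeps the peak the same.
    print(peak_bandwidth_gbs(4096, 1.0), "GB/s on a wide, slow bus")
    print(peak_bandwidth_gbs(2048, 2.0), "GB/s on a half-width bus at double rate")
```

Under those assumptions the burst size works out to 32 bytes for GDDR5 and HBM and 64 bytes for GDDR5X, and a half-width bus only falls short of the wide one by however far the clocks miss a full doubling.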
Could be pipeline stalls, which wouldn't be surprising. Or he had Chill running.
The guy that got the FE mentioned throttling (core clock all over the place). Firestrike isn't that demanding in terms of power draw (the non-extreme/Ultra variant) so it is either hitting a low power limit or thermal instability. I'm not sure why the GPU would downclock itself otherwise.