On Pitcairn, there are actually 1280 small register files (4 kB each), each with a bandwidth of 16 bytes per clock (Hornet's number was too low). There are 20 LDS arrays of 64 kB (each consisting of 32 banks of 2 kB) with a bandwidth of 4 bytes per bank per clock (up to 128 bytes per clock per LDS). There are 20 vector L1-D caches, each delivering up to 64 bytes per clock. There are 6 scalar L1-D caches (which also serve as constant caches); each client (3 or 4 CUs are linked to one sL1-D) can fetch up to 16 bytes per clock (up to 64 bytes per clock per sL1-D). And finally there are 8 tiles of 64 kB L2 cache, each with a bandwidth of up to 64 bytes per clock.
As you can see, there is not a single isolated data structure in a Pitcairn GPU (nor in any other GCN GPU) that can be read faster than the 128 bytes per clock the 32 MB eSRAM of Durango is capable of.
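
To make the comparison easy to check, here is a small Python sketch that just tabulates the per-clock numbers quoted above. The table name, the function names and the aggregate sums are my own additions for illustration; the aggregates are simply count times per-instance peak and say nothing about what is sustainable in practice.

```python
# Per-clock figures as quoted in this post for Pitcairn.
# name: (number of instances, peak bytes per clock per instance)
PITCAIRN = {
    "register file": (1280, 16),
    "LDS": (20, 32 * 4),          # 32 banks x 4 bytes per bank
    "vector L1-D": (20, 64),
    "scalar L1-D": (6, 4 * 16),   # up to 4 clients x 16 bytes each
    "L2 tile": (8, 64),
}

# Durango eSRAM figure quoted in this post.
ESRAM_BYTES_PER_CLOCK = 128


def fastest_single_structure(structures):
    """Return (name, bytes per clock) of the fastest single instance."""
    name = max(structures, key=lambda n: structures[n][1])
    return name, structures[name][1]


def aggregate_bytes_per_clock(structures):
    """Theoretical sum over all instances of each structure type."""
    return {name: count * bw for name, (count, bw) in structures.items()}


if __name__ == "__main__":
    name, bw = fastest_single_structure(PITCAIRN)
    print(f"fastest single structure: {name} at {bw} bytes/clock")
    print(f"eSRAM: {ESRAM_BYTES_PER_CLOCK} bytes/clock")
    for kind, total in aggregate_bytes_per_clock(PITCAIRN).items():
        print(f"all {kind} instances together: {total} bytes/clock")
```

The fastest single structure is one LDS at 128 bytes per clock, which matches the eSRAM but does not exceed it. Only by adding up many independent instances do you get larger totals.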