If my calculations are right, a 32-CU chip would have:
— 4×64 kB (vec regs) + 64 kB (LDS) + 16 kB (R/W data L1) + 8 kB (scalar regs) = 344 kB per Compute Unit,
— 16 kB (shared L1) + 32 kB (shared iL1) = 48 kB per CU Array,
— Probably 512 kB of L2.
That's a total of 32×344 + 8×48 + 512 = 11,904 kB or 11.6 MB of internal memory, i.e. registers + cache. That's quite a lot, and if I recall correctly, Fermi has about 4 MB.
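For anyone who wants to check the arithmetic, here is a quick sketch of the same sum in Python. The sizes are just the estimates listed above, not confirmed specs. The count of 8 CU Arrays is taken from the total above, which works out to 4 CUs per array. The variable names are mine.

```python
# All sizes in kB; figures are the estimates from the post, not official specs.
vec_regs = 4 * 64      # four 64 kB vector register files per CU
lds = 64               # local data share
data_l1 = 16           # R/W data L1
scalar_regs = 8

per_cu = vec_regs + lds + data_l1 + scalar_regs
per_cu_array = 16 + 32  # shared L1 + shared instruction L1
l2 = 512                # guessed L2 size

num_cus = 32
num_cu_arrays = 8       # as used in the total above (4 CUs per array)

total_kb = num_cus * per_cu + num_cu_arrays * per_cu_array + l2
total_mb = total_kb / 1024

print(f"per CU: {per_cu} kB, per CU Array: {per_cu_array} kB")
print(f"total: {total_kb} kB = {total_mb:.1f} MB")
```

Changing `l2` or `num_cus` makes it easy to see how much the total depends on the guessed L2 size.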
Edit:
Actually, Fermi has:
— 128 kB (vec regs) + 64 kB (L1) = 192 kB per SM,
— 768 kB of L2.
That's a total of 16×192 + 768 = 3,840 kB or 3.75 MB of internal memory.
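The Fermi side can be checked the same way. This is only a sketch using the numbers quoted in this post. The 11,904 kB figure for the 32-CU chip is my own estimate from earlier, hard-coded here so the ratio can be computed.

```python
# All sizes in kB; figures are the ones quoted in the post.
fermi_per_sm = 128 + 64   # vector registers + L1 per SM
fermi_l2 = 768
fermi_sms = 16

fermi_total_kb = fermi_sms * fermi_per_sm + fermi_l2
fermi_total_mb = fermi_total_kb / 1024

# Estimated total for the 32-CU chip, from the earlier sum (an estimate).
gcn_total_kb = 11_904
ratio = gcn_total_kb / fermi_total_kb

print(f"Fermi: {fermi_total_kb} kB = {fermi_total_mb:.2f} MB")
print(f"32-CU estimate / Fermi = {ratio:.1f}x")
```

So if the estimates hold, the 32-CU chip would carry roughly three times as much register and cache storage as Fermi.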