nVidia on-chip cache ?

McElvis

Regular
Anyone got any ideas (if it's true, of course...)?

Our chap took a large amount of graphic data and sent it directly to the graphics processor (GPU), just to see what might happen.

What you would usually expect is that the GPU would send data into memory while running its calculations. The chip ordinarily keeps some data in graphics memory, since the memory can hold intermediate states of calculations before returning them to the GPU to get a final result.

But the result of the experiment was quite surprising; at least it surprised us a lot. Our investigator didn't see any data transfer from chip to memory and there were no intermediate states. The result came straight from the chip.

There is only one logical conclusion we can draw from this experiment.

If you don't get any intermediate calculation states, then you can say that the specific GPU must have some kind of buffer - some memory built into the processor. So, Geforce GPUs have some amount of memory inside the chip that is used as some kind of buffer and maybe as a cache as well.

Many Nvidia chaps we have talked with have said that we have been misled and there is no memory buffer or cache on the processors themselves, but we have heard this sort of denial many times before.

Father of all Geforce cards, Nvidia chief scientist David Kirk, confirmed to us that Geforce 4 does have memory inside the chip for caching certain things, but he said that there are only a few KB of it (16 KB, we think he said). But I would say that 16 KB is not enough. I don't doubt that there is 16 KB for some calculations, but you need more than that for some serious calculations, like the 1 MB used in the experiment here.

http://www.theinquirer.net/17060213.htm
 
Of course there is an on-chip cache.

Surprisingly, 16KB-32KB is sufficient for caching texture data. More cache will probably be necessary as the number of textures bound at once increases; however, you do not need anywhere near as much texture cache as you need data cache on PCs to realize similar performance benefits.
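To give a feel for why a small texture cache can go a long way, here is a toy simulation. It is only a sketch under made-up assumptions (a 256x256 RGBA8 texture drawn 1:1 in scanline order, 4x4-texel cache lines, a fully associative LRU cache); none of this is how any real Nvidia chip is known to work. The point is just that bilinear filtering reads overlapping 2x2 footprints, so neighbouring pixels keep hitting the same few cache lines.

```python
from collections import OrderedDict

TEXEL_BYTES = 4   # assumption: RGBA8 texels
BLOCK = 4         # assumption: one cache line holds a 4x4 texel block (64 bytes)

def hit_rate(cache_bytes, tex_size=256):
    """Hit rate of a fully associative LRU cache for one bilinear pass."""
    lines = cache_bytes // (BLOCK * BLOCK * TEXEL_BYTES)
    cache = OrderedDict()   # keys are (block_x, block_y), ordered by recency
    hits = accesses = 0
    for y in range(tex_size):          # scanline order, one pixel per texel
        for x in range(tex_size):
            # Bilinear filtering reads a 2x2 texel footprint (clamped at the edge).
            for dy in (0, 1):
                for dx in (0, 1):
                    tx = min(x + dx, tex_size - 1)
                    ty = min(y + dy, tex_size - 1)
                    key = (tx // BLOCK, ty // BLOCK)
                    accesses += 1
                    if key in cache:
                        hits += 1
                        cache.move_to_end(key)          # mark as most recently used
                    else:
                        cache[key] = True
                        if len(cache) > lines:
                            cache.popitem(last=False)   # evict least recently used
    return hits / accesses

small = hit_rate(1 * 1024)    # hypothetical 1 KB cache
large = hit_rate(16 * 1024)   # the 16 KB figure mentioned in the thread
print(f"1 KB: {small:.1%}  16 KB: {large:.1%}")
```

In this setup even the tiny cache gets a high hit rate, and the 16 KB one does better still. Real access patterns (rotated or minified textures, several textures bound at once) would be less friendly, which fits the remark above that more cache becomes useful as the number of bound textures grows.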
 
IIRC, Nvidia claims that 78% of the transistors in Geforce4 are logic, presumably leaving the rest for cache/buffer SRAMs. With a total of 63 million transistors, that would be about 13.8 million transistors' worth of SRAM. Assuming 8.5 transistors per bit (a typical number for dual-port SRAM cells + some overhead), we end up at ~200 KBytes of on-chip SRAM. Now, exactly what this SRAM is used for is unknown (and Nvidia presumably won't tell us), so any guesses about its use must be speculative in nature. Much of it is probably texture cache, which would need to be replicated in order to feed that many pipelines/texture units; there would likely be vertex and framebuffer caches in the design as well, and probably a bunch of other stuff known only to Nvidia themselves.
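The back-of-the-envelope estimate above can be written out as a few lines of Python. The inputs are the figures quoted in the post (63 million transistors, 78% logic, 8.5 transistors per bit); the function name and the choice of 1 KByte = 1024 bytes are just assumptions for this sketch.

```python
def sram_kbytes(total_transistors, logic_fraction, transistors_per_bit):
    """Estimate on-chip SRAM in KBytes from a transistor budget."""
    sram_transistors = total_transistors * (1.0 - logic_fraction)  # non-logic share
    bits = sram_transistors / transistors_per_bit
    return bits / 8 / 1024   # bits -> bytes -> KBytes (1 KByte = 1024 bytes here)

# Figures quoted in the post; none of them are confirmed by Nvidia.
estimate = sram_kbytes(63e6, 0.78, 8.5)
print(f"roughly {estimate:.0f} KBytes of SRAM")
```

The result is very sensitive to the transistors-per-bit guess, so it is best read as an order-of-magnitude figure.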
 
I think a 1 MB cache would be slightly overkill. The Pentium Pro (if I remember correctly) had 1 MB of cache, and Intel were having real problems with the number of chips they lost due to errors during fabrication. The Inq. needs to change its method of testing methinks :)
 
BoardBonobo said:
I think a 1 MB cache would be slightly overkill. The Pentium Pro (if I remember correctly) had 1 MB of cache, and Intel were having real problems with the number of chips they lost due to errors during fabrication. The Inq. needs to change its method of testing methinks :)

Gamecube has a 1MB texture cache, and it works fine.
 
Oh, I didn't know that. Well, maybe a 1 MB cache isn't beyond economical production anymore, but I still think the GF doesn't have enough silicon real estate to include that. But then again, I could be wrong.
 
With a transistor budget of >60M, there would easily be room for huge caches in GF4-class designs if Nvidia wanted it - e.g. 30M transistors of 1T-SRAM would amount to about 3.5 MB of memory usable for, say, caching. I still don't think Nvidia has actually done that, since we would probably have heard of it otherwise.
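For comparison, here is the same kind of arithmetic for a 30M-transistor cache budget with different memory cell types. This is a rough sketch: it assumes exactly one transistor per bit for 1T-SRAM and ignores sense amps, decoders and the cell capacitor; the 6T row is the textbook single-port SRAM cell and is included only for contrast.

```python
def cache_mbytes(transistor_budget, transistors_per_bit):
    """Memory capacity in MB (2**20 bytes) for a given transistor budget."""
    return transistor_budget / transistors_per_bit / 8 / 2**20

BUDGET = 30e6  # the 30M-transistor example from the post

# Transistors per bit; all three values are simplifying assumptions.
CELLS = [
    ("1T-SRAM", 1.0),
    ("6T SRAM", 6.0),
    ("dual-port SRAM + overhead", 8.5),
]

for name, per_bit in CELLS:
    print(f"{name:26s} {cache_mbytes(BUDGET, per_bit):5.2f} MB")
```

The 1T-SRAM row comes out at about 3.5 MB, matching the figure in the post, while the conventional cells land well under 1 MB for the same budget.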
 
BoardBonobo said:
I think a 1 MB cache would be slightly overkill. The Pentium Pro (if I remember correctly) had 1 MB of cache, and Intel were having real problems with the number of chips they lost due to errors during fabrication. The Inq. needs to change its method of testing methinks :)

The Pentium Pro had its L2 on a different die but in the same package.
A rumor says that the two dies could only be tested together,
and if either was broken, both had to be thrown away.

L2 cache size ranged from 256 KB to 1 MB; most of the chips were 256 KB models.
 