You can do some reverse engineering of GPU designs using GPUBench. The applications have lots of knobs for "exploring" the characteristics of the hardware, at least as exposed through OpenGL. There is also a DX9 port, but it hasn't been promoted because we still have a few bugs in the tests. (For example, neither ATI nor NVIDIA seems to like our DX branching tests after a spec change in DX9.0c.)
Cache analysis is also easier now that the caches are not fully associative, as on the current ATI chips.
Also, when studying cache designs (and memory systems in general), remember to take a careful look at the latency behavior.
As was said, the memory system as a whole is set up for many references in flight at the same time. You can really see this when you draw very little in a scene with memory-intensive shaders. On the current crop of boards, we don't start getting good efficiency out of the memory or the ALUs until ~4K fragments are being executed.
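To see why efficiency ramps with fragment count, a toy latency-hiding model helps. This is just Little's law (sustained throughput = requests in flight / latency), not a description of any actual chip; the latency and peak-rate numbers below are made up for illustration.

```python
# Toy latency-hiding model. All constants are hypothetical, chosen only so
# that full efficiency is reached around ~4K fragments, as in the text.
MEM_LATENCY = 500        # hypothetical memory latency, in cycles
PEAK_REQS_PER_CYCLE = 8  # hypothetical peak request rate of the memory system

def memory_efficiency(fragments_in_flight, reqs_per_fragment=1):
    """Fraction of peak memory throughput achieved with this many fragments in flight."""
    in_flight = fragments_in_flight * reqs_per_fragment
    # Little's law: sustained rate is bounded by in_flight / latency, capped at peak.
    achieved = min(in_flight / MEM_LATENCY, PEAK_REQS_PER_CYCLE)
    return achieved / PEAK_REQS_PER_CYCLE

for n in (64, 512, 4096, 16384):
    print(n, round(memory_efficiency(n), 3))
```

With these (made-up) numbers, a few hundred fragments leave the memory system mostly idle between requests, while ~4K fragments in flight are enough to cover the latency and hit the cap, which is the shape of the behavior described above.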