So now that CUDA's out, I expect a whole bunch of interesting stuff about G80 to get tested. Here are some notable bits from a first scan of the docs.
Memory sizes:
The amount of shared memory available per multiprocessor is 16 KB divided into 16 banks (see Section 6.1.2.4);
The amount of constant memory available is 64 KB with a cache working set of 8 KB per multiprocessor;
The cache working set for 1D textures is 8 KB per multiprocessor;
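To put those numbers in context, here's a quick sketch of my own (not from the docs, names and sizes made up) of how the limits show up in a kernel:

// Hypothetical example: 64 KB of __constant__ memory for the device as a
// whole, 16 KB of __shared__ memory per multiprocessor.
__constant__ float coeffs[1024];   // 4 KB of the 64 KB constant space

__global__ void scale(const float *in, float *out, int n)
{
    __shared__ float tile[1024];   // 4 KB of the 16 KB shared space per MP
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    tile[threadIdx.x] = (i < n) ? in[i] : 0.0f;   // assumes blockDim.x <= 1024
    __syncthreads();
    if (i < n)
        out[i] = tile[threadIdx.x] * coeffs[threadIdx.x];
}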
Confirms that it's 8 double-pumped ALUs, not 16 ALUs.
Each multiprocessor is composed of eight processors running at twice the clock frequencies mentioned above, so that a multiprocessor is able to process the 32 threads of a warp in two clock cycles.
Shared memory banking:
In the case of the shared memory space, the banks are organized such that successive 32-bit words are assigned to successive banks and each bank has a bandwidth of 32 bits per clock cycle.
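In practice that means a stride-1 access across a half-warp hits 16 different banks, while a stride-16 access puts everyone on the same bank. A toy sketch of my own (not from the guide):

__global__ void bank_demo(float *out)
{
    __shared__ float data[16 * 16];

    // Conflict-free: the 16 threads of a half-warp touch successive 32-bit
    // words, which land in successive banks.
    data[threadIdx.x] = (float)threadIdx.x;

    // 16-way conflict (left commented out): a stride of 16 words maps every
    // thread of the half-warp to the same bank, serializing the accesses.
    // data[threadIdx.x * 16] = (float)threadIdx.x;

    __syncthreads();
    out[threadIdx.x] = data[threadIdx.x];
}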
Number of threads in-flight:
The delays introduced by read-after-write dependencies can be ignored as soon as there are at least 192 concurrent threads per multiprocessor to hide them.
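So as a rough occupancy target: one 256-thread block resident per multiprocessor already clears the bar, while 64-thread blocks need three resident blocks. A hedged sketch (placeholder kernel, my own numbers):

__global__ void axpy(float a, const float *x, float *y, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        y[i] = a * x[i] + y[i];   // RAW latency hidden by the other warps in flight
}

void launch_axpy(float a, const float *x, float *y, int n)
{
    // 256 threads per block: even a single resident block per multiprocessor
    // exceeds the 192 concurrent threads the guide asks for.
    int block = 256;
    int grid  = (n + block - 1) / block;
    axpy<<<grid, block>>>(a, x, y, n);
}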
I'm sure there are lots more interesting tidbits once I read the whole thing.
Edit: and some more from the RelNotes:
Q: Does CUDA support Double Precision Floating Point arithmetic?
A: CUDA supports the C "double" data type. However on G80 (e.g. GeForce 8800) GPUs, these types will get demoted to 32-bit floats. NVIDIA GPUs supporting double precision in hardware will become available in late 2007.
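The practical upshot (my own illustration, not from the release notes): code written with double compiles and runs on G80, but the arithmetic behaves like float, e.g.:

__global__ void accumulate(const float *in, float *out, int n)
{
    double acc = 0.0;        // silently demoted to a 32-bit float on G80
    for (int i = 0; i < n; ++i)
        acc += in[i];
    *out = (float)acc;       // result precision matches an all-float loop
}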
Talk of a Quadro (not too surprising):
The release and debug configurations require a GeForce 8800 Series GPU (or equivalent G8X-based Quadro GPU) to run properly.