Was there the expectation that it wouldn't serialize accesses to the same location during atomic ops?
That's pretty brutal considering it should be on-die.
What are the grid dimensions? How many warps per block?
Is it definitely coalescing that's increasing performance? Not merely the number of warps concurrently able to do atomics (either or both of number of clusters and number of MCs)?
Jawed
It's serialising memory accesses. If this was fully cached then it would be cheaper.
Was there the expectation that it wouldn't serialize accesses to the same location during atomic ops?
Hmm, interesting. It looks like Nvidia's implementation of atomic add works in such a way that repeatedly accessing the same address takes longer than accessing different addresses and the latency is roughly equal to their memory latency.
The performance here is actually the opposite of what you would expect in a CPU-based system. It looks like Nvidia is fully serializing the accesses to the same address, but in the case of different addresses it is issuing them and then switching warps, which effectively hides the latency.
With <=32 warps per multiprocessor (but 24 on 8600GT), that's 4 blocks per multiprocessor. At any one time there are 4 blocks per multiprocessor * 30 multiprocessors = 120 blocks in flight.
Grid was {512,1} blocks and {16,16} threads per block (8 warps, 256 threads). I think this should have been enough to saturate both cards.
Also, I'm not sure how CUDA distributes blocks to cores, so this would likely also affect how many atomic operations collided on the same memory segment (in my test).
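For anyone who wants to check the blocks-in-flight arithmetic above, here is a minimal sketch. The function name is made up for illustration, the 4-multiprocessor count for the 8600GT is my assumption (it is not stated in the thread), and the model only looks at the warp limit. It ignores register, shared memory and max-blocks-per-multiprocessor limits, so treat the result as an upper bound.

```python
WARP_SIZE = 32  # threads per warp on these GPUs

def blocks_in_flight(threads_per_block, max_warps_per_sm, num_sms):
    """Return (warps per block, resident blocks per SM, resident blocks in total).

    Only the warp limit is modelled; other occupancy limits are ignored.
    """
    warps_per_block = -(-threads_per_block // WARP_SIZE)  # ceiling division
    blocks_per_sm = max_warps_per_sm // warps_per_block
    return warps_per_block, blocks_per_sm, blocks_per_sm * num_sms

threads_per_block = 16 * 16  # the {16,16} block from the test

# 30 multiprocessors, 32 warps each (figures quoted in the thread)
print(blocks_in_flight(threads_per_block, 32, 30))

# 8600GT: 24 warps per multiprocessor (from the thread), 4 multiprocessors (assumed)
print(blocks_in_flight(threads_per_block, 24, 4))
```

With a 512-block grid, both cases need several waves of blocks to finish, so which blocks end up running concurrently depends on the scheduler.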
Well, it says "less than 490mm^2", doesn't it...
2400M at 490mm^2 will still be at a significant trans/mm^2 disadvantage (based on RV740).
Hint: GT200b is 490mm^2. So it's more like "less than GT200b" really.
If it was less than 480mm^2, surely it would have read < 480mm^2. Let's split the difference and say 485mm^2.
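To put a rough number on the trans/mm^2 comparison, here is a quick sketch. The GT300 figures are the rumoured 2400M transistors and 490mm^2 from this thread. The RV740 figures (826M transistors, 137mm^2) are my assumption from its published specs and are not quoted anywhere in the thread.

```python
def density(transistors_millions, area_mm2):
    """Transistor density in millions of transistors per mm^2."""
    return transistors_millions / area_mm2

gt300 = density(2400, 490)  # rumoured figures from the thread
rv740 = density(826, 137)   # assumed figures, not from the thread

print(f"GT300 (rumoured): {gt300:.2f} M/mm^2")
print(f"RV740 (assumed):  {rv740:.2f} M/mm^2")
print(f"ratio GT300/RV740: {gt300 / rv740:.2f}")
```

Shaving the die down to 485mm^2 barely moves the ratio, so the exact size matters less than whether the transistor count is right.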
How's that? We don't know anything about G300's architecture at the moment.
From a performance/mm^2 they appear quite similar.
Should I really have to do this?
ninelven said: if we are to believe the specs...
Since you apparently know GT300's die size why don't you go ahead and post it?
DegustatoR said: GT200b is 490mm^2. So it's more like "less than GT200b" really.
Later the article raises an eyebrow over the single-PCB GTX295 that's coming. Yes, it definitely would be interesting if NVidia launched a "GTX395" concurrently with "GTX380", using a single board for 2 GPUs.
GT300 packs 512 MIMD-capable cores and yet it uses "just" one billion transistors extra. I'll be first to admit that I wondered how GT300 packs at least three billion transistors, but according to our highly confidential source, the 2.4 billion transistors are packed in just 495mm2.
But, if a single board is so good, why hasn't NVidia done it already?
FWIW, I wouldn't trust Theo's or hardware-infos' info. They've been wrong on too many occasions and, according to CJ, are just taking stabs in the dark, emailing him asking for confirmation of their guesses.
Maybe CJ can enlighten us what is going on in the background?