Xbit's review, overclocking performance against Tahiti: http://www.xbitlabs.com/articles/graphics/display/nvidia-geforce-gtx-680_14.html#sect0
Overall 4% faster at 19x12 and 0.2% faster at 25x16. From those numbers, AMD needn't be concerned about the GK104 threat in the form of the 680. But given that they are behind in key metrics and Nvidia have yet to launch their flagship, they might be worried, with plan Bs and Cs being thought of. Perhaps a rejiggle in strategy for Sea/Canary Islands?
I did say other metrics and the "not worrying" part was strictly about GK104's performance. I don't know Arty...quieter, cooler, lower power, lower price. Seems AMD needs to drop about a C-note off the 7970 to "not worry."
Oh, boy! The NV30 evil ghost is lurking again.
I think this is incorrect. Based on the AnandTech writeup, the correct layout is: 16 CUDA cores * 12
I was wrong, Nvidia confirmed 1/24th
Will you and your guys write an article about the new NVENC? Like this one: http://www.behardware.com/articles/...-cuda-amd-stream-intel-mediasdk-and-x264.html
According to Anandtech GK104 has 8 dedicated FP64 CUDA-Cores per SMX and a 1/24 DP-rate:
http://www.anandtech.com/show/5699/nvidia-geforce-gtx-680-review/2
Die size comparison in scale:
Kepler SMX vs. Fermi SM in scale:
Rather than some secret block of 8 FP64 CUDA cores that is not shown on any diagram, isn't it more likely that one of the vec16 units per SMX can do FP64 at half rate? I.e. one out of the 12 vertical columns of CUDA cores does 1/2 rate FP64.
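To put numbers on the two explanations, here is a rough Python sketch. It assumes the 16 x 12 = 192 CUDA cores per SMX mentioned earlier in the thread; the function name `dp_rate` and the "results per clock" bookkeeping are just for illustration, not anything Nvidia has published.

```python
from fractions import Fraction

# Assumption from the thread: 12 columns of 16 CUDA cores per SMX.
FP32_LANES_PER_SMX = 16 * 12  # 192

def dp_rate(fp64_results_per_clock, fp32_lanes=FP32_LANES_PER_SMX):
    """FP64 throughput relative to FP32 throughput for one SMX."""
    return Fraction(fp64_results_per_clock, fp32_lanes)

# Explanation A: 8 dedicated FP64 cores, each producing one result per clock.
dedicated = dp_rate(8)

# Explanation B: one vec16 column running FP64 at half rate,
# i.e. 16 lanes * 1/2 = 8 results per clock.
shared_half_rate = dp_rate(16 * Fraction(1, 2))

print(dedicated, shared_half_rate)
```

Both come out to the same 1/24 ratio, so the quoted DP rate alone cannot tell the two layouts apart.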
Not sure I buy the asymmetric SIMDs (4 vec32 and 4 vec16). Since the scheduling in the compiler now depends on working with known, deterministic latencies of the instructions it issues, wouldn't the compiler have to be fully aware that it is scheduling onto a "shorter" execution unit, since the latency of the instruction would be increased by 1 clock? So kinda knowing there are x and y exec units, where y has higher latency? What does that gain you? Is that easier to keep track of than the 4 schedulers having to issue instructions to up to 6 vec32 SIMDs?
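A quick sanity check of the two layouts being compared, as a Python sketch. The only hard fact used is the 32-thread warp; the model (one pass per clock, pipeline depth ignored) and the helper name `issue_clocks` are simplifying assumptions made here for illustration.

```python
WARP_SIZE = 32  # threads per warp on Nvidia hardware

def issue_clocks(simd_width, warp_size=WARP_SIZE):
    """Clocks needed to push one warp through a SIMD of the given width,
    assuming one pass per clock (ceiling division)."""
    return -(-warp_size // simd_width)

# Layout 1: asymmetric, 4 vec32 + 4 vec16.
asymmetric_lanes = 4 * 32 + 4 * 16
# Layout 2: uniform, 6 vec32.
uniform_lanes = 6 * 32

# Extra issue clocks a warp pays on a vec16 compared with a vec32.
extra_clocks = issue_clocks(16) - issue_clocks(32)

print(asymmetric_lanes, uniform_lanes, extra_clocks)
```

Under these assumptions both layouts add up to the same 192 lanes per SMX, and a warp sent to a vec16 takes one clock longer to issue, which is the extra latency the compiler would have to account for.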
GF100 is the only Fermi GPU with an available die-shot in the wild. Is that GF1x0 or GF1x4?