NVIDIA cut the performance of its OpenCL drivers nearly in half when it introduced OpenCL 1.1 support over a year ago, and the problem has never been fixed. I would call this a driver problem (i.e. not a problem with the benchmark).
Yes, I agree, this is a driver problem. As I said earlier: evaluating architectures on a class of workload based on a single benchmark with a known driver problem is silly. It's like evaluating Cayman's "games" performance based on Civ5 running on an old driver: it's not predictive of much.
Cycles (the Blender GPU path tracer) is written in CUDA, but it also runs faster on the 580 than on the 680.
Do you need more proof?
Proof of what? That there exist applications which run slower on the 680 than on the 580? I never said otherwise, especially if you take an application optimized for one architecture and run it on another, as Cycles did. Getting optimal performance will require retuning the code. This problem is not specific to Nvidia.
I do, however, object to the statement that ray tracing is a workload that generally runs slower on a 680 than a 580. That's not true, as I pointed out with the existence proof of a ray tracer that runs significantly faster on GK104 than GF100.
Timo works for NVIDIA and he compares the 680 to the 480 in the results presented.
Sure (notice I said GF100, not GF110), but the performance improvement from the 480 to the 580 is just clock speeds. Go ahead and adjust for them; you'll see the 680 is still beating the 580.
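To make the clock adjustment concrete, here is a rough Python sketch. It assumes performance scales linearly with shader clock, which is only an approximation, and it uses the reference shader clocks of 1401 MHz (480) and 1544 MHz (580). The two rates are made-up placeholders, not numbers from Timo's results, and `scale_by_clock` is a hypothetical helper name.

```python
def scale_by_clock(measured_rate, measured_clock_mhz, target_clock_mhz):
    """Estimate a rate at target_clock_mhz, assuming linear scaling with clock."""
    return measured_rate * target_clock_mhz / measured_clock_mhz


# Reference shader clocks in MHz.
GTX480_SHADER_MHZ = 1401
GTX580_SHADER_MHZ = 1544

# Placeholder throughput figures (arbitrary units). These are NOT measured
# results; substitute the 480 and 680 numbers from the report being compared.
rate_480 = 100.0
rate_680 = 130.0

# Project the 480 result up to 580 clocks, then compare against the 680.
estimated_rate_580 = scale_by_clock(rate_480, GTX480_SHADER_MHZ, GTX580_SHADER_MHZ)
speedup_680_over_580 = rate_680 / estimated_rate_580

print(f"estimated 580 rate: {estimated_rate_580:.1f}")
print(f"680 vs. estimated 580: {speedup_680_over_580:.2f}x")
```

The clock ratio is about 1.10, so a 680 result that beats the 480 by more than roughly 10% still beats the clock-adjusted 580 estimate under this assumption.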
Not only that, but Timo's results are for the computationally most difficult part of ray tracing - they don't include shading, which is more regular and benefits more from Kepler's denser architecture, as seen in gaming benchmarks. So if you made a complete ray tracer out of Timo's engine, you'd likely see a bigger speedup over Fermi than his report indicates.
I'm not claiming Kepler is a wonder architecture with magic pixie dust that makes everything faster. Kepler definitely is a retrenchment towards traditional GPU workloads, away from Fermi's generality. But you should also remember that you're comparing a mid-range 104 part with a high-end 100 part. To discuss the architecture changes, it would make more sense to compare GK110 with GF100. Hopefully we won't have to wait much longer to learn about GK110.