So you think there is a product line that cannot be used?
Drive-by to say that although I left NV in July, I was the CUDA SW lead for Titan for a long time (over a year). B3D taught me well (this is the second #1 machine I was heavily involved with; I worked on Tianhe-1A as well).
Also, it should be noted that it's hard to get efficiency close to BG/Q on any x86 machine; even Jaguar was running at 75% efficiency.
No such thing as "fully optimized"... So when I said "unoptimized", I meant "not fully optimized yet".
Since Kepler has some notable new compute features aimed at improving efficiency relative to Fermi, I wouldn't be surprised to see some efficiency gains later down the road from Titan as the programming team learns how to take better advantage of these new features.
Why I left: lots of reasons, but the main one is that at this point I don't think HPC is where I want to spend the majority of my career. It was an unbelievable first job to have in terms of advancement and learning opportunities, but it was time to move on.
Why did you leave Nvidia? Where are you at now?
SC #52, Discover, comes with Xeon Phi and has an efficiency of 66.3% and about 1.93 GFLOPS/watt.
http://www.top500.org/system/177993
Intel's own Endeavor (#57) weighs in at 75.5% efficiency but only 1.26 GFLOPS/watt.
http://www.top500.org/system/176908
There's another K20-equipped system at #90 called Todi: 69.7%; 2.25 GFLOPS/watt.
http://www.top500.org/system/177472
But what's wrong with AMD's FirePro? There's only one system in the Top100 and its efficiency is an abysmal 23 percent with no power figure given.
http://www.top500.org/system/177996 -> see here for additional info: http://forum.beyond3d.com/showpost.php?p=1679422&postcount=4172. Seems like the entry was borked at the time of me posting.
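For a side-by-side view, the figures quoted above can be put into a small table and ranked. A minimal Python sketch, using only the numbers from this post (the tuple layout and variable names are my own, not anything from Top500):

```python
# (name, Top500 rank, Linpack efficiency in %, GFLOPS per watt) as quoted in the post
systems = [
    ("Discover", 52, 66.3, 1.93),
    ("Endeavor", 57, 75.5, 1.26),
    ("Todi", 90, 69.7, 2.25),
]

# Rank by power efficiency, best first
by_gflops_per_watt = sorted(systems, key=lambda s: s[3], reverse=True)
# Rank by Linpack efficiency (Rmax / Rpeak), best first
by_linpack_eff = sorted(systems, key=lambda s: s[2], reverse=True)

for name, rank, eff, gpw in by_gflops_per_watt:
    print(f"#{rank:<3} {name:<9} {eff:5.1f}%  {gpw:.2f} GFLOPS/W")
```

The two orderings differ: Endeavor leads on Linpack efficiency but comes last on GFLOPS/watt, while Todi is the other way around.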
When running Linpack, the Titan system overall achieves ~2.1428 GFLOPS/W, but GK110 by itself achieves ~7 GFLOPS/W (per Bill Dally @ NVIDIA in his SC12 presentation).
The data in the Top500 list is still wrong, as already said in the SI thread. The cluster has a theoretical peak of just 674.7 TFLOP/s and runs with a Linpack efficiency of 62.4%. The predecessor of that cluster (using Cypress cards) attained a Linpack efficiency of 69.6% iirc. The difference is that they now put two much faster cards in a single node (instead of a single Cypress card) while the network basically stayed the same, so it potentially starts to limit a bit in comparison. The efficiency of the DGEMM kernels on the GPU is in both cases close to or even above 90%.
With respect to the King Abdulaziz supercomputing system, the data has been revised/updated to include power consumption. The Linpack performance/watt for the system is quite good (relatively speaking), but the Linpack efficiency is still really low, at ~38.4% of the theoretical peak performance. Any thoughts on why the efficiency would be so low?
The core count is also completely wrong. Top500 says there are 33600 accelerator cores and 38400 in total, while there are in fact 3360 CPU cores and 840 GPUs (on 420 S10000 cards). Looks like they should train that copy-paste stuff a bit more.
That is a pretty big error for Top500 to make, because the peak system performance goes from 1.098 Petaflops to 0.6747 Petaflops.
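The gap between the two efficiency figures is just the same Linpack result divided by two different peaks. A rough check in Python, assuming the ~38.4% figure was computed against the 1.098 PFLOP/s peak; the Rmax value here is back-derived from those two numbers rather than read off the list, so expect some rounding error:

```python
listed_peak_tflops = 1098.0     # peak as listed on Top500 (1.098 PFLOP/s)
corrected_peak_tflops = 674.7   # peak according to the correction in this thread
listed_efficiency = 0.384       # Linpack efficiency against the listed peak

# Back out the measured Linpack result (assumption: derived, not an official Rmax)
rmax_tflops = listed_efficiency * listed_peak_tflops

# Same Linpack result, divided by the corrected theoretical peak
corrected_efficiency = rmax_tflops / corrected_peak_tflops

print(f"implied Rmax: {rmax_tflops:.1f} TFLOP/s")
print(f"efficiency vs corrected peak: {corrected_efficiency:.1%}")
```

The result lands within rounding of the 62.4% quoted in the thread, which suggests the low listed efficiency comes from the inflated peak rather than from the machine itself.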
I wonder where the line is drawn, to get that incredibly precise 2.1428 figure.
There's the power used by the rack cabinets themselves (maybe 208V DC or some combination of DC voltages). Then power used by the transformers/PSU that generate that supply from whatever comes to the building. The UPS batteries too, and then cooling that big server room.
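For what it's worth, the four decimal places need not mean a precise measurement; they fall out of dividing two list values. A quick sketch, assuming the November 2012 Top500 entry for Titan lists an Rmax of 17,590 TFLOP/s and a power draw of 8,209 kW (both figures are my recollection of the list, not something stated in this thread):

```python
rmax_tflops = 17590.0   # assumed Rmax from the Nov 2012 list entry
power_kw = 8209.0       # assumed power figure from the same entry

# TFLOP/s -> GFLOP/s and kW -> W both scale by 1000, so the factors cancel
gflops_per_watt = (rmax_tflops * 1e3) / (power_kw * 1e3)

print(f"{gflops_per_watt:.4f} GFLOPS/W")
```

Where the power measurement boundary sits (racks only, or including PSUs, UPS and cooling) is a separate question that the quotient does not answer.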
According to BMO Capital’s Ambrish Srivastava, who has a Market Perform rating on both AMD and Nvidia, AMD’s market share in desktop computers declined in the quarter from 40.7% in Q2 to 35.7%, while Nvidia’s rose from 59.3% to 64.3%. In notebook computers, AMD’s share fell more dramatically, from 44.8% to 34.2%, while Nvidia’s share rose from 55.2% to 65.8%.
Maybe it just denotes the board or cooling variation. I've seen an air cooled K20 card bearing the designation of K20SUC and for the K10 there is also a version named K10-RL-SUC (specifying an aircooled version with the airflow from the left to the right, howsoever that is defined). There are probably a few versions around with different cooling layouts or even coming without a heatsink (for watercooled installations).
What is K20C?