NVIDIA Kepler speculation thread

From those numbers alone, you could arrive at a mere 0.9 TFLOPS DP for each Tesla, if I did my math right (18,000 GPUs supplying 85% of peak, including some M2090s, granted).

But: you talk about them only mentioning a potential peak of over 20 PFLOPS as if it were something new. It isn't, at least not since last year's SC.
They said they will end up somewhere in the range of 10 to 30 PFlops, depending on funding. With the BD (Bulldozer) CPUs delivering about 3 PFlops, you can do the math either for 19,200 Kepler GPUs and 30 PFlops, or take the newer numbers with 18,000 Kepler GPUs and "over 20 PFlops". Either way, you arrive north of 1 TFlop per GPU. Anything less wouldn't look like real progress compared to the M2090, would it?
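The per-GPU figure comes from subtracting the CPU share from the system peak and dividing by the GPU count. A minimal sketch, assuming the ~3 PFlops CPU contribution quoted above is exact in both scenarios; `gpu_tflops_from_cpu_share` is a hypothetical helper name:

```python
def gpu_tflops_from_cpu_share(system_pflops, cpu_pflops, gpu_count):
    """DP TFLOPS per GPU if everything above the CPU share comes from the GPUs."""
    gpu_pflops = system_pflops - cpu_pflops
    return gpu_pflops * 1000 / gpu_count  # PFLOPS -> TFLOPS

# Scenarios from the post; the 3 PFlops CPU share is an assumption taken from it.
for label, system, gpus in [("30 PF, 19,200 GPUs", 30, 19_200),
                            ("20 PF, 18,000 GPUs", 20, 18_000)]:
    per_gpu = gpu_tflops_from_cpu_share(system, 3, gpus)
    print(f"{label}: {per_gpu:.2f} TFLOPS per GPU")
```

At exactly 20 PF the 18,000-GPU case gives about 0.94 TFLOPS per GPU; it takes about 21 PF to reach 1 TFLOPS, so the "over" in "over 20 PFlops" matters.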
 
I couldn't find any figure stating 30 PFLOPS when I looked it up yesterday evening. Instead, even the SC 2010 booth presentation only talked about a 10-20 PFLOPS target for OLCF-3 in the 2012 timeframe.
SC10_Booth_Talk_Bland.pdf said:
ORNL’s “Titan” 20 PF System Goals
• Initial 1 PF delivery in 2011, final 20 PF system in 2012
[…]
• 20 PF peak performance
• 9x performance of today’s XT5
You don't happen to have a link at hand with 30 PFLOPS that is not totally outdated, i.e. before Nov 2010?


Let's do the math based on the upper bound, 20,000 TFLOPS. According to the NVIDIA press release, 85% of Titan's peak throughput will come from the GPUs (let's assume, for the sake of argument, that this figure is correct), resulting in 17,000 TFLOPS from the GPUs.

If there were 19,000 or more X2090s in there, each of them would deliver less than a TFLOP, right? According to the OLCF-3 timeline there will be 18,688 nodes, which would have to provide 17,000 TFLOPS, i.e. 0.909... TFLOPS per GPU.


edit: According to the Oak Ridge LCF, the numbers range from 7,000 to 18,000 GPUs for 10-20 PFLOPS, which would give 0.94 - 1.21 TFLOPS DP per GPU, with the GPUs contributing 85% of peak performance and NOT counting the 960 Tesla M2090s. Are they to be replaced or not? I don't know.
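The same arithmetic, using the 85% GPU share from the NVIDIA press release instead of subtracting the CPUs. A sketch assuming that share applies to whichever peak is chosen and ignoring the 960 Fermi boards; `gpu_tflops_from_share` is a hypothetical name:

```python
GPU_SHARE = 0.85  # fraction of system peak attributed to GPUs (press release figure)

def gpu_tflops_from_share(system_tflops, gpu_count, share=GPU_SHARE):
    """DP TFLOPS per GPU when `share` of the system peak comes from `gpu_count` GPUs."""
    return system_tflops * share / gpu_count

# 20 PF with one GPU per node (18,688 nodes per the OLCF-3 timeline).
print(f"{gpu_tflops_from_share(20_000, 18_688):.3f}")
# ORNL range: 18,000 GPUs for 20 PF, 7,000 GPUs for 10 PF.
low = gpu_tflops_from_share(20_000, 18_000)
high = gpu_tflops_from_share(10_000, 7_000)
print(f"{low:.2f} - {high:.2f}")
```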
 
From those numbers, it seems Kepler is a 768-shader Fermi refresh. I suppose they are going for a smaller die this time around, similar to ATI?
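One way to see where 768 shaders comes from is to work backwards from ~0.91 TFLOPS DP per GPU. A sketch assuming a Fermi-style design (one FMA per shader per clock, counted as 2 flops, with DP at half the SP rate); the M2090 (512 shaders at 1.3 GHz, about 665 GFLOPS DP) serves as a sanity check. `dp_gflops` and `clock_needed_ghz` are hypothetical helpers:

```python
def dp_gflops(shaders, shader_clock_ghz, dp_ratio=0.5):
    """Peak DP GFLOPS: shaders x clock x 2 flops per FMA x DP/SP rate ratio."""
    return shaders * shader_clock_ghz * 2 * dp_ratio

def clock_needed_ghz(target_gflops, shaders, dp_ratio=0.5):
    """Shader clock required to reach target_gflops DP with the given shader count."""
    return target_gflops / (shaders * 2 * dp_ratio)

print(f"M2090: {dp_gflops(512, 1.3):.0f} GFLOPS DP")  # sanity check vs. the spec sheet
for shaders in (512, 768, 1024):
    ghz = clock_needed_ghz(910, shaders)
    print(f"{shaders} shaders need {ghz:.2f} GHz for 0.91 TFLOPS DP")
```

Under these assumptions, 768 shaders need roughly 1.18 GHz, close to Fermi Tesla clocks, whereas 512 shaders would need almost 1.8 GHz.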
 
You don't happen to have a link at hand with 30 PFLOPS that is not totally outdated, i.e. before Nov 2010?
I linked an interview from the 24th of July 2011 earlier in the thread already ;)

And they are only putting 960 X2090s in, nowhere near 19,000. I suspect (and the interview I linked to says so explicitly) that they won't fully populate the whole cluster with Kepler GPUs. If they did, they could reach 30 PFlop/s. But that is probably not going to happen.
 
From those numbers, it seems Kepler is a 768-shader Fermi refresh. I suppose they are going for a smaller die this time around, similar to ATI?

I feel the same way. Total speculation here: if Charlie's rumour is right and the HD 7970 (?) comes in at HD 6990 level (1.7x HD 6970) or a bit below, maybe NVIDIA doesn't need to go to 1024 SPs. A GTX 680 (?) with 50% more shaders and 10% higher clocks should be enough and still be 15% faster than the 7970. Die size would be around ~400 mm², give or take, if they scale everything by 50%.
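The arithmetic behind this guess can be spelled out. A sketch assuming performance scales perfectly with shader count and clock (real games won't), with every ratio taken from the rumour above; the result hinges on how far a GTX 580 already leads an HD 6970, which the post leaves implicit:

```python
# All figures are rumoured/speculative ratios from the post, not measurements.
SHADER_SCALE = 1.5      # 50% more shaders than GTX 580 (512 -> 768)
CLOCK_SCALE = 1.1       # 10% higher clock
HD7970_VS_HD6970 = 1.7  # rumoured HD 7970 performance, HD 6990 level
TARGET_LEAD = 1.15      # "still 15% faster than 7970"

kepler_vs_gtx580 = SHADER_SCALE * CLOCK_SCALE  # assumes perfectly linear scaling
# How much faster the GTX 580 must already be than the HD 6970 for the claim to hold.
required_gtx580_vs_hd6970 = TARGET_LEAD * HD7970_VS_HD6970 / kepler_vs_gtx580

print(f"Kepler vs GTX 580: {kepler_vs_gtx580:.2f}x")
print(f"Required GTX 580 lead over HD 6970: {required_gtx580_vs_hd6970:.2f}x")
```

So the "15% faster" figure implicitly assumes the GTX 580 already leads the HD 6970 by roughly 18%.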
 

Yeah, I'm thinking more in the 350ish mm² range, and I'm guessing AMD will target a smaller die again, à la RV770, in the 250-300 mm² range. I'd guess NV going smaller is a side effect of the whole problem with Fermi Teslas shipping with shaders disabled and reduced performance. I wouldn't be surprised if AMD and NV end up more neck and neck this time around, depending on how GCN turns out.
 
And they are only putting 960 X2090s in, nowhere near 19,000. I suspect (and the interview I linked to says so explicitly) that they won't fully populate the whole cluster with Kepler GPUs. If they did, they could reach 30 PFlop/s. But that is probably not going to happen.

According to the press release, they target between 10 and 20 PFlop/s: http://investors.cray.com/phoenix.zhtml?c=98390&p=irol-newsArticle&ID=1615561
When completed, the Titan system will have a peak performance between 10 and 20 petaflops.
 
Your linked press release said:
The second phase of the contract -- equipping the system with NVIDIA Tesla GPUs based on the next-generation architecture code-named "Kepler" -- is expected to be completed in the second half of 2012. The contract includes additional upgrade options beyond these two phases that, if exercised, would increase the total value of the contract [and probably performance by adding more GPUs of course ;)].
I see no contradiction with the earlier interview I linked, which stated that it is just a matter of funding where in the 10 to 30 PFlop/s window they will end up. The only thing I consider quite probable is that the Titan cluster will not be fully equipped with Kepler GPUs. ORNL's official Titan website also says the final number of Kepler GPUs is still to be determined.
 
I linked an interview from the 24th of July 2011 earlier in the thread already ;)
Thanks.
And they are only putting 960 X2090s in, nowhere near 19,000. I suspect (and the interview I linked to says so explicitly) that they won't fully populate the whole cluster with Kepler GPUs. If they did, they could reach 30 PFlop/s. But that is probably not going to happen.
They are putting 960 M2090 in this fall, that's what I wrote, yes.
 

Actually, you wrote about 19,000 X2090 cards (you probably meant the Kepler GPUs, which will get a different model number). The 960 X2090 cards go in now, as M2090s don't fit in the XK6 blades, see here. ;)
 

Yes, thanks for the correction!
 
Could it be that ORNL is talking about Linpack flops, while NVIDIA quotes the (very) theoretical maximum?
As NVIDIA talks about "over 20 petaflops peak" in their press release and the interview mentioned a maximum of 30 PFlop/s peak for the full configuration, there is not that much of a difference (and the official design goal is 20 PFlop/s peak). Furthermore, NV has promised a better ratio of sustained to peak flops for the Kepler generation than Fermi achieves (about 65% at most for a single node, more like 50-55% for larger clusters). That could be realized with larger register files and/or more bandwidth to the L1 cache/shared memory, as these are currently the limiters (a faster interconnect between the nodes could, in the best case, only approach the 65% seen with a single node).
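To put numbers on the peak-versus-Linpack question: sustained Linpack throughput is roughly peak times the sustained/peak ratio. A sketch assuming the Fermi-era ratios quoted above carried over unchanged to a 20 PFlop/s peak system (NVIDIA promises better for Kepler); `linpack_estimate_pflops` is a hypothetical helper:

```python
def linpack_estimate_pflops(peak_pflops, efficiency):
    """Sustained (Linpack) PFLOPS for a given peak and sustained/peak ratio."""
    return peak_pflops * efficiency

PEAK_PFLOPS = 20  # official design goal
# Fermi-era sustained/peak ratios mentioned above.
for label, eff in [("single node", 0.65),
                   ("large cluster, low", 0.50),
                   ("large cluster, high", 0.55)]:
    print(f"{label}: {linpack_estimate_pflops(PEAK_PFLOPS, eff):.1f} PFLOPS Linpack")
```

At Fermi-like efficiency, a 20 PFlop/s peak machine would sustain roughly 10-13 PFlop/s in Linpack.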
 