NVIDIA Kepler speculation thread

I've just seen a piece of news on hardware.fr
Things are more complex when you get slightly lower-end: the 675M and 670M were running the Fermi GF114, while the MX versions are GK104 (960 SP, 256-bit) and GK106 (960 SP, 192-bit) respectively.

So "MX" is there because the branding was a mess already.
 
So "MX" is there because the branding was a mess already.
Exactly. Nvidia shouldn't have brought Fermi over to the 600 series any more than AMD should have put VLIW parts in their HD 7000 line-up.

I'm really not sure this is a net win. The gain from uninformed people who just aim for the higher model number has to be weighed against the people who do follow this kind of debate, who, after all, mostly act as multipliers because other people come to them for advice.
 
18,688 NVIDIA® Tesla® K20 GPU accelerators

http://nvidianews.nvidia.com/Releas...omputer-For-Open-Scientific-Research-8a0.aspx

Titan, the world's fastest open-science supercomputer,(1) was completed this month at Oak Ridge National Laboratory in Tennessee, opening new windows of opportunity into the exploration of some of the world's toughest scientific challenges. Titan's peak performance is more than 20 petaflops -- or 20 million billion floating-point operations per second -- about 90 percent of which comes from 18,688 NVIDIA® Tesla® K20 GPU accelerators. These are based on the NVIDIA Kepler™ architecture, the fastest, most efficient, highest-performance computing architecture ever built.
 
Two months after the initial delivery and the announcement of Titan using K20, they have finished delivering the chips.

Now they can start producing K20 for others :smile:
 
So now we know where all the GK110s went :oops:

They might, of course, release it for the desktop, but they make huge margins on GK104, and without any performance pressure from the other party they will simply make the most of this situation for as long as they can. Whether indirectly (or directly, who knows), both parties agree to keep things as they are.
 
Don't forget there are just as many AMD 62XX Opterons in there.

http://www.cray.com/Products/XK/XK7.aspx
http://www.cray.com/Assets/PDF/products/xk/CrayXK7Brochure.pdf

Is it a stretch to expect to see those Tesla GPUs replaced by fully fleshed-out HSA AMD GPUs in the not-too-distant future?

If these designs didn't have the sort of lead times they do, the more likely worry would have been whether Cray could ditch the Opterons.
Cray has had a long run of problems counting on AMD not to trip over its own feet, and Bulldozer is just the latest example.
 
According to HPCWire, the Teslas run at 732 MHz. Taking the claimed 27 PFlop/s theoretical peak and factoring in the CPU flops (2.63 PFlop/s at base clock, or 2.99 PFlop/s at all-core turbo clock), it must be a 14-SMX GK110 version.

14 SMX
732 MHz
6 GB GDDR5 on a 384-bit memory interface (clock still unknown)
1.31 TFlop/s theoretical double-precision peak
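The SMX count can be checked with a bit of arithmetic. This is a sketch under stated assumptions: each GK110 SMX is taken to have 64 FP64 units, each retiring one fused multiply-add (two flops) per clock. The helper names are hypothetical; the GPU count, clock, CPU share and 27 PFlop/s target are the figures quoted above.

```python
# Assumptions (not stated in the post): 64 FP64 units per GK110 SMX,
# one FMA (= 2 flops) per unit per clock. Helper names are hypothetical.
FP64_UNITS_PER_SMX = 64
FLOPS_PER_FMA = 2


def gpu_peak_tflops(smx: int, clock_mhz: float) -> float:
    """Theoretical FP64 peak of a single GPU, in TFlop/s."""
    return smx * FP64_UNITS_PER_SMX * FLOPS_PER_FMA * clock_mhz * 1e6 / 1e12


def system_peak_pflops(gpus: int, smx: int, clock_mhz: float, cpu_pflops: float) -> float:
    """Peak of all GPUs plus the CPU contribution, in PFlop/s."""
    return gpus * gpu_peak_tflops(smx, clock_mhz) / 1e3 + cpu_pflops


GPUS = 18_688          # from the Nvidia press release
CLOCK_MHZ = 732        # per HPCWire
CPU_PFLOPS = 2.63      # CPU share at base clock, as quoted above
CLAIMED_PFLOPS = 27.0  # claimed theoretical peak

for smx in (13, 14, 15):
    print(f"{smx} SMX: {gpu_peak_tflops(smx, CLOCK_MHZ):.2f} TFlop/s per GPU, "
          f"{system_peak_pflops(GPUS, smx, CLOCK_MHZ, CPU_PFLOPS):.1f} PFlop/s total")

# Pick the SMX count whose system total lands closest to the claimed peak.
best_smx = min((13, 14, 15),
               key=lambda n: abs(system_peak_pflops(GPUS, n, CLOCK_MHZ, CPU_PFLOPS) - CLAIMED_PFLOPS))
print("closest match:", best_smx, "SMX")
```

With these assumptions, 14 SMX gives about 1.31 TFlop/s per card and about 27.1 PFlop/s for the whole machine, while 13 and 15 SMX land near 25.4 and 28.9 PFlop/s.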

It's probably the top version of the K20 line if one compares it with the one that was on sale in a shop (which had only 13 SMX, a 320-bit memory interface and only 1.17 TFlop/s peak).
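Under the same assumption as for Titan's part (64 FP64 units per SMX, two flops per FMA), the shop card's quoted peak implies its clock. This is only a back-of-the-envelope sketch; the helper name is hypothetical and the 1.17 and 1.31 TFlop/s inputs are rounded figures.

```python
# Assumption (not stated in the post): 64 FP64 units per SMX, FMA = 2 flops.
FP64_UNITS_PER_SMX = 64
FLOPS_PER_FMA = 2


def implied_clock_mhz(peak_tflops: float, smx: int) -> float:
    """Clock (MHz) at which `smx` SMX units would reach `peak_tflops` FP64."""
    return peak_tflops * 1e12 / (smx * FP64_UNITS_PER_SMX * FLOPS_PER_FMA) / 1e6


shop_clock = implied_clock_mhz(1.17, 13)   # 13-SMX card seen on sale
titan_clock = implied_clock_mhz(1.31, 14)  # Titan's 14-SMX part, for comparison
bus_ratio = 384 / 320                      # memory interface widths in bits

print(f"13 SMX @ 1.17 TFlop/s -> ~{shop_clock:.0f} MHz")
print(f"14 SMX @ 1.31 TFlop/s -> ~{titan_clock:.0f} MHz")
print(f"384-bit vs 320-bit: {bus_ratio:.2f}x bandwidth at equal memory clock")
```

So the shop card would run at roughly 703 MHz versus Titan's 732 MHz, and the wider bus alone gives 20% more bandwidth if the memory clocks are equal (which the post notes is still unknown).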

What I find interesting is that the XK6 nodes obviously get upgraded to XK7 nodes in the process. Originally the plan was for the Tesla cards to be simple drop-in extensions for the new XK6 nodes (that should have worked, since Jaguar had just been upgraded from XT5 to XK6). What is strange is that HPCWire claims a maximum power consumption of 12.7 MW (up from the prior 10.8 MW, afaik), while Cray's specs say only 54.1 kW per rack, the same as with XK6. I have no idea what to make of that.
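The mismatch can be made concrete using only the figures above. A minimal sketch, assuming (not confirmed by the post) that the prior 10.8 MW figure describes the whole machine at Cray's 54.1 kW per rack; the helper names are hypothetical.

```python
RACK_KW = 54.1  # Cray's per-rack figure, same for XK6 and XK7


def implied_racks(total_mw: float, per_rack_kw: float) -> float:
    """Number of racks a total power figure corresponds to."""
    return total_mw * 1000 / per_rack_kw


def implied_per_rack_kw(total_mw: float, racks: float) -> float:
    """Per-rack power if a total is spread over a fixed rack count."""
    return total_mw * 1000 / racks


# Assumption: 10.8 MW was the full system at 54.1 kW per rack.
racks = implied_racks(10.8, RACK_KW)
per_rack_new = implied_per_rack_kw(12.7, racks)

print(f"10.8 MW / 54.1 kW -> ~{racks:.0f} racks")
print(f"12.7 MW over the same racks -> ~{per_rack_new:.1f} kW per rack")
```

Under that assumption the old figure matches roughly 200 racks, and 12.7 MW over the same rack count would mean about 63.6 kW per rack rather than 54.1 kW, which is exactly the inconsistency the post points out.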
 
I believe Cray's reason to use AMD is HyperTransport; it makes sense to use it for I/O. Intel QPI could be usable, but HT is much more of an industry standard.
 
Don't forget there are just as many AMD 62XX Opterons in there.

http://www.cray.com/Products/XK/XK7.aspx
http://www.cray.com/Assets/PDF/products/xk/CrayXK7Brochure.pdf

Is it a stretch to expect to see those Tesla GPUs replaced by fully fleshed-out HSA AMD GPUs in the not-too-distant future?

In reality, Cray has used Opterons for this type of HPC for a long time. Maybe the cost per core is simply lower.

@Blackowicz, I think you have a good point there. Yes, HT is surely more usable for nodes.
 
I believe Cray's reason to use AMD is HyperTransport; it makes sense to use it for I/O. Intel QPI could be usable, but HT is much more of an industry standard.

Is that why 70% of the top500 machines are based on QPI Xeons? It's doubtful that QPI vs HT is of any consequence in these highly custom designs. The reason is probably something far simpler.

Incidentally, the Opteron processors used in the system are dual-chip CPUs based on the Bulldozer microarchitecture. We asked Sumit Gupta, General Manager for Tesla Accelerated Computing at Nvidia, why those CPU were chosen for this project, given the Xeon's current dominance in the HPC space. Gupta offered an interesting insight into the decision. He told us the contracts for Titan were signed between two and three years ago, and "back then, Bulldozer looked pretty darn good."

http://techreport.com/news/23808/nvidia-kepler-powers-oak-ridge-supercomputing-titan
 
Is that why 70% of the top500 machines are based on QPI Xeons? It's doubtful that QPI vs HT is of any consequence in these highly custom designs. The reason is probably something far simpler.
Like the fact that Cray's interconnect processors (all the SeaStar variants from XT3/4/5 as well as the Gemini used in all the XT/XE/XK 6 and 7 nodes) have an HT interface? Being able to put them on a board where they connect directly to the CPUs seems like a good reason to me, if you want to connect tens of thousands of nodes with minimal latency and maximal bandwidth.

Btw, as the CPUs have moved the PCI-Express interface on-die, Cray has already announced that the next generation of its interconnect processors will use PCI-Express. So Cray will be able to use AMD and Intel more flexibly in the future.
 
Like the fact that Cray's interconnect processors (all the SeaStar variants from XT3/4/5 as well as the Gemini used in all the XT/XE/XK 6 and 7 nodes) have an HT interface? Being able to put them on a board where they connect directly to the CPUs seems like a good reason to me, if you want to connect tens of thousands of nodes with minimal latency and maximal bandwidth.

Btw, as the CPUs have moved the PCI-Express interface on-die, Cray has already announced that the next generation of its interconnect processors will use PCI-Express. So Cray will be able to use AMD and Intel more flexibly in the future.

How does any of that refute the empirical data on Xeon vs Opteron usage in supercomputers? If HT were a determining factor, Interlagos would have more than 3% penetration in the Top500.
 
If these designs didn't have the sort of lead times they do, the more likely worry would have been whether Cray could ditch the Opterons.

Cray has had a long run of problems counting on AMD not to trip over its own feet, and Bulldozer is just the latest example.

This will happen. Either Intel or Nvidia's Project Denver will take the CPU socket. AMD's days in HPC are coming to an end.

With Nvidia's high-end Project Denver, I can see an all-Nvidia Titan upgrade with a Denver CPU/GPU fusion CPU and a Maxwell GPU.

Dealing with a single vendor would be a plus for Cray.
 
Btw, as the CPUs have moved the PCI-Express interface on-die, Cray has already announced that the next generation of its interconnect processors will use PCI-Express. So Cray will be able to use AMD and Intel more flexibly in the future.
AMD is lagging here as well. Has there been a disclosed high-bandwidth socket that also has PCIe provisioned? Failing that, is there a socket with PCIe that also has multiprocessor capability to at least provide aggregate memory bandwidth?
 