NVIDIA Kepler speculation thread

That would likely be too small for a 256-bit memory interface. But they could make a G80->G92-style transition once again: 256-bit interface, 32 ROPs, 512 SPs, 64 TMUs and a die size around 250mm². And they could use it as a mainstream part until they release a Kepler-based mainstream GPU.
 
Yes, but I think we would have heard about it by now.
We did find GF117 hidden inside a driver, didn't we? As I said, GF117 being a 28nm GF116 but with a smaller 128-bit bus (with support for faster GDDR5) seems to make sense. Remember the main advantage of 28nm in the short-term is power rather than cost (lower yields & higher wafer costs) plus capacity is limited - so it makes sense to create a GPU that's perfectly suited to the OEM notebook market first and try to get some Ivy Bridge design wins with it. Of course, I could be wrong, or even if I'm right that doesn't mean NVIDIA will manage to deliver it in time.
 
It makes a lot of sense that way. And it's rumoured to have no video output built in (I read that here), so you pair it with the IGP.
 
If we work backwards from ORNL's claim of 20PF and the leaked Bulldozer clock speeds of ~2.3GHz for the 16-core parts, the peak DP flops on Kepler don't look so outstanding.

Interlagos: 2.3GHz * 200 cabinets * 96 CPUs * 8 FPUs * 4 FMAs/clock * 2 flops/FMA =~ 2.8PF.
Kepler: 17.2PF / (200 cabinets * 96 GPUs) =~ 0.9 TF per GPU.

The Fermi based x2090 is already at 0.66 TF per GPU.
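
For anyone who wants to check the arithmetic above, here is a rough sketch in Python. The cabinet and node counts, the 2.3GHz clock and the 20PF total are taken from the post; the 4 DP FMAs per FPU per clock and the 2 flops per FMA are assumptions (they are what it takes to land on ~2.8PF), and the constant and function names are just made up for illustration.

```python
# Rough peak-DP estimate. Every constant is either from the post above
# or an assumption flagged as such.
CABINETS = 200
NODES_PER_CABINET = 96          # one CPU and one GPU per node
CPU_CLOCK_HZ = 2.3e9            # leaked (unconfirmed) Interlagos clock
FPUS_PER_CPU = 8                # 16 cores = 8 modules, one FPU each
FMAS_PER_FPU_PER_CLOCK = 4      # assumption: 4 DP FMAs per FPU per clock
FLOPS_PER_FMA = 2               # assumption: an FMA counts as mul + add
SYSTEM_PEAK_FLOPS = 20e15       # ORNL's 20PF claim


def cpu_total_flops():
    """Peak DP flops contributed by all CPUs in the system."""
    nodes = CABINETS * NODES_PER_CABINET
    per_cpu = (CPU_CLOCK_HZ * FPUS_PER_CPU
               * FMAS_PER_FPU_PER_CLOCK * FLOPS_PER_FMA)
    return nodes * per_cpu


def gpu_flops_needed():
    """Peak DP flops each GPU must supply to reach the system total."""
    nodes = CABINETS * NODES_PER_CABINET
    return (SYSTEM_PEAK_FLOPS - cpu_total_flops()) / nodes


if __name__ == "__main__":
    print(f"CPU share: {cpu_total_flops() / 1e15:.2f} PF")
    print(f"Per GPU:   {gpu_flops_needed() / 1e12:.2f} TF")
```

Change the clock or the FMA assumption and the per-GPU figure moves accordingly, so treat the 0.9 TF as only as good as the leaked CPU numbers.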
 
If the goal is 20 sustained petaflops, that would be quite an interesting figure.
The Tian-He computer only sustains about half its peak rate.
 
Yeah, at first I suspected that too but it doesn't seem to be the case - they're talking 20PF theoretical. The same presentation advertised a 9x speed increase. If you look at the Top500 numbers, 20PF is 9x Jaguar's Rpeak, not Rmax.
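
A quick sanity check of the 9x claim, as a sketch only. The Jaguar figures below are what I recall from the November 2010 Top500 list (Rpeak ~2.331 PF, Rmax ~1.759 PF), so treat them as assumptions and look them up before relying on them; the names in the code are hypothetical.

```python
TARGET_PF = 20.0            # ORNL's stated system goal
ADVERTISED_SPEEDUP = 9.0    # from the same presentation

# Assumption: Jaguar figures as recalled from the Nov 2010 Top500 list.
JAGUAR_RPEAK_PF = 2.331
JAGUAR_RMAX_PF = 1.759


def closest_baseline():
    """Return which Jaguar figure the 9x claim matches best, plus both ratios."""
    ratios = {
        "Rpeak": TARGET_PF / JAGUAR_RPEAK_PF,
        "Rmax": TARGET_PF / JAGUAR_RMAX_PF,
    }
    name = min(ratios, key=lambda k: abs(ratios[k] - ADVERTISED_SPEEDUP))
    return name, ratios


if __name__ == "__main__":
    name, ratios = closest_baseline()
    for key, value in ratios.items():
        print(f"20 PF / Jaguar {key}: {value:.1f}x")
    print(f"Closest to 9x: {name}")
```

With those inputs the ratio against Rpeak comes out a bit under 9x and the ratio against Rmax is over 11x, which is why the 20PF reads as a theoretical peak.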
 
Yes, they mention 10-20 PF peak as the system goal. The 10 PF figure probably applies if they only put in Fermi Teslas (GF100-based 2050 or 2070).
 
If we work backwards from ORNL's claim of 20PF and the leaked Bulldozer clock speeds of ~2.3GHz for the 16-core parts, the peak DP flops on Kepler don't look so outstanding.

Interlagos: 2.3GHz * 200 cabinets * 96 CPUs * 8 FPUs * 4 FMAs/clock * 2 flops/FMA =~ 2.8PF.
Kepler: 17.2PF / (200 cabinets * 96 GPUs) =~ 0.9 TF per GPU.

The Fermi based x2090 is already at 0.66 TF per GPU.

What do you think this snippet from p.11 means?
"2nd socket in each XK6 board populated with Kepler GPU"
 
I think they are talking about the second PCI Express slot. The GPU is connected to the CPU via PCI-E 2.0 according to the PDF. No idea what is supposed to be in the first slot, as the whole interconnect between the nodes goes over an HT connection.
Or does anybody think Cray integrated a custom (probably quite large) PCI Express socket (maybe with additional power supply lines) and NVIDIA builds a custom socketed Kepler board just for this?
 
We did find GF117 hidden inside a driver, didn't we? As I said, GF117 being a 28nm GF116 but with a smaller 128-bit bus (with support for faster GDDR5) seems to make sense.
Can't see why it would even need faster GDDR5 memory. It's not terribly bandwidth-limited with 128-bit GDDR5 today; mobile parts are lower clocked and use (probably) only 1.35V GDDR5.
 
Ah yes, good point about mobile parts being clocked lower for both core and memory. Obviously I'd expect any Fermi derivative on 28nm to be able to clock higher than GF116 on 40nm though, so you'd ideally still want to support faster GDDR5 for the desktop SKUs. That doesn't mean they'll bother though.
 
I think they are talking about the second PCI Express slot. The GPU is connected to the CPU via PCI-E 2.0 according to the PDF. No idea what is supposed to be in the first slot, as the whole interconnect between the nodes goes over an HT connection.
Or does anybody think Cray integrated a custom (probably quite large) PCI Express socket (maybe with additional power supply lines) and NVIDIA builds a custom socketed Kepler board just for this?

I think the GPU is soldered onto the motherboard in this case.
 
If we work backwards from ORNL's claim of 20PF and the leaked Bulldozer clock speeds of ~2.3GHz for the 16-core parts, the peak DP flops on Kepler don't look so outstanding.

Interlagos: 2.3GHz * 200 cabinets * 96 CPUs * 8 FPUs * 4 FMAs/clock * 2 flops/FMA =~ 2.8PF.
Kepler: 17.2PF / (200 cabinets * 96 GPUs) =~ 0.9 TF per GPU.

The Fermi based x2090 is already at 0.66 TF per GPU.

Do they usually use top-bin CPUs—and GPUs, for that matter? That would obviously yield the highest performance, but at the expense of perf/W. Since power consumption is a huge deal in HPC, I'm not sure what trade-offs are typically favored.
 
What do you think this snippet from p.11 means?
"2nd socket in each XK6 board populated with Kepler GPU"

Not sure. Each XK6 blade supports 4 CPUs and 4 GPUs. The GPUs are soldered to the board. They are probably talking about one compute node (1 CPU + 1 GPU).

Do they usually use top-bin CPUs—and GPUs, for that matter? That would obviously yield the highest performance, but at the expense of perf/W. Since power consumption is a huge deal in HPC, I'm not sure what trade-offs are typically favored.

Well, we're comparing Fermi Tesla vs. Kepler Tesla, which are both HPC parts. Even if all 20PF were credited to the GPUs, it wouldn't be much of a jump over Fermi.
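
To put a number on "not much of a jump", here is a small sketch using only figures from this thread: 200 cabinets of 96 GPUs, 0.66 TF for the Fermi-based part, and either 17.2PF or the full 20PF credited to the GPUs. The function name is made up for illustration.

```python
NODES = 200 * 96        # cabinets * GPUs per cabinet, as quoted in the thread
FERMI_TF = 0.66         # peak DP per Fermi-based GPU, as quoted in the thread


def speedup_over_fermi(gpu_share_pf):
    """Per-GPU peak DP relative to Fermi if the GPUs supply gpu_share_pf."""
    per_gpu_tf = gpu_share_pf * 1000.0 / NODES   # PF -> TF, split across GPUs
    return per_gpu_tf / FERMI_TF


if __name__ == "__main__":
    for share_pf in (17.2, 20.0):
        print(f"{share_pf} PF on GPUs: {speedup_over_fermi(share_pf):.2f}x Fermi")
```

Even in the most generous case this lands at roughly 1.6x the Fermi part in peak DP, under the assumption that the node count stays at 19,200.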
 