Nvidia BigK GK110 Kepler Speculation Thread

So Nvidia's 551mm2 die manages to use roughly the same power as AMD's 365mm2 die, all while outperforming it by 27%? And now they're charging $1000 a board, while presumably making hefty margins on the chip itself?

What this tells me is that small dies are not the way to go, and Nvidia's efficiency lead is rather massive. I know this doesn't take volume into account, which is a very important part of the equation, but your average consumer doesn't care about volume. The market will see that Nvidia has a massive lead in performance over AMD, and that perception will trickle down and make their cheaper cards seem like the superior alternative, regardless of whether they actually are or aren't.

So this is a marketing stunt card, really. Where's AMD's marketing stunt card? They're missing out on the limelight.
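
For what it's worth, a quick back-of-the-envelope sketch of those two ratios, taking the figures above at face value (the equal board power is an assumption; any common wattage cancels out of the perf/W ratio):

    # Rough perf/area and perf/watt comparison from the figures above.
    # 250 W is an assumed common board power; since both cards use the
    # same value, it drops out of the perf/W ratio entirely.
    titan  = {"die_mm2": 551, "perf": 1.27, "watts": 250}
    tahiti = {"die_mm2": 365, "perf": 1.00, "watts": 250}

    perf_per_mm2 = (titan["perf"] / titan["die_mm2"]) / (tahiti["perf"] / tahiti["die_mm2"])
    perf_per_watt = (titan["perf"] / titan["watts"]) / (tahiti["perf"] / tahiti["watts"])

    print(f"relative perf/mm2: {perf_per_mm2:.2f}")   # ~0.84: the big die is less area-efficient
    print(f"relative perf/W:   {perf_per_watt:.2f}")  # ~1.27: the whole lead shows up as perf/W

On these numbers the small die actually keeps the perf/area edge; the case for the big die rests on perf/watt and the halo effect.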
 
So this is a marketing stunt card, really. Where's AMD's marketing stunt card? They're missing out on the limelight.

Hmm, marketing-wise I find it's the other way around: anyone researching a new card who lands on a Titan review, but would never put $1000 into this card, falls straight onto the 7970 GHz... which costs half the price (without even counting the three-game bundle).
It's not the poor reviews done with the first Never Settle drivers that will have generated much publicity for AMD's cards. I don't really see what AMD could have done marketing-wise with a year-old card, apart from playing on price.
 
I don't really see what AMD could have done marketing-wise with a year-old card, apart from playing on price.
This isn't about the 7970. It's about Nvidia competing in the razzle-dazzle market segment, while AMD is not.
 
http://www.hardocp.com/article/2013/02/21/nvidia_geforce_gtx_titan_video_card_review/5

In the apples-to-apples test we are running at the High AA setting, which the GeForce GTX TITAN has no trouble delivering a playable experience at. Now, it looks like the Radeon HD 7970 GE would be playable at its level of performance; however, the game is laggy and feels choppy despite the framerate showing what would normally be a good level for playability. The actual experience was different from what the framerates showed; it felt a lot slower. TITAN didn't experience this; it was perfectly smooth with no lag.
 
The problem I have with these $999 nVidia cards is that they leave too much stuff on the table: the 690 only having 2GB per GPU, and Titan only allowing a 6% increase in power target. I guess it leaves them room to release a 15 SMX part with a higher ceiling later on. This price point shouldn't have such compromises, imo.
 
It's not going to be any volume worth mentioning either, so it's rather a moot point.

I think they manufacture as many as they WANT to sell. At this price point.

Have you seen this:



http://www.techpowerup.com/reviews/NVIDIA/GeForce_GTX_Titan/7.html
 
So Nvidia's 551mm2 die manages to use roughly the same power as AMD's 365mm2 die, all while outperforming it by 27%? And now they're charging $1000 a board, while presumably making hefty margins on the chip itself?

Well, by the looks of it, about 10% of the performance appears to be due to their crafty Boost 2.0 implementation. Let's be generous, though, and say 25% more performance for a 51% larger die.

Obviously the more shaders you have at lower clocks, the better the TDP will be. I'm not saying AMD would be equal with more shaders at lower clocks but clearly the perf/watt disadvantage would be narrowed.

To be honest, in the current environment I believe a $1000 card would sell better by being faster even if power-hungry. I'm quite certain that, given the much-smaller-than-anticipated performance gap, the 7970 will likely have another record-breaking February.

What this tells me is that small dies are not the way to go, and Nvidia's efficiency lead is rather massive. I know this doesn't take volume into account, which is a very important part of the equation, but your average consumer doesn't care about volume. The market will see that Nvidia has a massive lead in performance over AMD, and that perception will trickle down and make their cheaper cards seem like the superior alternative, regardless of whether they actually are or aren't.

So this is a marketing stunt card, really. Where's AMD's marketing stunt card? They're missing out on the limelight.
Don't forget this is eight months after the GHz Edition was released. AMD probably killed a part with 10% better performance and maybe 5-10% better power characteristics because it was pointless. Sure, Nvidia has a perf/watt lead, but it's not massive by any means.
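
To put numbers on that, a minimal sketch stripping the boost share back out of the quoted figures (all of them forum estimates from the posts above, not measurements):

    # Normalize Titan's lead with the ~10% Boost 2.0 contribution removed,
    # using the estimates quoted above: +27% performance, ~10% of it from
    # boost, and a 51% larger die.
    perf_with_boost = 1.27
    boost_share = 1.10                              # ~10% attributed to Boost 2.0
    perf_no_boost = perf_with_boost / boost_share   # ~1.15x
    area_ratio = 1.51

    print(f"perf without boost:      {perf_no_boost:.2f}x")               # ~1.15x
    print(f"perf/area without boost: {perf_no_boost / area_ratio:.2f}x")  # ~0.76x

Even with the generous 1.25x figure, perf/area comes out around 0.83x, and the perf/watt lead similarly shrinks to roughly 15% once boost is excluded, which is the "not massive by any means" point in numbers.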
 
NVIDIA pays little attention to OpenCL, there's no reason Titan should change that.

CUDA is the target.
Rather convenient that you should only look at CUDA results, where you can't compare it to the competitor's card, no? Fact is, it seems to lose in both OpenCL and OpenCompute often, which frankly is a bit disappointing; for some workloads it doesn't even look like it wins with doubles (the SiSoft Sandra results, for example). Optimized drivers or not.
(At least it looks like there's quite some improvement if you're running cryptography-like workloads, more than twice as fast as a GTX 680, but it's still only roughly half as fast there as a 7970 GE, so it's still definitely not the card for bitcoin mining...)
When GK104 was sort of weak at compute, everybody said, well, it's not optimized for that. But I don't really see GK110 doing much better there relatively (excluding the already-mentioned cryptography results; I'm actually curious why those run faster), and given that the SMXs look nearly identical, this can't be a surprise.
 

Have you thought about driver-related phenomena?

AMD probably killed a part with 10% better performance and maybe 5-10% better power characteristics because it was pointless

If they killed it, then what would they launch in Q4? Another chip with a +10% performance gain on top of the former +5-10%?
 
Register file size (KB) per SP flop per clock, for each SM(X) or CU:

GF100: 128/64 = 2
GF104: 128/96 = 1.33
GK104: 256/384 = 0.67

GCN: 256/128 = 2

Register bandwidth per flop and register count per work-item are more useful measures though. Register count per flop doesn't really tell you much about the architecture's ability to achieve peak utilization.
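
A small sketch putting both views side by side; the register-file sizes, peak flops per clock, and maximum resident work-items are the commonly published figures for each architecture, so treat them as assumptions:

    # KB of register file per SP flop per clock (the table above), plus
    # 32-bit registers available per work-item at full occupancy, which is
    # closer to what actually limits real kernels.
    parts = {
        # name: (regfile_KB, sp_flops_per_clk, max_resident_work_items)
        "GF100 SM":  (128,  64, 1536),
        "GF104 SM":  (128,  96, 1536),
        "GK104 SMX": (256, 384, 2048),
        "GCN CU":    (256, 128, 2560),
    }
    for name, (kb, flops, items) in parts.items():
        regs = kb * 1024 // 4  # 32-bit registers
        print(f"{name}: {kb / flops:.2f} KB/flop, "
              f"{regs // items} regs/work-item at full occupancy")

By the second measure GK104 actually looks best (32 registers per work-item at full occupancy vs. 21 for GF100 and 25 for a GCN CU), which is exactly why KB-per-flop alone says little about achievable utilization.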
 
When GK104 was sort of weak at compute, everybody said, well, it's not optimized for that. But I don't really see GK110 doing much better there relatively (excluding the already-mentioned cryptography results; I'm actually curious why those run faster), and given that the SMXs look nearly identical, this can't be a surprise.

As far as I can tell, single-precision efficiency is not improved at all on GK110 vs. GK104. Double precision gets a massive boost, and of course there's the 50% bandwidth increase.

However, there's no reason to believe GK110 will fare better than GK104 in single-precision, math-intensive workloads.
 
Hmmm thanks. Best I've seen is 66%. I'm sure we'll never get a straight answer.

Since I can now talk about it: GK110 is even slightly worse. Best case is about equal, but in medium instruction sequences there's a slight loss. My numbers are not verified yet, so take them with a grain of salt.
 
Rather convenient that you should only look at CUDA results, where you can't compare it to the competitor's card, no? Fact is, it seems to lose in both OpenCL and OpenCompute often, which frankly is a bit disappointing; for some workloads it doesn't even look like it wins with doubles (the SiSoft Sandra results, for example). Optimized drivers or not.
(At least it looks like there's quite some improvement if you're running cryptography-like workloads, more than twice as fast as a GTX 680, but it's still only roughly half as fast there as a 7970 GE, so it's still definitely not the card for bitcoin mining...)
When GK104 was sort of weak at compute, everybody said, well, it's not optimized for that. But I don't really see GK110 doing much better there relatively (excluding the already-mentioned cryptography results; I'm actually curious why those run faster), and given that the SMXs look nearly identical, this can't be a surprise.

Well, once again, Kepler sacrifices registers/cache for flops, and sometimes that hurts performance a good bit. If you rewrite programs to take this into account, you can achieve very good results. Example: Understanding the Efficiency of Ray Traversal on GPUs – Kepler and Fermi Addendum. Clearly, this requires extra work and may not always be possible, but Kepler does have the potential for very high GPGPU performance.

GCN is probably easier to use, but since Teslas are outselling FirePros (to the best of my knowledge), the general feeling must be that NVIDIA's software is better. Whether things will remain that way is another story.

Register bandwidth per flop and register count per work-item are more useful measures though. Register count per flop doesn't really tell you much about the architecture's ability to achieve peak utilization.

But you can't just look at register count per work-item either; you have to look at the total number of work-items as well, and that didn't increase much, not nearly as much as flop throughput.

Besides, it's just an upper bound, in practice you're often better off trying to increase IPC per work-item rather than increasing the number of work-items (it's not as straightforward, but yields better results). See Vasily Volkov's work, e.g. this: Better Performance at Lower Occupancy. And doing this is not easy if you're short on registers.
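
To make that bound concrete, a minimal sketch for a Kepler-style SMX (the 65,536-register file and the 2048-thread cap are the published limits; the per-thread counts are just example values):

    # Occupancy bound imposed by register use on a Kepler-style SMX.
    REGFILE = 64 * 1024   # 32-bit registers per SMX
    MAX_THREADS = 2048    # resident work-item cap per SMX

    for regs_per_thread in (32, 64, 128):
        resident = min(MAX_THREADS, REGFILE // regs_per_thread)
        print(f"{regs_per_thread:3d} regs/thread -> {resident:4d} resident threads "
              f"({100 * resident // MAX_THREADS}% occupancy)")

    # Volkov's point: the 128-reg case sits at 25% occupancy, but each
    # work-item holds far more state and can extract ILP, often winning anyway.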
 
Tech Report had Titan at 33W more than the GHz Edition -

[Tech Report chart: load power consumption]


It's really almost impossible to tell tbh. Anandtech had Titan almost 50W higher...is anyone doing decent power tests on these?
 
Hardware.fr must have some sort of golden sample. It consumed less than a 680 in one test. Overall, nice performance bump over the 680, excellent power consumption but prohibitive pricing.

Nope, it's not a golden sample; it's actually linked to the way boost behaves on Titan.

If I measure power in Anno after 30s I get 220W, but the cooling system (including the way it is calibrated) is not able to maintain 80°C with such a power draw (unless the system is in a fridge, of course). After 5 minutes, power drops to 180W. If I add extra cooling around the board, power goes up to 200W.
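
That behaviour is easy to model. A toy first-order sketch, where the 80°C target and the 220 W / 180 W endpoints come from the measurements above but the thermal constants are invented for illustration:

    # Toy model of a temperature-targeted boost: power starts high while
    # the die is cool, then the controller backs off to hold the 80 C target.
    TARGET_C, AMBIENT_C = 80.0, 25.0
    K = 0.3               # steady-state degrees per watt (assumed)
    temp, power = AMBIENT_C, 220.0
    for t in range(0, 360, 30):                # six minutes in 30 s steps
        steady = AMBIENT_C + K * power         # where the die temp is heading
        temp += (steady - temp) * 0.4          # crude first-order thermal lag
        if temp > TARGET_C:
            power = max(180.0, power - 10.0)   # boost sheds ~10 W per step
        print(f"t={t:3d}s  power={power:5.1f} W  temp={temp:4.1f} C")

Power starts at 220 W and settles near 180 W as the cooler saturates at the temperature target, which is exactly the 30-second vs. 5-minute discrepancy between reviews.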
 