NVIDIA Kepler speculation thread

...But probably less "efficient" due to the rumored doubled SIMD length, at 96 ALUs per SM (3x GF100's...) and no SFUs.
No SFU is my personal opinion; I haven't really seen any leaks or anything indicating it, so it might be an unsafe assumption. But otherwise it looks like the SMs should have similar efficiency: just 3x32 ALUs instead of 3x16, but with no hotclock instead. That should not affect efficiency at all (compared to the gaming Fermi chips - GF100/110 should be more efficient, though I'm not convinced it's really all that much, at least for gaming). I'm pretty sure that if the "effective SIMD length" (dropping the hotclock doesn't change this) had changed, there would be whitepapers floating around due to the implications for CUDA.
I think at this point the "fat SMs" (with twice as many ALUs again, and lots of implications for scheduling etc.) are out of the rumor mill.
Dropping SFUs (if true) would imho not really have that much of a performance impact (which is why I suggest dropping them in the first place...), though the transistor savings wouldn't be that huge either (as you need to integrate some logic for performing special functions into the normal ALUs).
(The other reason I think dropping SFUs would be a good idea: because of dropping the hotclock and doubling the ALUs, the SMs will get slightly bigger - and since the number of SMs should be doubled too, that's probably a bit too much die area for all these SMs. Hence, drop the SFUs to compensate for the increase from dropping the hotclock. And it might actually be easier to integrate the logic needed for special function operations into non-hotclocked ALUs, though I've no idea there, really.)
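As a quick sanity check on the "similar efficiency" point, here's a minimal sketch (the 775 MHz base clock is only an illustrative assumption): doubling the ALUs per SIMD while dropping the 2x hotclock is throughput-neutral per SM.

```python
# Back-of-the-envelope check: dropping the hot clock while doubling the ALUs
# per SIMD leaves per-SM ALU throughput unchanged. Clock is an assumption.

def sm_gflops(simds, alus_per_simd, base_clock_ghz, hotclock_mult):
    # 2 FLOPs per ALU per cycle (fused multiply-add)
    return simds * alus_per_simd * base_clock_ghz * hotclock_mult * 2

fermi_style  = sm_gflops(3, 16, 0.775, 2.0)  # GF104-style SM: 3x16 ALUs at 2x hot clock
kepler_rumor = sm_gflops(3, 32, 0.775, 1.0)  # rumored SM: 3x32 ALUs, no hot clock

assert fermi_style == kepler_rumor
print(fermi_style)  # 148.8 GFLOPS per SM either way
```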
 
Probably just a GK104 engineering sample. I severely doubt that GK110 taped out in Q3 '11 already.

Yep, GK110 won't likely tape out until (late) next year.

Seriously, why is everyone messing this up? The first high-end chip of the GK series will be GK100, not GK110. Just like the 480 was GF100 (and, under the older scheme, the 280 was GT200). In the current nVidia naming scheme, a chip name is composed of:

Gf1rm, where
f = family id: T for Tesla, F for Fermi, K for Kepler, etc.
r = refresh id within the family: 0 for the first series, 1 for the second, etc.
m = model id, with smaller = higher performance: 0 for the best card, 4 for the second, etc.

So GF100 was 480, and GF110 was 580.

Anyone who calls the next nVidia high-end card GK110 has no clue what they are talking about.
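Purely for illustration, a toy decoder following the scheme described above (it only covers Fermi/Kepler-era "Gf1rm" names; the GPU mappings in the comments are just this post's own examples):

```python
# Toy decoder for the "Gf1rm" naming scheme described above; purely
# illustrative, and limited to Fermi/Kepler-era names.
FAMILIES = {"T": "Tesla", "F": "Fermi", "K": "Kepler"}

def decode(chip: str) -> dict:
    assert len(chip) == 5 and chip[0] == "G" and chip[2] == "1"
    return {
        "family":  FAMILIES[chip[1]],
        "refresh": int(chip[3]),  # 0 = first series, 1 = second, ...
        "model":   int(chip[4]),  # 0 = top chip, 4 = second, ...
    }

print(decode("GF100"))  # first-series top Fermi -> GTX 480
print(decode("GF110"))  # refreshed top Fermi    -> GTX 580
print(decode("GK104"))  # first-series second-tier Kepler chip
```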
 
Yep, GK110 won't likely tape out until (late) next year.

Seriously, why is everyone messing this up? The first high-end chip of the GK series will be GK100, not GK110. [...] Anyone who calls the next nVidia high-end card GK110 has no clue what they are talking about.

That would normally be true, but there are rumors that GK100 was cancelled.

Imagine if GF100 had been cancelled. GF104 would have come out as the first Fermi card (similar to how GK104 is rumored to be the first Kepler card), with GF110 following some 3-4 months after GF104 (similar to how GK110 is rumored to come some months after GK104).

Rumors, of course, but it certainly makes sense. In the case of GF100, Nvidia had already pushed through those "hot lots" of GF100, so they already had chips manufactured. At that point they might as well sell them. If they had not done that, I wouldn't be surprised if they'd ended up cancelling GF100 and just waiting for GF110. At which point the Fermi release schedule would have been remarkably similar to the rumored Kepler release schedule.

Whether it's true or not we won't know for a while, but the predominant rumor is that GK110 will be the first enthusiast Kepler chip, with the inference being that GK100 was cancelled.

Hence, GK104 first, with GK110 to follow later and no GK100 in the rumored release schedule.

[edit] Bah, picky picky picky. Corrected the spelling. Guess I could have just looked at the thread title. :p

Regards,
SB
 
It's Kepler with two letter Es, and Rangers, if you read this: AMD's 69xx chip is Cayman, not Caymen. You two should trade a letter - give SB an E and he gives you an A back - so I don't have to write things like this on Saturday at 6 AM :)
 
More from EXPreview (translated): "Sometimes it's a coincidence: GK104 graphics card specifications inadvertently revealed."

700 MHz? That's in contrast to earlier rumors of ~950 MHz…
What if this 700 MHz is the "base" clock, with the max clock being only power-limited and the expected max sustained clock around 900-1000 MHz?
This could explain both rumours: the one about 950 MHz and the one about a much lower clock.
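A minimal sketch of how such a power-limited boost could behave - all numbers here (TDP, load power draws, and the linear power-vs-clock scaling) are invented for illustration, not leaked specs:

```python
# Toy model of a power-limited "turbo" clock, as speculated above.
# TDP and load numbers are invented; real power scales worse than
# linearly with clock (voltage rises too), so this is optimistic.

BASE_CLOCK_MHZ = 700   # the leaked "base" clock
MAX_BOOST_MHZ = 1000   # assumed ceiling
TDP_W = 225            # assumed board power limit

def sustained_clock(power_at_base_w: float) -> float:
    """Raise the clock until the (linearly scaled) power draw hits the TDP."""
    headroom = TDP_W / power_at_base_w
    return min(MAX_BOOST_MHZ, BASE_CLOCK_MHZ * headroom)

print(sustained_clock(160.0))  # light load -> ~984 MHz sustained
print(sustained_clock(210.0))  # heavy load -> ~750 MHz sustained
```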
 
No SFU is my personal opinion; I haven't really seen any leaks or anything indicating it, so it might be an unsafe assumption. But otherwise it looks like the SMs should have similar efficiency: just 3x32 ALUs instead of 3x16, but with no hotclock instead. That should not affect efficiency at all (compared to the gaming Fermi chips - GF100/110 should be more efficient, though I'm not convinced it's really all that much, at least for gaming). I'm pretty sure that if the "effective SIMD length" (dropping the hotclock doesn't change this) had changed, there would be whitepapers floating around due to the implications for CUDA.
Unless the warp size changes, I doubt there would be any fundamental implications for CUDA. Widening the SIMDs should be transparent to running kernels and shouldn't break efficiency. Nvidia has promised not to touch the warp size since G80, no matter how the architecture evolves and changes.
 
What if this 700 MHz is the "base" clock, with the max clock being only power-limited and the expected max sustained clock around 900-1000 MHz?
This could explain both rumours: the one about 950 MHz and the one about a much lower clock.

The difference could be that one is a sub-225 W Tesla product and the other is the rumored ~300 W high-end graphics product... /shrug
 
What if this 700 MHz is the "base" clock, with the max clock being only power-limited and the expected max sustained clock around 900-1000 MHz?
This could explain both rumours: the one about 950 MHz and the one about a much lower clock.

It makes sense with the "turbo" rumors... I just hope we can turn it off, for the love of unlimited overclocking...
 
mczak said:
I don't quite agree with CarstenS, though, that it wouldn't be impressive for a successor of GF104. I think more than doubling ALU capacity is indeed quite impressive (if they can achieve the clocks needed to actually exceed a factor of 2).

Indeed, Tahiti achieves ~1.5x FLOPs / mm^2 compared to Cayman (but it is also more efficient so more like 1.7x).

GK104 with 1536 CCs and 360 mm^2 die size would achieve vs GF114:

~2x FLOPs / mm^2 @ 775 MHz
~2.3x FLOPs / mm^2 @ 925 MHz
~2.5x FLOPs / mm^2 @ 1 GHz
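For reference, a quick sketch of the arithmetic behind those ratios; GF114's ~1.6 GHz hot clock and the equal ~360 mm^2 die sizes are my own assumptions, not stated inputs:

```python
# Arithmetic behind the FLOPs/mm^2 ratios above. GF114's ~1.6 GHz hot
# clock and the equal ~360 mm^2 die sizes are assumptions on my part.

def gflops_per_mm2(cores, clock_mhz, area_mm2):
    return cores * 2 * clock_mhz / 1000 / area_mm2  # 2 FLOPs/cycle (FMA)

gf114 = gflops_per_mm2(384, 1600, 360)  # hot-clocked shader domain
for mhz in (775, 925, 1000):
    ratio = gflops_per_mm2(1536, mhz, 360) / gf114  # GK104, no hot clock
    print(f"{mhz} MHz: {ratio:.1f}x")   # 1.9x, 2.3x, 2.5x - roughly as quoted
```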

From what I've seen, we can roughly expect a doubling of transistor density from going 40 -> 28 nm alone. A real-life 1.95x was the number quoted somewhere around these forums, compared to 2.04x in theory. That's a pretty large gain.

Now imagine, in an ideal world, that you could put a doubled GF104/b on 28 nm without any significant change in die size: 768 shaders/128 TMUs with hot clock, and for argument's sake let's leave ROPs and memory at 32/256 bits. Now up the clocks a bit (say, 100 MHz) and you're suddenly in GK104-rumor-territory with 9xx MHz, 1,536 shaders and no hot clock.

That's the reason for my above notion, which I could probably rephrase as "Kepler is not fairy dust".

The real question is how the changes to Kepler are influencing power consumption and how Nvidia deals with... let's say... spikes. :)
One of my guesses would be that they could revert to some kind of round-robin scheduling to individual SMs, similar to what AMD does, in order to decrease power a bit because data movement is distributed over time. Additionally, dropping the hot clock would help here in two ways: you'd have more SIMDs to schedule to, i.e. larger windows for distributing power, and you'd probably have a little more area to spread the heat, comparing two SMs without hot clock to one with hot clock.
 
Unless the warp size changes, I doubt there would be any fundamental implications for CUDA. Widening the SIMDs should be transparent to running kernels and shouldn't break efficiency. Nvidia has promised not to touch the warp size since G80, no matter how the architecture evolves and changes.
Maybe we're using different terminology, but how do you increase the SIMD size without also increasing the warp size? It seems like these are functionally the same thing.
 
3dcgi said:
Maybe we're using different terminology, but how do you increase the SIMD size without also increasing the warp size? It seems like these are functionally the same thing.
The warp size is a power of 2 larger than the SIMD size; it gets executed over multiple cycles. On G80, the warp size was already 32, while the SIMD size was only 8 (or was it 16?).

On AMD VLIW, the warp (wavefront) size was 64, also executed over 4 cycles. (Still true for GCN, I believe.)
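In other words, a trivial sketch of the relationship just described:

```python
# Cycles needed to issue one warp/wavefront over a SIMD, per the above.
def issue_cycles(warp_size: int, simd_width: int) -> int:
    assert warp_size % simd_width == 0
    return warp_size // simd_width

print(issue_cycles(32, 8))   # G80: warp of 32 over an 8-wide SIMD -> 4 cycles
print(issue_cycles(32, 16))  # Fermi-style 16-wide SIMD            -> 2 cycles
print(issue_cycles(32, 32))  # rumored 32-wide SIMD                -> 1 cycle
print(issue_cycles(64, 16))  # AMD's 64-wide wavefront on 16 lanes -> 4 cycles
```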
 
Yep, 8 for G80.

What's up with the Fermi mobiles though - the 670M and 675M? I thought Asus claimed their new laptops were using "next generation" nVidia GPUs?
 
For interest only, a user named Seronx on the SemiAccurate forums claims to have GK104 and GK100/110/112 specs from an "undisclosed source."

These specs (including: 2x hot clock, GK104: 512 CC, 0.9-1.0 GHz core clock, GK100/110/112: 1024 CC) are quite different from those of most rumors, and the "1536" number is the number of separate threads the scheduler can work with.

Update: There's more later in the thread.
 
"undisclosed source" == I won't admit I just pulled these numbers out of my a..?
Technically I think it wouldn't be totally impossible. 64 (4x16) ALUs per SM, with 3x4 TMUs.
 