NVIDIA Kepler speculation thread

I specifically put quotation marks around "large" and added "fairly" to imply that it was meant relatively. GF114 wasn't just a case of enabling something that was already there; something else changed as well. A nearly 10% difference is fairly large compared to 0%. A 0% difference would basically just be a BIOS change, like opening the shaders on the 6950.

It isn't large in either absolute (30mm²) OR relative (10%) terms, despite your quotation marks or misguided references to zero. If you want to be purely technical, any nonzero number compared to zero is infinitely larger; that's elementary arithmetic. But that's not what we're talking about, so we're going to ignore that tangent.

Next, and most importantly, GF104 and GF114 are pin-compatible and, according to NV, architecturally identical. There were no new features introduced, only adjustments to transistor types within certain regions to reduce leakage, which allows for higher clocks without killing the power budget.
 
Ten percent is quite a bit imo for what they did. They could have just enabled the shaders in the GF104, but they decided to do extra work to improve the chip.

I dunno what is so misguided in my reference: 0% would have been just a fully enabled GF104, and 10% is GF114. 10% in GPUs can decide a lot.
 

The transistor count is the same, the architecture is the same, the pinout is the same. The listed feature set is the same.

The burden of proof is now on you to prove what changed, if anything, as all other external indicators (including the manufacturer) point to zero change outside of a wider variation in transistor type for the various working parts of the chip.

By the way: simply 'enabling' all the SMs in a GF104 would not allow you to clock it another 15% faster while only adding 10W to the TDP. So, you tell me, what makes more sense? 10% of your mystery sauce that adds zero transistor count or feature set, or a slightly larger die due to NV retooling slightly to ensure they use the most 'applicable' type of transistor for the high-clocking regions?

Occam's razor suggests an answer...
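To put rough numbers on that (purely illustrative ones, not measured GF104/GF114 data), here's the back-of-the-envelope dynamic power math:

```python
# Back-of-the-envelope dynamic power scaling: P_dyn ~ C * V^2 * f.
# All numbers below are illustrative assumptions, not measured data.

base_power = 160.0              # W, assumed board power for a GTX 460-class card
base_clock = 675.0              # MHz, assumed base core clock
new_clock = base_clock * 1.15   # a straight 15% clock bump

# Case 1: frequency scales alone, voltage held constant.
f_only = base_power * (new_clock / base_clock)
print(f"f-only scaling: {f_only:.0f} W (+{f_only - base_power:.0f} W)")

# Case 2: the higher clock also needs ~5% more voltage, which enters squared.
f_and_v = base_power * (new_clock / base_clock) * 1.05 ** 2
print(f"f+V scaling:    {f_and_v:.0f} W (+{f_and_v - base_power:.0f} W)")

# Both deltas land well above 10 W, which is the point: something other than
# a simple clock bump (e.g. lower-leakage transistors) has to absorb the rest.
```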
 
I think you are reading something from my posts that isn't there...

I'm not claiming that anything else was done to the chip beyond what you have said, just some work on the transistors to enable higher clocks.

However, my point is that the work on the chip (10% bigger die, clocks better) means it's not just "exactly a full GF104 with a clock bump," which is what you were saying. No, there aren't differences in the cores etc., but the 10% bigger die and the results it gave are a clear modification regardless.
 
?? You've now answered your own question regarding the modification you're asking about. It is a GF104 with all features enabled and a clock bump, made possible by changes to the individual transistor types. That is what I said. That is now what you've agreed to.

Now that we're in agreement, we can move on with this thread. The 560 Ti is exactly a 460 that is fully enabled, plus a clock bump. Thus, any hypothetical extrapolated value for a 'fully featured 460' can be directly measured from the 560 Ti on a clock-for-clock basis against the GK104.
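A minimal sketch of that extrapolation (the fps number is a placeholder; the clocks are stock reference values):

```python
# Minimal sketch of the clock-for-clock extrapolation described above.
# The fps figure is a placeholder; the clocks are stock reference values.

fps_560ti   = 60.0     # placeholder: fps measured on a GTX 560 Ti
clock_560ti = 822.0    # MHz, stock GTX 560 Ti core clock
clock_gk104 = 1006.0   # MHz, stock GTX 680 base clock

# Hypothetical fully-enabled GF104 performance at GK104's clock, assuming
# perfectly linear clock scaling (an upper bound in practice).
fps_extrapolated = fps_560ti * (clock_gk104 / clock_560ti)
print(f"extrapolated fps at {clock_gk104:.0f} MHz: {fps_extrapolated:.1f}")
```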

Tada! Done :)
 
From 3DCenter (translated): "The first specifications for the GeForce GTX 670 Ti."

From what I can gather from the translation, there is apparently a "reliable source" involved for at least some of the reported specs, but many other portions of the (not short) spec list seem to be speculation or otherwise uncertain.
 
Newegg now has the EVGA superclocked model in stock for $535.

<edit> if you missed it, it's now gone. It was there for 2 minutes tho.
 
Compute performance comes from the ALUs, and unless you had an (insane, for the time being) 1:1 DP:SP ratio, in order to have N GFLOPS DP you'd automatically need (assuming a 2:1 ratio) 2*N GFLOPS SP. TMUs have also been relatively bound to clusters since Fermi.
Not unless your ALUs are severely borked to begin with. The memory subsystem plays a far bigger role in compute, and Nvidia gimped it significantly in GK104.
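For reference, a quick sketch of the ratio arithmetic from the quote, with GK104's actual 24:1 SP:DP rate thrown in to show just how skewed it is:

```python
# The quoted ratio arithmetic made explicit; target_dp is an arbitrary
# example figure, and 24:1 is GK104's actual SP:DP rate.

target_dp = 500.0                       # desired DP throughput in GFLOPS

for sp_to_dp in (1, 2, 24):             # 1:1 (hypothetical), 2:1, GK104's 24:1
    sp_needed = target_dp * sp_to_dp    # SP GFLOPS implied by that ratio
    print(f"SP:DP = {sp_to_dp}:1 -> {sp_needed:.0f} GFLOPS SP "
          f"for {target_dp:.0f} GFLOPS DP")
```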
 
I think you're underestimating the significance of GK104's static scheduling and dual issue, Ail. Both conspire to reduce throughput, and if the big boy sheds them it could mean better compute performance without a big bump in theoreticals.
WTH are you talking about? Evidence?
 
Ten percent is quite a bit imo for what they did. They could have just enabled the shaders in the GF104, but they decided to do extra work to improve the chip.
Oh I agree 9% die size difference for "the same chip" is indeed quite a lot.
That said, are you sure that large difference really exists? I couldn't really find any trustworthy measurements for both. Most sites claim the same size, some even GF114 smaller, some GF104 smaller, with the numbers tossed around usually 332mm², 358mm², 360mm², 365mm²... Some numbers are official figures (the 360mm² for the GTX 560), some measured. I haven't seen anyone measuring both of them, actually.
 
WTH are you talking about? Evidence?
I can't think of more compelling evidence than this:
What the CPU now does is to schedule instructions for all 1536 execution units, resolve data dependencies, and in general, make sure everything is fed right. This is a lot more complex than a shader compile, it is multi-unit scheduling where multi means >1000.

-Charlie
:p
 
trinibwoy said:
Has everyone gone illiterate all of a sudden? Read the damn reviews.
I've read many of them, but I haven't seen strong reasons that support your point. I get that the compiler will have to do some more work, but that doesn't mean it will impact execution performance.
 
a) Reviews are done by people who don't often understand much about the architecture.

b) The compiler argument is bogus, afaict.

I'm sorry but "reviewers are dumb" isn't a valid counterpoint to measured performance. Neither is "afaict". That's no different from the "if and but" excuses people made for VLIW.

I've read many of them, but I haven't seen strong reasons that support your point. I get that the compiler will have to do some more work, but that doesn't mean it will impact execution performance.

The 680 has 2x the theoretical flops of the 580. Where's the evidence of dual-issue and the compiler not impacting execution performance?

http://www.anandtech.com/show/5699/nvidia-geforce-gtx-680-review/17
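The theoretical numbers, worked out (unit counts and clocks are the stock reference specs):

```python
# Theoretical single-precision throughput: ALUs * 2 ops (FMA) * clock.
# Unit counts and clocks are the stock reference specs.

def sp_gflops(alus: int, clock_mhz: float) -> float:
    return alus * 2 * clock_mhz / 1000.0   # one FMA = 2 floating-point ops

gtx580 = sp_gflops(512, 1544)    # 512 CUDA cores at the 1544 MHz shader clock
gtx680 = sp_gflops(1536, 1006)   # 1536 CUDA cores at the 1006 MHz base clock

print(f"GTX 580: {gtx580:.0f} GFLOPS")      # ~1581
print(f"GTX 680: {gtx680:.0f} GFLOPS")      # ~3090
print(f"ratio:   {gtx680 / gtx580:.2f}x")   # ~1.95x
```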
 
trinibwoy said:
I'm sorry but "reviewers are dumb" isn't a valid counterpoint to measured performance. Neither is "afaict". That's no different from the "if and but" excuses people made for VLIW.

The 680 has 2x the theoretical flops of the 580. Where's the evidence of dual-issue and the compiler not impacting execution performance?

http://www.anandtech.com/show/5699/nvidia-geforce-gtx-680-review/17
Where is the evidence that it is? If there's one constant in getting DSPs to unleash their full potential, it's getting the operands fed to the ALUs. Why would GPUs be any different?
 
The 680 has 2x the theoretical flops of the 580. Where's the evidence of dual-issue and the compiler not impacting execution performance?

http://www.anandtech.com/show/5699/nvidia-geforce-gtx-680-review/17

I'm not sure I understand. The 680 isn't two 580s glued together. As such, its performance will vary based on the application, and you should not expect exactly 2x the performance except on artificial series of independent multiply-adds.

For example, AES uses a lot of bit shifts. But bit shifts don't have a throughput of 1/clk/core on the 680. You will find some other shaders that speed up by more than 2x.

This is independent of issue width or amount of SW scheduling. Teasing out the effect of compiler scheduling from shader performance would require running the same shader on two SMs that are identical in every way except in the amount of compiler scheduling required.
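A toy model of that effect, with made-up issue rates (not measured Kepler figures):

```python
# Toy model of ALU-bound shader speedup from an instruction mix and
# per-clock issue rates. All throughput numbers are illustrative, not
# measured Kepler figures.

def alu_cycles(mix, rates):
    """Cycles per shader invocation: op counts divided by issue rates."""
    return sum(count / rates[op] for op, count in mix.items())

# A hypothetical AES-like shader: lots of FMAs plus a chunk of bit shifts.
mix = {"fma": 100, "shift": 40}

old_chip = {"fma": 1.0, "shift": 1.0}   # full-rate shifts (Fermi-like assumption)
new_chip = {"fma": 2.0, "shift": 0.5}   # 2x the FMA rate, but slower shifts

speedup = alu_cycles(mix, old_chip) / alu_cycles(mix, new_chip)
print(f"estimated speedup: {speedup:.2f}x")   # ~1.08x despite 2x peak FLOPS
```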
 
I'm sorry but "reviewers are dumb" isn't a valid counterpoint to measured performance. Neither is "afaict".
If you think reviewers are bang on target with all the "compiler sucks" whining, let's discuss compilers.

a) What kind of instruction selection/scheduling optimizations can a compiler do for an in-order RISC core? Just how hard do you think those are? Just what is the limit of the perf upside there? (See the sketch below the list.)

b) Let us not forget that the compute subset of the ISA, which would be affected by the compiler's optimizations, is almost certainly the same as Fermi's.
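To illustrate (a), a toy example of the kind of reordering such a compiler does, with made-up latencies:

```python
# Toy illustration of instruction scheduling for a single-issue in-order
# core: hoist independent work into the latency shadow of a long-latency op.
# Latencies are made up.

LAT = {"load": 4, "mul": 1, "add": 1}

# Each entry: (destination, opcode, source operands)
naive = [
    ("a", "load", []),     # 4-cycle latency
    ("b", "mul", ["a"]),   # stalls waiting on a
    ("c", "add", []),      # independent work, issued too late to help
    ("d", "add", []),
]

def total_cycles(program):
    ready, t = {}, 0
    for dest, op, srcs in program:
        t = max([t] + [ready[s] for s in srcs])  # stall until sources ready
        ready[dest] = t + LAT[op]
        t += 1                                   # in-order, single issue
    return max(ready.values())

scheduled = [naive[0], naive[2], naive[3], naive[1]]  # fill the load shadow
print(total_cycles(naive), "vs", total_cycles(scheduled))  # 7 vs 5
```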

That's no different from the "if and but" excuses people made for VLIW.
There's no excuse here. VLIW4/5 was (and would be for any other compiler) very, very hard to generate good code for, given all the register file port/in-pipe register limitations, even with the vec4-friendliness of typical shader code. The atrocious 40-cycle latency for handling ifs didn't help.

The 680 has 2x the theoretical flops of the 580. Where's the evidence of dual-issue and the compiler not impacting execution performance?
Look at the 680's latency hiding capability. Look at its cache subsystem vs. the changes to ALU organization. Dual issue not going right is pretty low on the 680's trouble list, if it's there at all.
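A rough Little's-law sketch of what that latency hiding costs, with assumed ballpark numbers (not official Kepler specs):

```python
# Little's-law view of latency hiding: work in flight = latency * issue rate.
# The numbers are rough assumptions, not official Kepler specs.

alu_latency    = 10   # cycles, assumed math pipeline latency
schedulers     = 4    # warp schedulers per SMX
issues_per_clk = 1    # assume one instruction per scheduler per clock, no ILP

# Independent warps an SMX needs so every scheduler can issue every cycle:
warps_needed = alu_latency * schedulers * issues_per_clk
print(f"~{warps_needed} independent warps per SMX just to cover ALU latency")

# With dual issue or in-warp ILP the same warp count covers twice the work,
# so occupancy, ILP and cache behaviour dominate long before dual issue
# itself becomes the bottleneck.
```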
 
Oh I agree 9% die size difference for "the same chip" is indeed quite a lot.
That said, are you sure that large difference really exists? I couldn't really find any trustworthy measurements for both. Most sites claim the same size, some even GF114 smaller, some GF104 smaller, with the numbers tossed around usually 332mm², 358mm², 360mm², 365mm²... Some numbers are official figures (the 360mm² for the GTX 560), some measured. I haven't seen anyone measuring both of them, actually.

Actually, I've only seen a table on Wikipedia, which I assumed would be correct by now, but it could be wrong.
I didn't find anything useful now either.

Edit: @Albuquerque, I never asked a question. Your wording implied that GF104 → GF114 is just like GTX 570 → 580 (units enabled and higher clock); I was merely saying it was a bit more than that.
 