NVIDIA Kepler speculation thread

From http://techreport.com/articles.x/22653
"In the SMX, there are four 16-ALU-wide vector execution units and four 32-wide units. Each of the four schedulers in the diagram above is associated with one vec16 unit and one vec32 unit."

Rather than some secret block of 8 FP64 CUDA cores that isn't shown on any diagrams, isn't it more likely that one of the vec16 units per SMX can do FP64 at half rate? I.e., one out of the 12 vertical columns of CUDA cores does 1/2-rate FP64.

For GK110, my guess is that each scheduler has two vec16 units (which improves the ratio of registers to cores) and all cores are capable of 1/2-rate FP64. That would be roughly the same size as the GK104 SMX.
Then, to make up for the missing cores, have 6 SMXs instead of 4.
I haven't heard what Scott has heard, but then again it looks like today is just one of those days. As far as FP64 is concerned, this is exactly what NVIDIA told me: "In GTX 680 the FP64 execution unit is separate from the CUDA cores (like LD/ST and SFU), with 8 FP64 units per SMX".

Make of that what you will
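
Taken at face value, NVIDIA's number implies the FP64 rate directly. A quick back-of-the-envelope check in Python (GTX 680 figures; assumes the FP64 units run at the same clock as the CUDA cores):

Code:
# Implied FP64 throughput from NVIDIA's "8 FP64 units per SMX" statement
smx_count = 8          # SMXs on GK104 / GTX 680
cores_per_smx = 192    # FP32 CUDA cores per SMX
fp64_per_smx = 8       # per the statement quoted above

fp32_lanes = smx_count * cores_per_smx   # 1536
fp64_lanes = smx_count * fp64_per_smx    # 64

print(f"FP64 rate = 1/{fp32_lanes // fp64_lanes} of FP32")  # 1/24

That 1/24 ratio matches the FP64 rate reported in the launch reviews, so the figure is at least self-consistent.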
 
Kepler is losing those 99th-percentile frame tests at Tech Report, the ones people seemed to like back when Nvidia was winning. Haven't heard anything about it this time, though.

[Image: scatter-value-99th.gif, Tech Report's price vs. 99th-percentile FPS scatter plot]


The GeForce GTX 680 is slightly faster and 50 bucks less expensive than the Radeon HD 7970, so it lands in a better position on this first plot. However, if we switch to an arguably superior method of understanding gaming performance and smoothness, our 99th percentile frame time (converted to FPS so the plot reads the same), the results change a bit.

The GTX 680's few instances of higher frame latencies, such as that apparent GPU Boost issue in Arkham City, move it just a couple of ticks below the Radeon HD 7970 in overall performance. Then again, the GTX 680 costs $50 less, so it's still a comparable value.
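
For anyone wanting to reproduce the metric, here's a minimal sketch of how a 99th-percentile frame time is computed from a frame-time log and converted to an FPS-equivalent (the frame times are made up, and numpy's percentile is just a stand-in for whatever TR actually uses):

Code:
import numpy as np

# Hypothetical per-frame render times in milliseconds (e.g. a Fraps capture)
frame_times_ms = np.array([16.1, 16.4, 15.9, 17.0, 33.5,
                           16.2, 16.0, 45.1, 16.3, 16.5])

# 99% of frames completed in this time or less
p99_ms = np.percentile(frame_times_ms, 99)

# Convert to FPS so the chart reads like a normal FPS chart
p99_fps = 1000.0 / p99_ms
print(f"99th percentile: {p99_ms:.1f} ms ({p99_fps:.1f} FPS equivalent)")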
 
Dude, you have a serious persecution complex. I think you're basically the only one here with a tribal bias. The rest of us just like new tech.
 
Dude, you have a serious persecution complex. I think you're basically the only one here with a tribal bias. The rest of us just like new tech.

Why is quoting outlier reviews a sign of a persecution complex - especially when they bring new evidence to the table?
 
jimbo75 said:
Why is quoting outlier reviews a sign of a persecution complex - especially when they bring new evidence to the table?
This is why:
Rangers said:
..., the ones people seemed to like back when Nvidia was winning. Haven't heard anything about it this time, though.
Nothing wrong with the information, but the me-against-the-rest-of-the-world asides get so tired that they eventually drown the message, even if it's actually a decent observation for once.

My guess: the more complex compiler disrupts the continuous flow of operations?
 
Kepler is losing those 99th-percentile frame tests at Tech Report, the ones people seemed to like back when Nvidia was winning. Haven't heard anything about it this time, though.

[Image: scatter-value-99th.gif, Tech Report's price vs. 99th-percentile FPS scatter plot]
Wouldn't it be a driver problem? The GTX 680 seems to behave extremely badly in one game on the 99th-percentile frame tests.
 
Why is quoting outlier reviews a sign of a persecution complex - especially when they bring new evidence to the table?

Exactly what silent_guy pointed out. It's one thing to say "hey, look at this data, what would cause that?" but Rangers has to add his little snark to dirty the conversation. Seriously, strike that and it's a discussion about what would cause it in the hardware or drivers; with it, it's a "you're all biased" load of baloney.

I do think it's a terribly interesting topic, BTW (the 99th percentile), and worthy of dissection.
 
I wonder if it isn't due to the dynamic clocking.

This is my bet. It'd be interesting to see if dynamic clocking has any microstutter effects with vsync (beat frequencies), or if it has the smarts to be used to reduce stutter in SLI situations.
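
A toy model of the beat-frequency concern, assuming frame cost scales inversely with clock and a 60 Hz display with vsync on (all numbers invented for illustration):

Code:
import math

REFRESH_MS = 1000.0 / 60.0   # vsync interval at 60 Hz
BASE_COST_MS = 17.0          # hypothetical frame cost at the base clock
BASE_CLOCK_MHZ = 1006.0      # GTX 680 base clock

def displayed_frame_time(clock_mhz):
    # Crude assumption: render time scales inversely with GPU clock;
    # vsync then rounds the result up to a whole refresh interval.
    render_ms = BASE_COST_MS * BASE_CLOCK_MHZ / clock_mhz
    return math.ceil(render_ms / REFRESH_MS) * REFRESH_MS

# A clock wobbling around the boost range flips frames between one and
# two refresh intervals: alternating 16.7/33.3 ms is classic microstutter.
for clock in (1006, 1056, 1006, 1056, 1030):
    print(f"{clock} MHz -> {displayed_frame_time(clock):.1f} ms displayed")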
 
My guess: the more complex compiler disrupts the continuous flow of operations?

Curious about this too, especially since the AMD cards leapfrogged in 99th-percentile performance after transitioning to GCN, with reduced ALU power.


Also: anything (*anything*) about a GK106/116? I suspect the Q4 refresh (700s) will include that, the 110, and maybe a revision of the 104 for clocks.
 
This is my bet. It'd be interesting to see if dynamic clocking has any microstutter effects with vsync (beat frequencies), or if it has the smarts to be used to reduce stutter in SLI situations.

TechPowerUp reported clocks dropping below 1 GHz at times, which would seem to agree:

Given a selected clock offset of +50 MHz, we would expect clocks between 1056 MHz (base clock + 50 MHz) and 1150 MHz (highest dynamic clock + 50 MHz). While the majority of clocks are bunched up in that region indeed, we do see a good amount of clocks below 1056 MHz, all the way down to the default base clock of 1006 MHz and even below.
These unexpected clocks can be explained by dynamic overclocking reducing clock speeds because a certain game scene causes it to run into the TDP power limit, or similar situations. Increased temperature from overclocking alone can not account for the difference, as it can only reduce clocks by 40 MHz, which would still give us a lowest clock of 1016 MHz.
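
Spelling out the arithmetic behind TPU's bounds (all figures taken from the quote above):

Code:
base_clock = 1006      # MHz, GTX 680 default base clock
offset = 50            # selected overclock offset
max_dynamic = 1100     # implied by "highest dynamic clock + 50 MHz" = 1150
temp_throttle = 40     # max reduction from temperature alone, per TPU

expected_low  = base_clock + offset            # 1056 MHz
expected_high = max_dynamic + offset           # 1150 MHz
temp_floor    = expected_low - temp_throttle   # 1016 MHz
print(expected_low, expected_high, temp_floor)
# Anything observed below 1016 MHz therefore needs another explanation,
# i.e. the TDP power limit.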
 
This is my bet. It'd be interesting to see if dynamic clocking has any microstutter effects with vsync (beat frequencies), or if it has the smarts to be used to reduce stutter in SLI situations.

Ooooooh now I like where this could be going, if they were able to implement such a thing.

I find the $ / 99th-%ile graph useful, but I think you can gain even more insight with one more math step: divide dollars by the 99th-percentile FPS to get a cost per unit of actual gaming performance ($/FPS, so lower is better):
Code:
Card        Price    99th-%ile FPS    $/FPS (lower = better)
7970        $549     34               16.15
680         $499     32               15.59
580         $499     28               17.82
7870        $349     27               12.93
560Ti448    $269     23               11.70

The 7870 looks quite good here, but so does the 560Ti448 (as expected).
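
Same numbers generated programmatically, in case anyone wants to slot in other cards (prices and 99th-percentile FPS lifted from the table above):

Code:
# Dollars per unit of 99th-percentile performance; lower is better
cards = {
    "7970":     (549, 34),
    "680":      (499, 32),
    "580":      (499, 28),
    "7870":     (349, 27),
    "560Ti448": (269, 23),
}

for name, (price, fps) in sorted(cards.items(), key=lambda kv: kv[1][0] / kv[1][1]):
    print(f"{name:10s} ${price:<4d} {fps:>3d} FPS  {price / fps:6.2f} $/FPS")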
 
I'm actually underwhelmed with both the 7970 and 680. I really would love to see a single GPU pushing out 580SLI or 6970CFX numbers so I could switch to a single card solution without having to give up my current IQ running 3x 19x12 monitors. Based on the benchmarks it doesn't look like 2012 is my year for a single card solution :(
BigK GK110 comes by August 2012 (or earlier), so keep hoping.
 
Lol, I'm glad somebody said it. The cuckoo train is really rolling now. The fact that it doesn't make toast is probably cheating too.

Jawed was right about the greater compiler dependency, though. He probably saw the white paper before starting that little diatribe :LOL: In any case, it's obvious nVidia's static scheduling needs some work. It's only dual issue, dammit; how hard can that be? AMD had to deal with 2.5x that.

It's not dual issue. They've had dual issue since GF104, and dual issue isn't hard to get right with a decent ISA, which they should have.

GCN and GK104 have the same reg file size, virtually the same L1/LDS, almost the same clocks, and virtually the same mem clocks, but GK104 has 3x more compute. One of these two designs is unbalanced with respect to latency hiding.
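
To put rough numbers on that claim, here's the registers-per-lane comparison, assuming the commonly cited 256 KB register file per GK104 SMX and per GCN CU, with 192 and 64 ALU lanes respectively:

Code:
REG_FILE_BYTES = 256 * 1024   # per GK104 SMX and per GCN CU
REG_BYTES = 4                 # 32-bit registers

for name, lanes in (("GK104 SMX", 192), ("GCN CU", 64)):
    regs_per_lane = (REG_FILE_BYTES // REG_BYTES) // lanes
    print(f"{name}: {regs_per_lane} registers per ALU lane")

# GK104: 341, GCN: 1024. Three times less register state per lane on
# GK104 to keep warps resident while waiting on memory.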
 