NVIDIA Kepler speculation thread

From http://techreport.com/articles.x/22653
"In the SMX, there are four 16-ALU-wide vector execution units and four 32-wide units. Each of the four schedulers in the diagram above is associated with one vec16 unit and one vec32 unit."

Rather than some secret block of 8 FP64 CUDA cores that isn't shown on any diagrams, isn't it more likely that one of the vec16 units per SMX can do FP64 at half rate? I.e., one out of the 12 vertical columns of CUDA cores does 1/2-rate FP64.

For GK110, my guess is that each scheduler has two vec16 units (which improves the ratio of registers to cores) and all cores are capable of 1/2-rate FP64. That would be roughly the same size as the GK104 SMX.
Then, to make up for the missing cores, have 6 SMXs instead of 4.
I haven't heard what Scott has heard, but then again it looks like today is just one of those days. As far as FP64 is concerned, this is exactly what NVIDIA told me: "In GTX 680 the FP64 execution unit is separate from the CUDA cores (like LD/ST and SFU), with 8 FP64 units per SMX".

Make of that what you will
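
Taken at face value, NVIDIA's number implies the FP64 rate directly. A quick back-of-the-envelope check in Python (GTX 680 figures; assumes the FP64 units run at the same clock as the CUDA cores):

Code:
# Implied FP64 throughput from NVIDIA's "8 FP64 units per SMX" statement
smx_count = 8          # SMXs on GK104 / GTX 680
cores_per_smx = 192    # FP32 CUDA cores per SMX
fp64_per_smx = 8       # per the statement quoted above

fp32_lanes = smx_count * cores_per_smx   # 1536
fp64_lanes = smx_count * fp64_per_smx    # 64

print(f"FP64 rate = 1/{fp32_lanes // fp64_lanes} of FP32")  # 1/24

That 1/24 ratio matches the FP64 rate reported in the launch reviews, so the figure is at least self-consistent.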
 
Kepler is losing those 99th-percentile frame tests at Tech Report, the ones people seemed to like back when Nvidia was winning. Haven't heard anything about it this time, though.

[Image: scatter-value-99th.gif, Tech Report's price vs. 99th-percentile FPS scatter plot]


The GeForce GTX 680 is slightly faster and 50 bucks less expensive than the Radeon HD 7970, so it lands in a better position on this first plot. However, if we switch to an arguably superior method of understanding gaming performance and smoothness, our 99th percentile frame time (converted to FPS so the plot reads the same), the results change a bit.

The GTX 680's few instances of higher frame latencies, such as that apparent GPU Boost issue in Arkham City, move it just a couple of ticks below the Radeon HD 7970 in overall performance. Then again, the GTX 680 costs $50 less, so it's still a comparable value.
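
For anyone wanting to reproduce the metric, here's a minimal sketch of how a 99th-percentile frame time is computed from a frame-time log and converted to an FPS-equivalent (the frame times are made up, and numpy's percentile is just a stand-in for whatever TR actually uses):

Code:
import numpy as np

# Hypothetical per-frame render times in milliseconds (e.g. a Fraps capture)
frame_times_ms = np.array([16.1, 16.4, 15.9, 17.0, 33.5,
                           16.2, 16.0, 45.1, 16.3, 16.5])

# 99% of frames completed in this time or less
p99_ms = np.percentile(frame_times_ms, 99)

# Convert to FPS so the chart reads like a normal FPS chart
p99_fps = 1000.0 / p99_ms
print(f"99th percentile: {p99_ms:.1f} ms ({p99_fps:.1f} FPS equivalent)")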
 
Dude, you have a serious persecution complex. I think you're basically the only one here with a tribal bias. The rest of us just like new tech.
 
Dude, you have a serious persecution complex. I think you're basically the only one here with a tribal bias. The rest of us just like new tech.

Why is quoting outlier reviews a sign of a persecution complex - especially when they bring new evidence to the table?
 
jimbo75 said:
Why is quoting outlier reviews a sign of a persecution complex - especially when they bring new evidence to the table?
This is why:
Rangers said:
..., the ones people seemed to like back when Nvidia was winning. Haven't heard anything about it this time, though.
Nothing wrong with the information, but the me-against-the-rest-of-the-world asides get so tired that they eventually drown the message, even if it's actually a decent observation for once.

My guess: the more complex compiler disrupts the continuous flow of operations?
 
Kepler is losing those 99th-percentile frame tests at Tech Report, the ones people seemed to like back when Nvidia was winning. Haven't heard anything about it this time, though.

[Image: scatter-value-99th.gif, Tech Report's price vs. 99th-percentile FPS scatter plot]
Wouldn't it be a driver problem? The GTX 680 seems to behave extremely badly in one game on the 99th-percentile frame tests.
 
Why is quoting outlier reviews a sign of a persecution complex - especially when they bring new evidence to the table?

Exactly what silent_guy pointed out. It's one thing to say "hey, look at this data, what would cause that?" but Rangers has to add his little snark to dirty the conversation. Seriously, strike that and it's a discussion about what would cause it in the hardware or drivers; with it, it's a "you're all biased" load of baloney.

I do think it's a terribly interesting topic, BTW (the 99th percentile), and worthy of dissection.
 
I wonder if it isn't due to the dynamic clocking.

This is my bet. It'd be interesting to see if dynamic clocking has any microstutter effects with vsync (beat frequencies), or if it has the smarts to be used to reduce stutter in SLI situations.
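
A toy model of the beat-frequency concern, assuming frame cost scales inversely with clock and a 60 Hz display with vsync on (all numbers invented for illustration):

Code:
import math

REFRESH_MS = 1000.0 / 60.0   # vsync interval at 60 Hz
BASE_COST_MS = 17.0          # hypothetical frame cost at the base clock
BASE_CLOCK_MHZ = 1006.0      # GTX 680 base clock

def displayed_frame_time(clock_mhz):
    # Crude assumption: render time scales inversely with GPU clock;
    # vsync then rounds the result up to a whole refresh interval.
    render_ms = BASE_COST_MS * BASE_CLOCK_MHZ / clock_mhz
    return math.ceil(render_ms / REFRESH_MS) * REFRESH_MS

# A clock wobbling around the boost range flips frames between one and
# two refresh intervals: alternating 16.7/33.3 ms is classic microstutter.
for clock in (1006, 1056, 1006, 1056, 1030):
    print(f"{clock} MHz -> {displayed_frame_time(clock):.1f} ms displayed")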
 
My guess: the more complex compiler disrupts the continuous flow of operations?

Curious about this too, especially since the AMD cards leapfrogged in 99th-percentile performance after transitioning to GCN, with reduced ALU power.


Also: anything (*anything*) about a GK106/116? I suspect the Q4 refresh (700s) will include that, the 110, and maybe a revision of the 104 for clocks.
 
This is my bet. It'd be interesting to see if dynamic clocking has any microstutter effects with vsync (beat frequencies), or if it has the smarts to be used to reduce stutter in SLI situations.

TechPowerUp reported clocks dropping below 1 GHz at times, which would seem to agree:

Given a selected clock offset of +50 MHz, we would expect clocks between 1056 MHz (base clock + 50 MHz) and 1150 MHz (highest dynamic clock + 50 MHz). While the majority of clocks are bunched up in that region indeed, we do see a good amount of clocks below 1056 MHz, all the way down to the default base clock of 1006 MHz and even below.
These unexpected clocks can be explained by dynamic overclocking reducing clock speeds because a certain game scene causes it to run into the TDP power limit, or similar situations. Increased temperature from overclocking alone can not account for the difference, as it can only reduce clocks by 40 MHz, which would still give us a lowest clock of 1016 MHz.
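
Spelling out the arithmetic behind TPU's bounds (all figures taken from the quote above):

Code:
base_clock = 1006      # MHz, GTX 680 default base clock
offset = 50            # selected overclock offset
max_dynamic = 1100     # implied by "highest dynamic clock + 50 MHz" = 1150
temp_throttle = 40     # max reduction from temperature alone, per TPU

expected_low  = base_clock + offset            # 1056 MHz
expected_high = max_dynamic + offset           # 1150 MHz
temp_floor    = expected_low - temp_throttle   # 1016 MHz
print(expected_low, expected_high, temp_floor)
# Anything observed below 1016 MHz therefore needs another explanation,
# i.e. the TDP power limit.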
 
This is my bet. It'd be interesting to see if dynamic clocking has any microstutter effects with vsync (beat frequencies), or if it has the smarts to be used to reduce stutter in SLI situations.

Ooooooh now I like where this could be going, if they were able to implement such a thing.

I find the $ / 99th-%ile graph useful, but I think you can gain even more insight with one more math step: divide dollars by the 99th-percentile FPS to get a cost per unit of actual gaming performance ($/FPS, so lower is better):
Code:
Card        Price    99th-%ile FPS    $/FPS (lower = better)
7970        $549     34               16.15
680         $499     32               15.59
580         $499     28               17.82
7870        $349     27               12.93
560Ti448    $269     23               11.70

The 7870 looks quite good here, but so does the 560Ti448 (as expected).
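
Same numbers generated programmatically, in case anyone wants to slot in other cards (prices and 99th-percentile FPS lifted from the table above):

Code:
# Dollars per unit of 99th-percentile performance; lower is better
cards = {
    "7970":     (549, 34),
    "680":      (499, 32),
    "580":      (499, 28),
    "7870":     (349, 27),
    "560Ti448": (269, 23),
}

for name, (price, fps) in sorted(cards.items(), key=lambda kv: kv[1][0] / kv[1][1]):
    print(f"{name:10s} ${price:<4d} {fps:>3d} FPS  {price / fps:6.2f} $/FPS")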
 
I'm actually underwhelmed with both the 7970 and 680. I really would love to see a single GPU pushing out 580SLI or 6970CFX numbers so I could switch to a single card solution without having to give up my current IQ running 3x 19x12 monitors. Based on the benchmarks it doesn't look like 2012 is my year for a single card solution :(
BigK GK110 comes by August 2012 (or earlier), so keep hoping.
 
Lol, I'm glad somebody said it. The cuckoo train is really rolling now. The fact that it doesn't make toast is probably cheating too.

Jawed was right about the greater compiler dependency, though. He probably saw the white paper before starting that little diatribe :LOL: In any case, it's obvious nVidia's static scheduling needs some work. It's only dual issue, dammit; how hard can that be? AMD had to deal with 2.5x that.

It's not dual issue. They've had dual issue since GF104, and dual issue isn't hard to get right with a decent ISA, which they should have.

GCN and GK104 have the same reg file size, virtually the same L1/LDS, almost the same clocks, and virtually the same mem clocks, but GK104 has 3x more compute. One of these two designs is unbalanced with respect to latency hiding.
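
To put rough numbers on that claim, here's the registers-per-lane comparison, assuming the commonly cited 256 KB register file per GK104 SMX and per GCN CU, with 192 and 64 ALU lanes respectively:

Code:
REG_FILE_BYTES = 256 * 1024   # per GK104 SMX and per GCN CU
REG_BYTES = 4                 # 32-bit registers

for name, lanes in (("GK104 SMX", 192), ("GCN CU", 64)):
    regs_per_lane = (REG_FILE_BYTES // REG_BYTES) // lanes
    print(f"{name}: {regs_per_lane} registers per ALU lane")

# GK104: 341, GCN: 1024. Three times less register state per lane on
# GK104 to keep warps resident while waiting on memory.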
 