NVIDIA Kepler speculation thread

DSC · Mar 22, 2012

http://www.geforce.com/hardware/desktop-gpus/geforce-gtx-680

GTX 680 page is now up at geforce.com. Many other pages about the GTX 680 and the Kepler architecture are not up yet.

NathansFortune · Mar 22, 2012

Man from Atlantis said:
The fillrate tests are interesting..

Overall roundup: the difference shrinks at higher res (not sure whether it carries the same disease as Fermi or just lacks bandwidth), though it still manages to come out on top..

That test looks like it was done against a factory OC'd 7970 as well; the stock vs. stock comparison will be even more one-sided.

I am going to reserve judgement until I see Anand's review.

Psycho · Mar 22, 2012

Man from Atlantis said:
(not sure whether it carries the same disease as Fermi or just lacks bandwidth), though it still manages to come out on top..

More like bandwidth + the Radeon frontend/driver low-res disease.

NathansFortune said:
That test looks like it was done against a factory OC'd 7970 as well; the stock vs. stock comparison will be even more one-sided.
I am going to reserve judgement until I see Anand's review.

Huh? Most graphs show both the stock 925 MHz and the 1000 MHz results (interesting for clock-for-clock performance), and the rest are just stock vs. stock.. But with the old 7970 launch drivers.

Mize · Mar 22, 2012

Arty said:
Interesting how it stacks up against GF114. Almost twice as fast, awesome scaling.

Scaling is only a bit better than GF110, and it's only 25-33% faster than GF110 in the benches published so far at 2560x1600 and lower...

Love_In_Rio · Mar 22, 2012

fellix said:
Wow! Look at those fill-rates. Kepler kept the full-rate FP16 filtering, now at 128 texels per clock!

Why hasn't AMD improved that after all these years?

jimbo75 · Mar 22, 2012

http://www.pcinlife.com/article_photo/gtx_680/results/total.png

Whole game list - as predicted the 7970 wins in AvP, Crysis Warhead and Metro. The 680 appears to have blown it away in Shogun II though, which is something of a reversal of fortunes in that game. The Batman scores look too high in favour of the 680 so I'm guessing that's favourable settings.

Mianca · Mar 22, 2012

NathansFortune said:
That test looks like it was done against a factory OC'd 7970 as well; the stock vs. stock comparison will be even more one-sided.

Average results GTX680@stock vs. HD7970@1000/5600:

1AA + 16AF
1680x1050: GTX 680 7.3% faster
1920x1200: GTX 680 5.1% faster
2560x1600: GTX 680 0.4% slower

4AA + 16AF
1680x1050: GTX 680 5.8% faster
1920x1200: GTX 680 3.1% faster
2560x1600: GTX 680 1.3% slower

8AA + 16AF
1680x1050: GTX 680 5.6% faster
1920x1200: GTX 680 3.1% faster
2560x1600: GTX 680 2.8% slower

NathansFortune · Mar 22, 2012

Psycho said:
Huh? Most graphs show both the stock 925 MHz and the 1000 MHz results (interesting for clock-for-clock performance), and the rest are just stock vs. stock.. But with the old 7970 launch drivers.

Yeah, reading comprehension fail. D:

Mianca · Mar 22, 2012

Can't wait for the GTX680@maxOC vs. HD7970@maxOC face-off promised by HardOCP.

A solid perf/W comparison between the GTX 680 and the HD 7870 would be very interesting, too.

Man from Atlantis · Mar 22, 2012

Psycho said:
More like bandwidth + the Radeon frontend/driver low-res disease.

Looking at the GTX 580 scores, it seems it is bandwidth; sure, the Radeon frontend does seem to suffer at low res.

fellix · Mar 22, 2012

Some interesting bits on the warp scheduling in Kepler, from the white paper:

To feed the execution resources of SMX, each unit contains four warp schedulers, and each warp scheduler is capable of dispatching two instructions per warp every clock.
More importantly, the scheduling functions have been redesigned with a focus on power efficiency. For example: Both Kepler and Fermi schedulers contain similar hardware units to handle scheduling functions, including, (a) register scoreboarding for long latency operations (texture and load), (b) inter-warp scheduling decisions (e.g., pick the best warp to go next among eligible candidates), and (c) thread block level scheduling (e.g., the GigaThread engine); however, Fermi’s scheduler also contains a complex hardware stage to prevent data hazards in the math datapath itself. A multi-port register scoreboard keeps track of any registers that are not yet ready with valid data, and a dependency checker block analyzes register usage across a multitude of fully decoded warp instructions against the scoreboard, to determine which are eligible to issue.
For Kepler, we realized that since this information is deterministic (the math pipeline latencies are not variable), it is possible for the compiler to determine up front when instructions will be ready to issue, and provide this information in the instruction itself. This allowed us to replace several complex and power-expensive blocks with a simple hardware block that extracts the pre-determined latency information and uses it to mask out warps from eligibility at the inter-warp scheduler stage.
We also developed a new design for the processor execution core, again with a focus on best performance per watt. Each processing unit was scrubbed to maximize clock gating efficiency and minimize wiring and retiming overheads.
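The compiler-side idea in the quoted passage can be sketched roughly as follows. This is only a toy model, not NVIDIA's actual encoding or scheduler: the `Instr` type, the `annotate` pass, and the 9-cycle latency are all made-up assumptions for illustration. The point is just that with fixed math latencies, the wait before each instruction can be computed once, up front, and stored alongside the instruction.

```python
from dataclasses import dataclass

# Assumed fixed math pipeline latency in cycles (illustrative value only).
MATH_LATENCY = 9


@dataclass
class Instr:
    dst: str          # destination register
    srcs: tuple       # source registers
    wait: int = 0     # cycles the warp is masked before this issues


def annotate(program, latency=MATH_LATENCY):
    """Toy compile-time pass: fill in the wait count for each instruction.

    Because the latency is deterministic, no runtime scoreboard is needed
    for math dependencies; the hardware only has to count down `wait`.
    """
    ready_at = {}   # register -> cycle at which its value becomes valid
    cycle = 0       # earliest cycle the next instruction could issue
    for ins in program:
        # Issue no earlier than when every source operand is ready.
        earliest = max([ready_at.get(r, 0) for r in ins.srcs] + [cycle])
        ins.wait = earliest - cycle
        ready_at[ins.dst] = earliest + latency
        cycle = earliest + 1   # in-order issue, one slot per cycle
    return program


program = annotate([
    Instr("r0", ("a", "b")),   # independent
    Instr("r1", ("r0",)),      # depends on r0, must wait
    Instr("r2", ("a",)),       # independent again
])
waits = [ins.wait for ins in program]
```

While a warp sits out its `wait` cycles, the inter-warp scheduler would simply pick another eligible warp, which is the "mask out warps from eligibility" step the white paper describes.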

CarstenS · Mar 22, 2012

Wow, did that one leak as well?

Arty · Mar 22, 2012

CarstenS said:
Wow, did that one leak as well?

Whitepaper: http://www.geforce.com/Active/en_US/en_US/pdf/GeForce-GTX-680-Whitepaper-FINAL.pdf

Picao84 · Mar 22, 2012

CarstenS said:
Wow, did that one leak as well?

It's already available at geforce.com.

Man from Atlantis · Mar 22, 2012

GK104 acts as if it had 1024 CCs in some tests, ~30% faster than GF110..

AnarchX · Mar 22, 2012

Because the superscalar design cannot be fully utilized with scalar/1D instructions. It's the same as on the non-GF110/GF100 Fermis.
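A back-of-the-envelope sketch of where a figure like 1024 could come from. It assumes the publicly listed GK104 layout (8 SMX with 192 cores each, neither of which is stated in this thread) plus the four dual-issue warp schedulers per SMX from the white paper quoted above. The function name and the simple "lanes fed per clock" model are illustrative assumptions, not a description of the real hardware.

```python
SMX_COUNT = 8             # assumed: GK104 as shipped in the GTX 680
CORES_PER_SMX = 192       # assumed: public spec, 8 * 192 = 1536 cores
SCHEDULERS_PER_SMX = 4    # from the quoted white paper
WARP_WIDTH = 32           # threads per warp


def lanes_fed(instructions_per_scheduler):
    """Cores that can be kept busy per clock across the whole chip."""
    per_smx = SCHEDULERS_PER_SMX * instructions_per_scheduler * WARP_WIDTH
    # A single SMX cannot use more lanes than it has cores.
    return SMX_COUNT * min(per_smx, CORES_PER_SMX)


no_ilp = lanes_fed(1)     # each scheduler finds only one issuable instruction
dual_issue = lanes_fed(2) # each scheduler finds two independent instructions
```

Under these assumptions, a dependent scalar instruction stream leaves each scheduler issuing one instruction per clock, so only 1024 of the 1536 cores get work; the full count needs a second independent instruction from the same warp.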

Man from Atlantis · Mar 22, 2012

More theoretical benchmarks; still waiting for H.fr's.

http://www.ixbt.com/video3/gk104-part1.shtml

Psycho · Mar 22, 2012

DSC said:
http://www.geforce.com/hardware/desktop-gpus/geforce-gtx-680
GTX 680 page is now up at geforce.com. Many other pages about the GTX 680 and the Kepler architecture are not up yet.

Strange: the big sites refrain from jumping the gun, and now NVIDIA does it itself, kind of voiding the NDA..

ECH · Mar 22, 2012

Psycho said:
Probably because tweaktown hasn't got a handpicked review sample, but just some ordinary retail card

It does raise the question of why TweakTown's benchmarks and power consumption numbers are different from the other reviews. If you read the comments, he does say that they didn't receive a card from them and he's not under NDA. TweakTown's review shows higher power consumption for the GTX 680. So will the retail card's performance and power consumption actually be different than what's reviewed?

Lightman · Mar 22, 2012

Looking at the power consumption graph from the pcinlife tests, it looks like the power band in which GK104 operates is narrower than Tahiti's, but the pure average is very similar. I would like to see more games tested in that mix, especially games where AMD is doing well, like Metro 2033.

Ideally, to get the best possible picture of how efficiently each GPU renders frames, I would like to see power graphs for games frame-limited to 30 or 60 FPS, with 0xAA to 8xAA and various levels of AF. That would normalize power use per frame, with driver overhead included. Hopefully some reviewers will take note and dive into this dark alley of extra test hours to make their review truly the best!
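The per-frame normalization could be computed along these lines. This is a minimal sketch: the function name is made up, and the wattage and frame-rate figures are placeholder inputs, not measurements from any review.

```python
def joules_per_frame(avg_power_w, fps):
    """Energy spent per rendered frame: watts are joules per second,
    so dividing by frames per second gives joules per frame."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    return avg_power_w / fps


# Placeholder numbers only: two hypothetical cards, both capped at 60 FPS.
card_a = joules_per_frame(180.0, 60.0)
card_b = joules_per_frame(195.0, 60.0)
more_efficient = "A" if card_a < card_b else "B"
```

With both cards locked to the same frame rate, the comparison reduces to average power draw, which is why the frame cap makes the test fair.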
