NVIDIA Kepler speculation thread

fillrate tests interesting..

overall roundup, difference shrinks on higher res (not sure it carries same disease from Fermi or just lack of bandwidth), though, it still manages to best..

total19xfi.png

That test looks like it was done against a factory OC'd 7970 as well, the stock vs stock comparison will be even more one sided.

I am going to reserve judgement until I see Anand's review.
 
(not sure it carries same disease from Fermi or just lack of bandwidth), though, it still manages to best..

More like bandwidth + the radeon frontend/driver low res disease.

That test looks like it was done against a factory OC'd 7970 as well, the stock vs stock comparison will be even more one sided.
I am going to reserve judgement until I see Anand's review.

Huh? Both the stock 925 MHz and the 1000 MHz results (interesting for clock-for-clock performance) are in most graphs, and otherwise it's just stock vs. stock. But with the old 7970 launch drivers.
 
Interesting how it stacks up against GF114. Almost twice as fast, awesome scaling.

Scaling is only a bit better than GF110, and it's only 25-33% faster than GF110 in the 2560x1600 and lower benchmarks published so far...
 
That test looks like it was done against a factory OC'd 7970 as well, the stock vs stock comparison will be even more one sided.

Average results GTX680@stock vs. HD7970@1000/5600:

1AA + 16AF
1680x1050: GTX 680 7.3% faster
1920x1200: GTX 680 5.1% faster
2560x1600: GTX 680 0.4% slower

4AA + 16AF
1680x1050: GTX 680 5.8% faster
1920x1200: GTX 680 3.1% faster
2560x1600: GTX 680 1.3% slower

8AA + 16AF
1680x1050: GTX 680 5.6% faster
1920x1200: GTX 680 3.1% faster
2560x1600: GTX 680 2.8% slower
 
Can't wait for the GTX680@maxOC vs. HD7970@maxOC face-off promised by HardOCP.

Solid Perf/W comparison between GTX680 and HD 7870 would be very interesting, too.
 
Some interesting bits on the warp scheduling in Kepler, from the white paper:

To feed the execution resources of SMX, each unit contains four warp schedulers, and each warp scheduler is capable of dispatching two instructions per warp every clock.
More importantly, the scheduling functions have been redesigned with a focus on power efficiency. For example: Both Kepler and Fermi schedulers contain similar hardware units to handle scheduling functions, including, (a) register scoreboarding for long latency operations (texture and load), (b) inter-warp scheduling decisions (e.g., pick the best warp to go next among eligible candidates), and (c) thread block level scheduling (e.g., the GigaThread engine); however, Fermi’s scheduler also contains a complex hardware stage to prevent data hazards in the math datapath itself. A multi-port register scoreboard keeps track of any registers that are not yet ready with valid data, and a dependency checker block analyzes register usage across a multitude of fully decoded warp instructions against the scoreboard, to determine which are eligible to issue.
For Kepler, we realized that since this information is deterministic (the math pipeline latencies are not variable), it is possible for the compiler to determine up front when instructions will be ready to issue, and provide this information in the instruction itself. This allowed us to replace several complex and power-expensive blocks with a simple hardware block that extracts the pre-determined latency information and uses it to mask out warps from eligibility at the inter-warp scheduler stage.
We also developed a new design for the processor execution core, again with a focus on best performance per watt. Each processing unit was scrubbed to maximize clock gating efficiency and minimize wiring and retiming overheads.
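
To make the idea in the quote concrete, here is a toy sketch of compiler-provided latency being used to mask warps out of eligibility. It is not NVIDIA's actual encoding or scheduler: the names `Instr`, `Warp`, `stall` and `run` are made up for illustration, the `stall` values are arbitrary, and the model issues one instruction per cycle instead of Kepler's four dual-dispatch schedulers. The point is only that no scoreboard lookup is needed for math instructions: the scheduler just compares a per-warp counter against the current cycle.

```python
from dataclasses import dataclass


@dataclass
class Instr:
    name: str
    stall: int  # hypothetical compiler-encoded field: cycles until this warp may issue again


@dataclass
class Warp:
    wid: int           # warp ids are assumed to be 0..n-1
    program: list      # list of Instr
    pc: int = 0
    ready_at: int = 0  # first cycle at which this warp is eligible again

    def done(self) -> bool:
        return self.pc >= len(self.program)


def run(warps, max_cycles=100):
    """Issue at most one instruction per cycle, round-robin among eligible warps."""
    trace = []
    last = -1  # id of the warp that issued most recently
    cycle = 0
    while cycle < max_cycles and not all(w.done() for w in warps):
        # Mask out warps whose pre-determined latency has not yet elapsed.
        eligible = [w for w in warps if not w.done() and w.ready_at <= cycle]
        if eligible:
            # Round-robin: prefer the eligible warp closest after the last issuer.
            w = min(eligible, key=lambda c: (c.wid - last - 1) % len(warps))
            instr = w.program[w.pc]
            w.pc += 1
            w.ready_at = cycle + instr.stall  # no dependency check, just a counter
            last = w.wid
            trace.append((cycle, w.wid, instr.name))
        else:
            trace.append((cycle, None, "idle"))
        cycle += 1
    return trace


if __name__ == "__main__":
    warps = [Warp(i, [Instr("fmul", 3), Instr("fadd", 1)]) for i in range(2)]
    for entry in run(warps):
        print(entry)
```

With only two warps and a 3-cycle stall there is one idle cycle in the trace; adding more warps fills that slot, which is the usual latency-hiding argument.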
 
Because the superscalar design could not be fully utilized with scalar/1D instructions. It's the same as on the non-GF110/GF100 Fermis.
 
Probably because tweaktown hasn't got a handpicked review sample, but just some ordinary retail card :D

It does raise the question of why tweaktown's benchmarks and power consumption numbers are different from the other reviews. If you read the comments, he does say that they didn't receive a card from them and he's not under NDA. Tweaktown's review shows higher power consumption for the GTX 680. So will the retail card's performance and power consumption actually be different than what's reviewed?
 
Looking at the power consumption graph from the pcinlife tests, it looks like the power band in which GK104 operates is narrower than Tahiti's, but the pure average is very similar. I would like to see more games tested in that mix, especially games where AMD is doing well, like Metro 2033.

Ideally, to get the best possible picture of how efficiently each GPU renders frames, I would like to see power graphs for games frame-limited to 30 or 60 FPS, with 0xAA to 8xAA and various levels of AF. That would normalize power use per frame, with driver overhead included. Hopefully some reviewers will take note and dive into this dark alley of extra test hours to make their review truly the best!
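
A rough sketch of the per-frame normalization suggested above, assuming a reviewer logs board power at a fixed sampling interval while the game runs under a frame cap. The function name `energy_per_frame` and all numbers in the usage example are placeholders, not measurements from any review.

```python
def energy_per_frame(power_samples_w, sample_interval_s, frames_rendered):
    """Return average energy per rendered frame in joules.

    power_samples_w   -- power readings in watts, taken at a fixed interval
    sample_interval_s -- seconds between consecutive readings
    frames_rendered   -- frames rendered over the logged period
    """
    if frames_rendered <= 0:
        raise ValueError("frames_rendered must be positive")
    if sample_interval_s <= 0:
        raise ValueError("sample_interval_s must be positive")
    # Energy is power integrated over time; with evenly spaced samples
    # this reduces to sum(samples) * interval.
    total_energy_j = sum(power_samples_w) * sample_interval_s
    return total_energy_j / frames_rendered


if __name__ == "__main__":
    # Placeholder log: one second at a 60 FPS cap, four samples per card.
    card_a = [150.0, 150.0, 150.0, 150.0]
    card_b = [180.0, 180.0, 180.0, 180.0]
    for name, samples in (("card A", card_a), ("card B", card_b)):
        print(name, energy_per_frame(samples, 0.25, 60), "J/frame")
```

Because both cards render the same number of frames under the cap, the joules-per-frame figure compares efficiency directly instead of mixing it with raw frame rate.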
 