NVIDIA Kepler speculation thread

Lower fidelity/IQ would be somewhat OK for the side monitors, except that people often can and do turn their heads briefly to focus on something their peripheral vision caught, rather than turning their entire in-game viewpoint to it. But I think in certain games it could be a net benefit (racing games, for instance).

Lower framerates would just be bad. Across the board bad. With absolutely zero redeeming features, IMO. Even if it's only your peripheral vision, having it update once for every two frames of your main view is going to be distracting.

Now if you apply it to any type of strategy game (Civ 5 for instance) you'll immediately notice the effect on your mouse, as it'll get less responsive the moment you move from the primary monitor to a side monitor.

And then how are they going to deal with 3x2 setups? Or 5x1 setups?

If this truly is something Nvidia are putting in, hopefully they're smart about it and make it an option that you opt in to, rather than the default.

In other words, the only good thing that could come of lower framerates on side monitors is to boost benchmark scores. It would be absolutely useless in actual gameplay.

Regards,
SB

That's pretty much exactly as I see it as well. Don't get me wrong - if it is optional and it works, then I think it's fair enough for Nvidia to be exploring these options. But yeah... turbo and "lower fps side screens" make me wonder a lot about how it's gonna make them look better in benchmarks. :p

Interesting point on the mouse; I guess they'd need the cursor to be decoupled from the framerate in most games for that to work OK.
 
You should be laughing at YOUR argument. No one takes overclocking as the basis of product competitiveness when it works both ways. That may hold true when one product is being pushed so hard to its limits that you can't extract any more juice out of it, but when both products are using the same technology, your argument is not even half valid.

The same applies to driver optimizations. When both companies are using new architectures, you can't expect one to pull ahead of the other through driver enhancements, since those are going to be applied both ways too.
 
[attached image: aamPAZey.png]

This is my "extrapolation" of the Fermi multiprocessor into a Kepler version. The scheduler is omitted.

Bashing is welcome! :LOL:
 
Lower framerates would just be bad. Across the board bad. With absolutely zero redeeming features, IMO. Even if it's only your peripheral vision, having it update once for every two frames of your main view is going to be distracting.
This will be like the jello effect you get when shooting video on DSLRs - only worse, because your peripheral vision is more sensitive to framerate than your central vision.
 
To get maximum utilization, the processing of the pixels in a quad has to get out of sync. I really doubt that is happening with dynamic scheduling right now.

When trying it statically, it basically has the disadvantages of a combined SIMD8-VLIW8 approach (you described it more as a dual-issue VLIW4, but it is not going to work this way) and would try to conceal this by processing a quad rather than a pixel in each (VLIW) vector lane. For some problems it might work okay, but generally it is not really compatible. Just imagine the hassle when a branch diverges within a quad. How do you compile for that?

And with dynamic scheduling (without VLIW) you would actually need an 8-way scheduler for it. You would need to split each 32-element warp into 4 quarter warps (each quarter warp having one pixel from each of the 8 quads in the warp) and schedule the quarter warps individually. And you don't get any performance improvement compared to my suggestion above, but you need far more complex scheduling hardware.
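
For illustration, a minimal sketch (my own notation, nothing from NVIDIA) of that quarter-warp split: a 32-element warp covering 8 pixel quads gets carved into 4 groups of 8 work items, each group holding one pixel from every quad.

```python
# Hypothetical quarter-warp split: a 32-wide warp covers 8 pixel quads
# (4 pixels each); quarter warp q takes pixel q of every quad, so each
# quarter warp is 8 work items wide and could be scheduled on its own.
WARP_SIZE = 32
QUAD_SIZE = 4
NUM_QUADS = WARP_SIZE // QUAD_SIZE  # 8

def quarter_warps(warp_threads):
    """Split 32 thread IDs into 4 quarter warps of 8 threads each."""
    assert len(warp_threads) == WARP_SIZE
    quads = [warp_threads[i * QUAD_SIZE:(i + 1) * QUAD_SIZE]
             for i in range(NUM_QUADS)]
    return [[quad[q] for quad in quads] for q in range(QUAD_SIZE)]

print(quarter_warps(list(range(32))))
# [[0, 4, 8, ..., 28], [1, 5, ..., 29], [2, 6, ..., 30], [3, 7, ..., 31]]
```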

I'm pretty sure I don't follow what you're saying. I was thinking of the scheduler dynamically constructing what is essentially a VLIW 8 instruction by considering up to 2 instructions from a single warp per cycle. A dual-issue case would look like:

cycle 1: ALUs 1-4 run instruction 1 on threads 1-4; ALUs 5-6 run instruction 2 on threads 1-2
cycle 2: ALUs 5-6 run instruction 2 on threads 3-4; ALUs 1-4 are open for an instruction from another warp.
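
(A toy sketch of those two cycles, purely my own illustration -- the six ALUs modelled as a 4-wide block plus a 2-wide block working on one quad:)

```python
# Toy model of the dual-issue case above: ALUs 1-4 form a 4-wide block,
# ALUs 5-6 a 2-wide block; two independent instructions from one warp
# are issued over one pixel quad (threads 1-4).
quad = [1, 2, 3, 4]

def issue(block_width, instruction, pending):
    """Issue `instruction` on up to `block_width` threads; return leftovers."""
    return (instruction, pending[:block_width]), pending[block_width:]

# cycle 1: ALUs 1-4 take instruction 1 for the whole quad,
#          ALUs 5-6 start instruction 2 on the first two threads
alu14, _ = issue(4, "instr1", quad)
alu56, leftover = issue(2, "instr2", quad)
print("cycle 1:", {"ALU1-4": alu14, "ALU5-6": alu56})

# cycle 2: ALUs 5-6 finish instruction 2 on the remaining threads,
#          ALUs 1-4 are free for an instruction from another warp
alu56, _ = issue(2, "instr2", leftover)
print("cycle 2:", {"ALU1-4": "free for another warp", "ALU5-6": alu56})
```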

I guess the assignment of ALU to pixel quad is a bit dynamic, but not very. Maybe that's what you mean by quarter warp scheduling... but isn't this already the case, given the limited SFU, LD/ST, and DP resources?

[Not to say that your explanation isn't convincing. I'm just throwing stuff out there and hoping to learn a little from it.]
 
FXAA cures more aliasing than MSAA in BF3; MSAA misses most edges altogether in BF3. FXAA blurs the image though, of course.

FXAA (obviously) doesn't cure the undersampling problem (flickering in the distance), which is where AA is most needed in BF3, IMHO.

Yes, Apple is at war against scripting languages, except the slowest of all: JavaScript. They don't want Flash because you can make "apps" with it (with access to the camera and microphone).

Well... Flash in the form of AIR is allowed for apps - but that's still through the App Store, of course.
 
NVIDIA cards can be overclocked too; a GTX 685 could also be made with higher clocks!

True of course. My premise is that Tahiti is severely underclocked. Even Dave confirmed that. We also know Tahiti scales incredibly well with higher clocks.

Once retail GTX 680 hits the shelves we will see how it does in this department.
 
This is my "extrapolation" of the Fermi multiprocessor into a Kepler version. The scheduler is omitted.

Bashing is welcome! :LOL:

LD/ST was half of that on Fermi though. Or do you mean accessing the same LD/ST count through different ports? If so, you could do it the same way you do for the SFUs... Other than that, I'm eager to see input from people in the know. :D
 
Honestly, I wasn't sure how/where to place the L/S units, as they are obviously tied to the SIMD blocks... somehow. The problem in Kepler's case is the odd/even mapping of the unit count and the unknown RF ports -- I just took the easy path of grouping the SIMDs in symmetrical pairs, each on its own port.
 
Even if it's only your peripheral vision, having it update once for every two frames of your main view is going to be distracting.

And then how are they going to deal with 3x2 setups? Or 5x1 setups?
Well, for the 5x1, the outermost monitors could update once per 4 frames….

*runs* :D (Of course, I'm not saying that's a good idea)
 
3x1 landscape I can see doing variable updates, kinda, but 3x1 portrait is basically one big screen and the side monitors are definitely not peripheral vision...
 
Didn't this part of the discussion originate in conjunction with something called adaptive VSync? How could someone try to boost benchmark scores that are obtained Vsynced?
 
Didn't this part of the discussion originate in conjunction with something called adaptive VSync? How could someone try to boost benchmark scores that are obtained Vsynced?

What would be the point in improving main-screen fps under VSync anyway? Unless it was below 60 fps of course.

What I wonder is how it could even be measured via fraps or whatever - is it taken as an average over the whole setup or just the middle screen? You can see how the latter would be a serious advantage to Nvidia in benchmarking (if they have control over the benchmark).
 
You should be laughing at YOUR argument. No one takes overclocking as the basis of product competitiveness when it works both ways. That may hold true when one product is being pushed so hard to its limits that you can't extract any more juice out of it, but when both products are using the same technology, your argument is not even half valid.

The same applies to driver optimizations. When both companies are using new architectures, you can't expect one to pull ahead of the other through driver enhancements, since those are going to be applied both ways too.

It was not a criticism of your words; I was just agreeing.

Now, if in specific benchmarks the 680 overclocks itself by 100MHz, I'm not sure the comparison is still valid... In games it is more blurry: we hit a grey area on what decides this overclock -- the average TDP (195W), the max TDP limit, or some average maximum admissible TDP for the benchmark?
 
I'm pretty sure I don't follow what you're saying. I was thinking of the scheduler dynamically constructing what is essentially a VLIW 8 instruction by considering up to 2 instructions from a single warp per cycle.
That is an 8-way scheduler (or a dual 4-way) ;). My last paragraph with the "quarter warps" was describing exactly this. It would basically result in scheduling with a granularity of just 8 work items.
 
What would be the point in improving main-screen fps under VSync anyway? Unless it was below 60 fps of course.

What I wonder is how it could even be measured via fraps or whatever - is it taken as an average over the whole setup or just the middle screen? You can see how the latter would be a serious advantage to Nvidia in benchmarking (if they have control over the benchmark).

I cannot say, because I don't know what the cited source meant by it anyway. I just wanted to make the point that, in conjunction with VSync, it's highly unlikely that this is some kind of multi-monitor benchmark fraud. :)
 
I cannot say, because I don't know what the cited source meant by it anyway. I just wanted to make the point that, in conjunction with VSync, it's highly unlikely that this is some kind of multi-monitor benchmark fraud. :)

Well, let's take a game at 40 fps over an SLS, then drop each peripheral screen to 25 fps while boosting the main screen to 60 fps. That's lower fps overall, but if Nvidia could convince enough game designers that the main screen was what counts...
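
As a quick back-of-the-envelope check on those (purely hypothetical) numbers, assuming three equally sized screens, the total rendering work actually drops a little while the headline centre-screen number jumps:

```python
# Hypothetical 3x1 setup, all screens the same resolution.
uniform = 3 * 40       # 120 screen-updates/s at a uniform 40 fps
split = 60 + 2 * 25    # 110 screen-updates/s with a boosted centre screen

print(uniform, split)  # 120 110 -> slightly less total work overall,
                       # but the centre screen alone goes from 40 to 60 fps
```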

They might even have a point, assuming it actually doesn't detract from the gaming experience. Like I said earlier, I had initially thought about lower IQ on the side screens - I can honestly say that in a lot of Eyefinity games I wouldn't even notice the difference between max and min settings on the side screens.
 