NVIDIA Kepler speculation thread

I really like the techreport reviews, always go there first. I just wish they would look at more games. It's enough to get a trend, but if you go into gory perf detail like they do, 5 games is too small a sample to draw broad conclusions from.

There seem to be 2 different issues at play: one huge hitch in Arkham and some warmup slowdown in Crysis 2.

(If anyone from TR is reading this: please normalize the horizontal axis. Right now, it's impossible to correlate spikes between different GPUs.)
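
To make "normalize the horizontal axis" concrete: plotting frame times against elapsed time instead of frame number would line the spikes up, since a faster card packs more frames into the same test run. A minimal sketch of the conversion, assuming the raw per-frame times in milliseconds are available (names illustrative):

#include <vector>

// Turn per-frame times (ms) into an elapsed-time x-axis (seconds), so the
// same moment of the benchmark run lines up across GPUs regardless of how
// many frames each card managed to produce.
std::vector<double> elapsed_seconds(const std::vector<double>& frame_times_ms)
{
    std::vector<double> x;
    x.reserve(frame_times_ms.size());
    double t_ms = 0.0;
    for (double ft : frame_times_ms) {
        t_ms += ft;                  // accumulate wall-clock time
        x.push_back(t_ms / 1000.0);  // report in seconds
    }
    return x;
}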
 
rpg.314 said:
Can't be that. The code is compiled before it's DMAed over the PCIe bus. If the compiler was an issue (which I think it isn't), you'd see longer level load times, that's all.
Huh? Not saying it's the compiler, but if you have on-demand compilation of shaders as you move along in a level, this is the kind of stuff you could see.
That being said: a compiler issue doesn't explain why you see spikes on GTX5xx and not on GTX680 for a number of tests, so maybe memory management is a more likely cause?
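
To be clear about what I mean by on-demand compilation: something like a lazy material/shader cache, where the driver only gets to compile a program the first time an object using it shows up, i.e. potentially right in the middle of a frame. A hypothetical sketch (the cache and names are illustrative, not from any particular engine; assumes a GL loader such as GLEW for post-1.1 entry points):

#include <string>
#include <unordered_map>
#include <GL/glew.h>   // assumes a loader such as GLEW for modern GL entry points

// Hypothetical lazy shader cache: the driver compile happens on first use,
// i.e. potentially in the middle of a frame when a new material first
// becomes visible -- exactly the kind of thing that could show up as a hitch.
class ShaderCache {
public:
    GLuint get(const std::string& name, const char* vs_src, const char* fs_src) {
        auto it = programs_.find(name);
        if (it != programs_.end())
            return it->second;                        // already compiled, no hitch
        GLuint prog = compileAndLink(vs_src, fs_src); // compile hitch happens here
        programs_[name] = prog;
        return prog;
    }

private:
    static GLuint compileAndLink(const char* vs_src, const char* fs_src) {
        GLuint vs = glCreateShader(GL_VERTEX_SHADER);
        glShaderSource(vs, 1, &vs_src, nullptr);
        glCompileShader(vs);
        GLuint fs = glCreateShader(GL_FRAGMENT_SHADER);
        glShaderSource(fs, 1, &fs_src, nullptr);
        glCompileShader(fs);
        GLuint prog = glCreateProgram();
        glAttachShader(prog, vs);
        glAttachShader(prog, fs);
        glLinkProgram(prog);
        glDeleteShader(vs);
        glDeleteShader(fs);
        return prog;
    }

    std::unordered_map<std::string, GLuint> programs_;
};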
 
Huh? Not saying it's the compiler, but if you have on-demand compilation of shaders as you move along in a level, this is the kind of stuff you could see.
Nobody does that. Real time rendering is hard enough without trying to do real time compilation.

That being said: a compiler issue doesn't explain why you see spikes on GTX5xx and not on GTX680 for a number of tests, so maybe memory management is a more likely cause?

For that, I'd say 680 is better than 5xx.
 
I really like the techreport reviews, always go there first. I just wish they would look at more games. It's enough to get a trend, but if you go into gory perf detail like they do, 5 games is too small a sample to draw broad conclusions from.

There seem to be 2 different issues at play: one huge hitch in Arkham and some warmup slowdown in Crysis 2.

(If anyone from TR is reading this: please normalize the horizontal axis. Right now, it's impossible to correlate spikes between different GPUs.)

TR is doing a great thing with its 99%ile measurements, but they need to stop putting out just frame times and start publishing 99%ile frame rates for all their reviews, not just in the conclusion.
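
For reference, the two presentations carry the same information; converting the 99th-percentile frame time into a frame rate is a one-liner (e.g. 25 ms corresponds to 40 FPS). A minimal sketch:

#include <algorithm>
#include <vector>

// 99th-percentile frame time: 99% of frames were rendered in this time or less.
double percentile99_ms(std::vector<double> frame_times_ms)
{
    std::sort(frame_times_ms.begin(), frame_times_ms.end());
    size_t idx = static_cast<size_t>(0.99 * (frame_times_ms.size() - 1));
    return frame_times_ms[idx];
}

// The equivalent "99th-percentile frame rate": 1000 / 25 ms = 40 FPS.
double as_fps(double frame_time_ms)
{
    return 1000.0 / frame_time_ms;
}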
 
As far as FP64 is concerned, this is exactly what NVIDIA told me: "In GTX 680 the FP64 execution unit is separate from the CUDA cores (like LD/ST and SFU), with 8 FP64 units per SMX".

Make of that what you will

Can DP co-issue with CUDA SP cores?

I can certainly make quite the list for why this seems like a bad idea.
I wonder what the good reason was....
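
Taking the quoted figure at face value, the arithmetic implies a big gap: 8 FP64 units against 192 CUDA cores per SMX is a 1:24 ratio, so with 8 SMXes that's 64 DP units versus 1536 SP cores chip-wide. (That assumes the FP64 units run at the same clock and issue one FMA per clock, which is my assumption, not something NVIDIA stated.)

// Back-of-the-envelope DP:SP ratio for GTX 680, taking the quoted
// 8 FP64 units per SMX together with the published 192 cores/SMX and 8 SMXes.
// Assumes FP64 units run at the same clock and issue one FMA per clock.
constexpr int smx_count  = 8;
constexpr int sp_per_smx = 192;
constexpr int dp_per_smx = 8;                        // from the NVIDIA quote above
constexpr int sp_total   = smx_count * sp_per_smx;   // 1536
constexpr int dp_total   = smx_count * dp_per_smx;   // 64
constexpr int ratio      = sp_total / dp_total;      // 24 -> DP peak ~1/24 of SP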
 
If you are running code that is sensitive to DP performance on a Kepler, then you are doing something wrong.

DP is to Kepler as x87 is to Sandy Bridge.

DK
 
rpg.314 said:
Nobody does that. Real time rendering is hard enough without trying to do real time compilation.
That doesn't make sense: how does a compiler compile the shaders if it hasn't seen them?

It's up to the application to decide when to present a particular shader program to the GPU (at least for OpenGL it is; see here for example: http://www.opengl.org/discussion_boards/ubbthreads.php?ubb=showflat&Number=279913 ). There is no requirement to list all the shaders to be used at the start of context creation.

I'm sure it's more predictable to declare all shaders up front, but in large worlds with many different material shaders, that would be prohibitively costly by itself.
 
If you've got so many shaders that it makes level load times unacceptably long, why not just compile them all at game install time and cache the binaries somewhere?
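
On the OpenGL side there is in fact a mechanism for that: GL_ARB_get_program_binary lets you pull the driver-compiled binary out of a linked program and feed it back in on a later run. A rough sketch (error handling and the on-disk format omitted; binaries are only valid for the same GPU/driver combination, so a fallback to source compilation is still needed):

#include <vector>
#include <GL/glew.h>   // assumes a loader exposing GL 4.1 / ARB_get_program_binary

// Pull the driver-compiled binary out of a linked program so a later run
// (or the installer) can skip the GLSL compile entirely. For best results,
// set GL_PROGRAM_BINARY_RETRIEVABLE_HINT before linking.
std::vector<unsigned char> save_program_binary(GLuint program, GLenum* format_out)
{
    GLint length = 0;
    glGetProgramiv(program, GL_PROGRAM_BINARY_LENGTH, &length);
    std::vector<unsigned char> blob(length);
    glGetProgramBinary(program, length, nullptr, format_out, blob.data());
    return blob;   // cache blob + format on disk, keyed by GPU/driver version
}

// Feed a cached binary back in; returns false if the driver rejects it
// (different GPU or driver), in which case recompile from source.
bool load_program_binary(GLuint program, GLenum format,
                         const std::vector<unsigned char>& blob)
{
    glProgramBinary(program, format, blob.data(),
                    static_cast<GLsizei>(blob.size()));
    GLint ok = GL_FALSE;
    glGetProgramiv(program, GL_LINK_STATUS, &ok);
    return ok == GL_TRUE;
}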
 
That doesn't make sense: how does a compiler compile the shaders if it hasn't seen them?
When you load a level, compile all the shaders you need, load all the textures you need ...

It's up to the application to decide when to present a particular shader program to the GPU (at least for OpenGL it is; see here for example: http://www.opengl.org/discussion_boards/ubbthreads.php?ubb=showflat&Number=279913 ). There is no requirement to list all the shaders to be used at the start of context creation.
Read that thread through. That person was using the legacy fixed-function pipeline, and nobody uses that anymore. Changing stuff there will force regeneration/recompilation of shaders.


I'm sure it's more predictable to declare all shaders up front, but in large worlds with many different material shaders, that would be prohibitively costly by itself.
Less costly than shader compilation in frame. Not sure if DX offers the option of offline compilation.
 
rpg.314 said:
When you load a level, compile all the shaders you need, load all the textures you need ...
Well, yes, that's what I suggested myself. But it's not an API requirement and nobody prevents you from doing otherwise.

See also here, http://developer.amd.com/afds/assets/presentations/2902_2_final.pdf, slide 4: Could happen at any time before “dispatch/draw”

Gathering all shaders and textures of a full world may not be practical.

Less costly than shader compilation in frame. Not sure if DX offers the option of offline compilation.
There's the option to precompile the textual source code into the Microsoft intermediate format, of course, but I don't know (I doubt it) whether you can precompile the final GPU-specific assembly code.
 
Well, yes, that's what I suggested myself. But it's not an API requirement and nobody prevents you from doing otherwise.
It's not required, but it would be dumb not to. Games go to great lengths to avoid touching the driver in frame. Not doing this is just dumb.
See also here, http://developer.amd.com/afds/assets/presentations/2902_2_final.pdf, slide 4: Could happen at any time before “dispatch/draw”
That's for first shader compilation, not recompilation. Worst case, it affects first frame.
Gathering all shaders and textures of a full world may not be practical.
I don't see any other choice.
There's the option to precompile the textual source code into the Microsoft intermediate format, of course, but I don't know (I doubt it) whether you can precompile the final GPU-specific assembly code.
AFAIK, on DX the driver never sees the shader text, only the IR. Maybe a DX dev can tell us if there is a way to cache the final GPU-specific code.
 
rpg.314 said:
It's not required, but it would be dumb not to. Games go to great lengths to avoid touching the driver in frame. Not doing this is just dumb.
Ok. So we agree there, then.

That's for first shader compilation, not recompilation. Worst case, it affects first frame.
AFAIK, there is no restriction about when it can happen. Again, that doesn't make it wise to do it in the middle of things.

AFAIK, on DX the driver never sees the shader text, only the IR. Maybe a DX dev can tell us if there is a way to cache the final GPU-specific code.
I know for a fact that the OpenGL driver sees the full text, because that's precisely what happens for iPhone code.

DX has the following call to convert from source to bytecode:
http://msdn.microsoft.com/en-us/library/windows/desktop/dd607324(v=vs.85).aspx
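
That call (D3DCompile) turns HLSL source text into the hardware-independent bytecode; the GPU-specific translation still happens later in the driver when the shader object is created. Roughly (entry point and target are illustrative):

#include <d3dcompiler.h>
#pragma comment(lib, "d3dcompiler.lib")

// Compile HLSL source into D3D bytecode (the intermediate format discussed
// above). This can be done offline or at install time; the driver's own
// GPU-specific compile still happens when the shader object is created.
HRESULT compile_pixel_shader(const char* src, size_t src_len, ID3DBlob** bytecode)
{
    ID3DBlob* errors = nullptr;
    HRESULT hr = D3DCompile(src, src_len,
                            nullptr,            // optional source name
                            nullptr, nullptr,   // no macros, no #include handler
                            "main",             // entry point (illustrative)
                            "ps_5_0",           // target profile
                            0, 0,               // compile flags
                            bytecode, &errors);
    if (errors) errors->Release();
    return hr;
}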
 
[Attached charts: 17_bat3.png, 13_crys2.png]



Interesting results in overclocking the Radeon 7970 and GeForce GTX 680.
 
Anandtech's BF3 results with FXAA show some of the biggest margins of improvement over the 580, and also over the 7970. Could this be due to the 680's FP16 throughput advantage?

VGA second-hand market is flooded with 7970...
Really? Where are those numbers being pulled from?
I'm sure there are those itching to get the latest, but I'd almost wait for the big daddy to come out before making a modest hop to the side.
 
Really? Where are those numbers being pulled from?
I'm sure there are those itching to get the latest, but I'd almost wait for the big daddy to come out before making a modest hop to the side.
I'm quite sure he was joking; there would be no need for someone to buy a 7970 on the first day it was available, and then in two months sell it at a loss, only to buy a 680 on the first day it was available.

While the 680 is arguably faster than the 7970, it isn't that much faster. If you were an epic NV fanboy, you'd never purchase the AMD card. If you were an epic power-miser, you'd never buy something in the top tier. If you're an epic AMD fanboy, the 680 doesn't bring enough to the table (IMO) to swing your vote.

I like the 7970 because of AMD's proven implementation of PowerTune and how overclocking fits into it. I'm very hesitant about NV's very first incarnation of this Boost Clock business, especially when it comes to the oddities involved in overclocking. That is basically the only reason why I'd lean towards the 7970; pretty much everything else looks better on the 680 front.

I'm pretty sure Boost Clock is gonna get better, maybe even for this generation with just better drivers, but certainly in coming generations with more feedback from users and VARs. I'm just not sure I would want to be an early adopter of this particular tech, based on what little I've seen so far.
 