NVIDIA Kepler speculation thread

I don't understand nvidia's statement either... "UAV in non-pixel-shader stages" is a feature that would be mainly used in games :)

Unfortunately it seems that DX11.1 TIR (target independent rasterization) is pretty limited. Would be fun to use it for some resolution independent deferred rendering.
 
Yeah... And from that table Kepler sounds exactly like Fermi.
Has anyone with Win 8 and 6x0 actually tried what caps are returned?
 
nVidia's problem is likely the 64 UAVs.
They should be flexible with that as their memory addressing scheme should basically allow an unlimited number of buffers. That bindless texture stuff builds on that for instance. I don't know where the type of the resource would make much of a difference.
 
How many PC games now or in the near future are going to use (or even would have used) these four non-gaming 11_1 features? :
  • Target-Independent Rasterization (2D rendering only)
  • 16xMSAA Rasterization (2D rendering only)
  • Orthogonal Line Rendering Mode
  • UAV in non-pixel-shader stages

I bet the answer is NOT a Single One.

The problem is that even if it supports all the others, it can't use them via the DX API, if I understood it right, because you don't check compatibility feature by feature but feature level by feature level, which for Kepler is 11_0, not 11_1.
 
The problem is that even if it supports all the others, it can't use them via the DX API, if I understood it right, because you don't check compatibility feature by feature but feature level by feature level, which for Kepler is 11_0, not 11_1.
Unfortunately cap bits are kind of back in DX11.1. Though they're primarily for dealing with SoC GPUs.

http://msdn.microsoft.com/en-us/library/ff476497(v=vs.85).aspx
http://msdn.microsoft.com/en-us/library/ff476124(v=vs.85).aspx#D3D11_FEATURE_D3D11_OPTIONS
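
To make the "feature level vs. cap bits" distinction concrete, here is a rough, portable sketch. The types below are made-up stand-ins, not the real d3d11.h definitions; with the real API you would ask ID3D11Device::CheckFeatureSupport for D3D11_FEATURE_D3D11_OPTIONS and read the resulting struct. Which features count as level-gated here is my assumption from the discussion above.

```cpp
#include <cstdint>

// Stand-ins for illustration only; the real enum lives in d3dcommon.h.
enum class FeatureLevel : std::uint32_t {
    Level11_0 = 0xb000,  // same numeric value as D3D_FEATURE_LEVEL_11_0
    Level11_1 = 0xb100   // same numeric value as D3D_FEATURE_LEVEL_11_1
};

// Hypothetical subset of optional caps, loosely modelled on
// D3D11_FEATURE_DATA_D3D11_OPTIONS (the real struct has more fields).
struct OptionalCaps {
    bool outputMergerLogicOp;
    bool constantBufferOffsetting;
};

// Hypothetical summary of what a device reports.
struct DeviceReport {
    FeatureLevel level;
    OptionalCaps caps;
};

// Level-gated feature: there is no cap bit for it, so it is all or nothing.
bool canUseUavInAllStages(const DeviceReport& d) {
    return d.level >= FeatureLevel::Level11_1;
}

// Optional feature: queried on its own, independent of the feature level.
bool canUseLogicOps(const DeviceReport& d) {
    return d.caps.outputMergerLogicOp;
}
```

So a hypothetical 11_0 device that sets some option bits can expose those individual features, but anything that only comes with the 11_1 level stays out of reach.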
 
How many PC games now or in the near future are going to use (or even would have used) these four non-gaming 11_1 features? :

First: Not many yet, as it's only GCN that supports 11.1 now, meaning the target audience for such optimizations is somewhat limited.
Second, because of these missing features, nvidia can't provide feature level 11_1, meaning that you lose the rest too.

And yes, UAVs outside PS/CS is definitely a gaming oriented feature.

Doesn't look like D3D11_FEATURE_DATA_D3D11_OPTIONS really supports this Kepler-defined separation of the features: http://msdn.microsoft.com/en-us/library/hh404457(v=vs.85).aspx - it's more for lesser devices.
 
They should be flexible with that as their memory addressing scheme should basically allow an unlimited number of buffers. That bindless texture stuff builds on that for instance. I don't know where the type of the resource would make much of a difference.

It's hard to say [what their problem is] without knowing the silicon; maybe it's just political. I wasn't thinking of memory in the 64-UAV case, more of problems with the ISA (not being able to fit 6 bits in the bitfield), or, as UAVs have a direct access path, not having enough special ports from the shaders to the memory controller; something in that direction.
As I'm not so up to date on nVidia hw internals (from CUDA or OCL docs), it was just a wild guess. :)
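
For reference, the "6 bits" figure is just the index width needed to address 64 slots, versus 3 bits for the 8 UAV slots of feature level 11_0. A tiny sketch of that arithmetic (the helper name is made up, and this says nothing about how any real ISA encodes UAV indices):

```cpp
#include <cstdint>

// Hypothetical helper: number of bits needed to encode slot indices
// 0 .. slotCount-1 in an instruction bitfield.
unsigned bitsForSlots(std::uint32_t slotCount) {
    unsigned bits = 0;
    // Grow the field until 2^bits distinct values cover every slot.
    while ((std::uint64_t{1} << bits) < slotCount) {
        ++bits;
    }
    return bits;
}
```

bitsForSlots(8) gives 3 and bitsForSlots(64) gives 6, so going from 11_0 to 11_1 would double the field width if the slot index were encoded directly.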
 
It's hard to say [what their problem is] without knowing the silicon; maybe it's just political. I wasn't thinking of memory in the 64-UAV case, more of problems with the ISA (not being able to fit 6 bits in the bitfield), or, as UAVs have a direct access path, not having enough special ports from the shaders to the memory controller; something in that direction.
As I'm not so up to date on nVidia hw internals (from CUDA or OCL docs), it was just a wild guess. :)
I seriously doubt they would index UAVs this way given their support for bindless textures. Also, from what I have seen in their PTX code, they have a flat address space, so I would expect they would just pass in offsets (pointers) to the 64 UAVs.
 
Yeah... And from that table Kepler sounds exactly like Fermi.
This is in line with every GPU release from nvidia thus far.
First they design a GPU, then create a refresh which has the same programmability; the next generation will be the one that actually changes things. (Now it just seems that they also do a refresh of a refresh as well.)

I was surprised that Kepler changed things as much as it did, especially the K20.
Maxwell should be the one that brings home the bacon when it comes to new things. (SM6?)
 
GeForce GT 730M?
Maybe GK208 with DX feature level 11_1? But it could also be a GK107 solution.

AMD prepares HD 8000M series, too.

And the rebadging strikes early! You know what would be really funny is if the 730M is Fermi-based. Jokes aside, it will be interesting to see how well Nvidia's refreshed parts perform compared with their existing parts, since the GTX680, GTX660, and GTX650 do not have any spare cores or memory controllers like they did with the first-gen Fermi stuff. It'll likely be way more modest, but hopefully they can squeeze out another 10% performance within the same TDP.
 
I seriously doubt they would index UAVs this way given their support for bindless textures. Also, from what I have seen in their PTX code, they have a flat address space, so I would expect they would just pass in offsets (pointers) to the 64 UAVs.
I checked the OpenGL side a bit and it says 16 shader storage blocks per stage for a total of 96 shader storage blocks. That's for both Fermi and Kepler. Of course there's nothing to say if these are hard or soft limits.

One interesting question though... Here are the interfaces that we have: ID3D11DeviceContext and ID3D11DeviceContext1. How exactly does one set a UAV for the vertex shader?
 
I checked the OpenGL side a bit and it says 16 shader storage blocks per stage for a total of 96 shader storage blocks. That's for both Fermi and Kepler. Of course there's nothing to say if these are hard or soft limits.

One interesting question though... Here are the interfaces that we have: ID3D11DeviceContext and ID3D11DeviceContext1. How exactly does one set a UAV for the vertex shader?
Maybe this?
http://msdn.microsoft.com/en-us/library/windows/desktop/ff476524(v=vs.85).aspx
 
One interesting question though... Here are the interfaces that we have: ID3D11DeviceContext and ID3D11DeviceContext1. How exactly does one set a UAV for the vertex shader?
UAVs are set for the whole DX pipeline. UAVs bound to it (ID3D11DeviceContext::OMSetRenderTargetsAndUnorderedAccessViews) can be accessed from all stages (UAVs in CS are bound differently though, via CSSetUnorderedAccessViews). That means you can write to a UAV in the vertex shader and access the same buffer in a pixel shader if you want.
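
One practical catch with that call, as I read the docs: render targets and UAVs share the same output slots, so UAVStartSlot has to sit past the bound RTVs, and the total slot count is 8 on 11_0 versus 64 on 11_1. A portable sketch of that bookkeeping follows; uavBindingFits is a made-up helper and not part of D3D, it only checks the numbers you would pass as NumRTVs, UAVStartSlot and NumUAVs.

```cpp
#include <cstdint>

// Slot counts as documented for D3D11: 8 at feature level 11_0
// (D3D11_PS_CS_UAV_REGISTER_COUNT), 64 at 11_1 (D3D11_1_UAV_SLOT_COUNT).
constexpr std::uint32_t kUavSlots11_0 = 8;
constexpr std::uint32_t kUavSlots11_1 = 64;

// Hypothetical helper: would this RTV/UAV layout fit in the shared slots?
bool uavBindingFits(std::uint32_t numRTVs, std::uint32_t uavStartSlot,
                    std::uint32_t numUAVs, std::uint32_t slotCount) {
    // UAVs must be placed in slots after the render target views.
    if (uavStartSlot < numRTVs) {
        return false;
    }
    // The UAV range [uavStartSlot, uavStartSlot + numUAVs) must stay in bounds.
    return uavStartSlot <= slotCount && numUAVs <= slotCount - uavStartSlot;
}
```

With one render target bound, uavBindingFits(1, 1, 7, kUavSlots11_0) passes while eight UAVs would not; the same layout with kUavSlots11_1 leaves room for 63.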
 
And aren't the 640M and 620M Fermi-based? I thought only the 650+ was Kepler.
 