NVIDIA Kepler speculation thread

sebbbi · Nov 22, 2012

I don't understand nvidia's statement either... "UAV in non-pixel-shader stages" is a feature that would be mainly used in games.

Unfortunately it seems that DX11.1 TIR (target independent rasterization) is pretty limited. Would be fun to use it for some resolution independent deferred rendering.

MDolenc · Nov 22, 2012

Yeah... And from that table Kepler sounds exactly like Fermi.
Has anyone with Win 8 and a 6x0 card actually checked which caps are returned?

Gipsel · Nov 22, 2012

Ethatron said:
nVidias problem is likely the 64 UAVs

They should be flexible with that as their memory addressing scheme should basically allow an unlimited number of buffers. That bindless texture stuff builds on that for instance. I don't know where the type of the resource would make much of a difference.

Kaotik · Nov 23, 2012

A1xLLcqAgt0qc2RyMz0y said:
How many PC games now or in the near future are going to use (or even would have used) these four non-gaming 11_1 features? :

Target-Independent Rasterization (2D rendering only)

16xMSAA Rasterization (2D rendering only)

Orthogonal Line Rendering Mode

UAV in non-pixel-shader stages

I bet the answer is NOT a Single One.

The problem is that even if it supports all the others, it can't use them via the DX API, if I understood it right, because you don't check compatibility feature by feature but feature level by feature level, which for Kepler is 11_0, not 11_1.

Ryan Smith · Nov 23, 2012

Kaotik said:
The problem is that even if it supports all the others, it can't use them via the DX API, if I understood it right, because you don't check compatibility feature by feature but feature level by feature level, which for Kepler is 11_0, not 11_1.

Unfortunately cap bits are kind of back in DX11.1. Though they're primarily for dealing with SoC GPUs.

http://msdn.microsoft.com/en-us/library/ff476497(v=vs.85).aspx
http://msdn.microsoft.com/en-us/library/ff476124(v=vs.85).aspx#D3D11_FEATURE_D3D11_OPTIONS
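
To make the distinction concrete, here is a minimal sketch of the two kinds of check being discussed: a feature that is gated purely by the reported feature level, and a feature that can also be exposed through an optional cap bit. The types below (`FeatureLevel`, `OptionalCaps`, `Device`) are hypothetical stand-ins, not the real D3D11 declarations, and which feature falls into which bucket is an assumption made for illustration only.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-ins for illustration; these are NOT the real D3D11 types.
enum class FeatureLevel : std::uint32_t { FL_11_0 = 0xb000, FL_11_1 = 0xb100 };

// Loosely modeled on the idea of an optional-capabilities struct (assumption).
struct OptionalCaps {
    bool logic_ops;  // optional below 11_1 in this sketch
};

struct Device {
    FeatureLevel level;
    OptionalCaps caps;
};

// Level-gated feature: only available if the whole feature level is reported.
bool can_use_uav_in_all_stages(const Device& d) {
    return d.level >= FeatureLevel::FL_11_1;
}

// Cap-bit feature: implied by the higher level, or individually reported.
bool can_use_logic_ops(const Device& d) {
    return d.level >= FeatureLevel::FL_11_1 || d.caps.logic_ops;
}
```

In this model, a device that reports 11_0 with the cap bit set gets the optional feature but still cannot expose the level-gated one, which is the situation described above.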

Psycho · Nov 23, 2012

A1xLLcqAgt0qc2RyMz0y said:
How many PC games now or in the near future are going to use (or even would have used) these four non-gaming 11_1 features? :

First: not many yet, as only GCN supports 11.1 now, meaning the target audience for such optimizations is somewhat limited.
Second: because of these missing features, nvidia can't provide feature level 11_1, meaning that you lose the rest too.

And yes, UAVs outside PS/CS is definitely a gaming oriented feature.

It doesn't look like D3D11_FEATURE_DATA_D3D11_OPTIONS really supports this Kepler-defined separation of the features: http://msdn.microsoft.com/en-us/library/hh404457(v=vs.85).aspx - it's more for lesser devices.

Ethatron · Nov 23, 2012

Gipsel said:
They should be flexible with that as their memory addressing scheme should basically allow an unlimited number of buffers. That bindless texture stuff builds on that for instance. I don't know where the type of the resource would make much of a difference.

It's hard to say [what their problem is] without knowing the silicon; maybe it's just political. I wasn't thinking of memory in the 64-UAV case, but rather of problems with the ISA (not being able to fit 6 bits in the bitfield), or, since UAVs have a direct access path, of not having enough special ports from the shaders to the memory controller; something in that direction.
As I'm not so up to date on nVidia hw internals (from CUDA or OCL docs), it was just a wild guess.
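
For reference, the "6 bit" figure is just the index width needed to address 64 slots. A small sketch of that arithmetic follows; `index_bits` is a hypothetical helper written for this illustration and says nothing about how any real ISA encodes resource indices.

```cpp
#include <cassert>

// Smallest number of bits that can index `slots` distinct UAV slots.
// Hypothetical helper for illustration only.
unsigned index_bits(unsigned slots) {
    unsigned bits = 0;
    // Grow the field until 2^bits covers every slot.
    while ((1u << bits) < slots) {
        ++bits;
    }
    return bits;
}
```

Assuming the 8 UAV slots of feature level 11_0, `index_bits(8)` gives 3, while the 64 slots of 11_1 give `index_bits(64)` = 6, so the speculation amounts to a slot index field that is 3 bits too narrow.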

OpenGL guy · Nov 23, 2012

Ethatron said:
It's hard to say [what their problem is] without knowing the silicon; maybe it's just political. I wasn't thinking of memory in the 64-UAV case, but rather of problems with the ISA (not being able to fit 6 bits in the bitfield), or, since UAVs have a direct access path, of not having enough special ports from the shaders to the memory controller; something in that direction.
As I'm not so up to date on nVidia hw internals (from CUDA or OCL docs), it was just a wild guess.

I seriously doubt they would index UAVs this way given their support for bindless textures. Also, from what I have seen in their PTX code, they have a flat address space, so I would expect them to just pass in offsets (pointers) to the 64 UAVs.

jlippo · Nov 24, 2012

MDolenc said:
Yeah... And from that table Kepler sounds exactly like Fermi.

This is in line with every GPU release from nvidia thus far.
First they design a GPU, then they create a refresh with the same programmability, and the next generation is the one that actually changes things. (Now it just seems that they also do a refresh of a refresh as well.)

I was surprised that Kepler changed things as much as it did, especially the K20.
Maxwell should be the one that brings home the bacon when it comes to new things. (SM6?)

AnarchX · Nov 25, 2012

GeForce GT 730M?
Maybe GK208 with DX feature level 11_1? But it could also be a GK107 solution.

AMD prepares HD 8000M series, too.

tviceman · Nov 25, 2012

AnarchX said:
GeForce GT 730M?
Maybe GK208 with DX feature level 11_1? But it could also be a GK107 solution.

AMD prepares HD 8000M series, too.

And the rebadging strikes early! You know what would be really funny? If the 730M were Fermi-based. Jokes aside, it will be interesting to see how well Nvidia's refreshed parts perform relative to their existing parts, since the GTX 680, GTX 660, and GTX 650 do not have any spare cores or memory controllers like the first-gen Fermi parts did. The gains will likely be far more modest, but hopefully they can squeeze out another 10% performance within the same TDP.

iMacmatician · Nov 25, 2012

A few more (assuming there are no typos)

GT 720M and GT 710M.

The notebook with the 710M shows up elsewhere with a GT 630M, so either the 630M = 710M or there's a typo.

MDolenc · Dec 8, 2012

OpenGL guy said:
I seriously doubt they would index UAVs this way given their support for bindless textures. Also, from what I have seen in their PTX code, they have a flat address space, so I would expect them to just pass in offsets (pointers) to the 64 UAVs.

I checked the OpenGL side a bit and it says 16 shader storage blocks per stage for a total of 96 shader storage blocks. That's for both Fermi and Kepler. Of course there's nothing to say whether these are hard or soft limits.

One interesting question though... Here are the interfaces that we have: ID3D11DeviceContext and ID3D11DeviceContext1. How exactly does one set a UAV for the vertex shader?

KimB · Dec 8, 2012

MDolenc said:
I checked the OpenGL side a bit and it says 16 shader storage blocks per stage for a total of 96 shader storage blocks. That's for both Fermi and Kepler. Of course there's nothing to say whether these are hard or soft limits.

One interesting question though... Here are the interfaces that we have: ID3D11DeviceContext and ID3D11DeviceContext1. How exactly does one set a UAV for the vertex shader?

Maybe this?
http://msdn.microsoft.com/en-us/library/windows/desktop/ff476524(v=vs.85).aspx

Gipsel · Dec 9, 2012

MDolenc said:
One interesting question though... Here are the interfaces that we have: ID3D11DeviceContext and ID3D11DeviceContext1. How exactly does one set a UAV for the vertex shader?

UAVs are set for the whole DX pipeline. UAVs bound to it (ID3D11DeviceContext::OMSetRenderTargetsAndUnorderedAccessViews) can be accessed from all stages (UAVs in CS are bound differently though). That means you can write to a UAV in the vertex shader and access the same buffer in a pixel shader if you want.
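
The binding model described here can be sketched as a toy simulation: one UAV slot table shared by all graphics stages, and a separate table for compute. Everything below (`Pipeline`, `bind_graphics_uav`, `vertex_stage`, `pixel_stage`) is hypothetical and is not D3D11 API; it only illustrates that a write made by one graphics stage is visible to another through the same slot.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical model for illustration; none of this is real D3D11 API.
using Buffer = std::vector<int>;

struct Pipeline {
    static constexpr std::size_t kSlots = 64;     // UAV slot count in D3D11.1
    std::array<Buffer*, kSlots> graphics_uavs{};  // one table shared by all graphics stages
    std::array<Buffer*, kSlots> compute_uavs{};   // compute has its own separate table

    void bind_graphics_uav(std::size_t slot, Buffer* b) { graphics_uavs.at(slot) = b; }
    void bind_compute_uav(std::size_t slot, Buffer* b) { compute_uavs.at(slot) = b; }
};

// "Vertex shader": writes a value through the shared graphics slot.
void vertex_stage(Pipeline& p, std::size_t slot, std::size_t index, int value) {
    p.graphics_uavs.at(slot)->at(index) = value;
}

// "Pixel shader": reads the same buffer through the same slot.
int pixel_stage(const Pipeline& p, std::size_t slot, std::size_t index) {
    return p.graphics_uavs.at(slot)->at(index);
}
```

Binding a buffer once with `bind_graphics_uav(1, &buf)` is enough for both `vertex_stage` and `pixel_stage` to see it in slot 1, while `compute_uavs` stays untouched until it is bound separately.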

MDolenc · Dec 9, 2012

Ah... I see, thanks!

iMacmatician · Jan 6, 2013

From PCEVA: "N card the first day of 2013, PCEVA global first exposure '000 desserts graphics GTX660SE test" (original).

Main points:

GTX 660 SE, a new GK106 derivative
768 CCs at 928/1006 MHz
2 GB GDDR5 at 5.6 Gbps on a 192-bit memory bus
The linked thread contains benchmarks
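
The memory figures above imply a peak bandwidth that can be worked out directly: per-pin data rate times bus width, divided by 8 bits per byte. A minimal sketch, where `peak_bandwidth_gbytes` is a hypothetical helper name and the result is a theoretical peak, not a measured number:

```cpp
#include <cassert>

// Theoretical peak memory bandwidth in GB/s.
// gbits_per_pin: effective data rate per pin in Gbps.
// bus_width_bits: memory bus width in bits.
// Hypothetical helper for illustration only.
double peak_bandwidth_gbytes(double gbits_per_pin, unsigned bus_width_bits) {
    return gbits_per_pin * bus_width_bits / 8.0;  // 8 bits per byte
}
```

With the reported 5.6 Gbps on a 192-bit bus, `peak_bandwidth_gbytes(5.6, 192)` works out to 5.6 × 192 / 8 = 134.4 GB/s.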

3DCenter, reporting on the previous report, mentions a launch later this month and gives an estimated performance index right in the middle of those of the 650 Ti and the 660, and a hair under the 7850's.

iMacmatician · Jan 9, 2013

From AnandTech: "NVIDIA’s Annual GPU Rebadge Begins: GeForce GT 730M and GeForce 710M Partial Specs Published."

GT 730M ≈ GT 640M?
710M ≈ GT 620M?

Clocks aren't revealed as of now, but there is something called the "GeForce Performance Score," the performance compared to the Intel HD 4000.

Malo · Jan 10, 2013

And aren't the 640M and 620M Fermi-based? I thought only the 650+ was Kepler.

iMacmatician · Jan 10, 2013

The 640M is Kepler, but the 640M LE has a GF108 variant and a GK107 variant (under that same name).
