NVIDIA Kepler speculation thread

I'm not sure whether this has been discussed yet in this thread, so forgive me if this question has been answered, but does anyone predict a dual-GPU Kepler-based 700-series card, i.e. a 790, or will Nvidia plow forward and release the Maxwell chips?
Before the 780 and 770 were released, I thought they would naturally release a dual-GK114/GK204 card as a 790. But due to the absence of a GK114/GK204 and the high TDP of the 770, I doubt there will be such a card. The best reason to have a 690 successor might be to have 4 GB VRAM per GPU (and I see no reason why that can't be done as a minor change to the existing 690). I'd even guess that a dual-GK110 at 375 W would be more likely than a dual-GK104 at 300 W. I'd say the most likely point for a dual-GPU in the next few years will be with 20 nm Maxwell.

And that's before considering Nvidia's own communication, e.g. that the 760 and 770 will be the last two cards of the 700 series.
 
Grr! That looks exactly like my secret tech :).
Secret? http://www.idav.ucdavis.edu/publications/print_pub?pub_id=919
And there's even stuff that predates that.

Virtual shadow mapping will definitely become a very popular technique in the future. I wonder how they are handling the fine-grained culling (or are they just brute-force rendering it, because the scene is so simple).
Not convinced that it's going to be the slam dunk that everyone seems to think it is. The problem with RMSM and similar software implementations of tiling/sparseness was never really the sampling part; it's the render part. Tiled resources (TR) don't really do anything to help that at all. You end up with largely intractable questions about where to put resolution, since any receiver can require an arbitrary amount of it. Thus you either ignore projective aliasing and end up not much better than cascades (if at all), or start trying to play some sort of complex optimization game that almost always ends up with unstable shadows.

That's fundamentally why SDSM ended up being as simple as it was. It isn't because we weren't well aware of the alternatives and playing with them concurrently, but rather that the optimization/rendering/"culling" portion of tiled shadow maps is fundamentally hard. This was a major focus of the Neoptica startup (later bought by Intel, which is where Aaron, the author of the RMSM paper, and others came from), and they were able to implement it fairly successfully in the early PS3 days. Unfortunately they ran into these pretty significant issues, which we tried to hint at with the whole discussion of the tradeoff between guaranteed quality and guaranteed performance/budget in the SDSM paper.
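(For reference, the "simple" end of that tradeoff amounts to something like the sketch below: reduce the min/max view-space depth actually visible this frame from the depth buffer, then place logarithmic splits inside that tight range. Names and structure here are just illustrative, not the paper's code.)

```cpp
// Illustrative sketch only: logarithmic cascade splits fitted to the depth
// range actually visible this frame (minZ/maxZ would come from a GPU
// reduction over the depth buffer in the real technique).
#include <cmath>
#include <vector>

std::vector<float> logSplits(float minZ, float maxZ, int numCascades) {
    // Assumes 0 < minZ < maxZ (view-space depths of visible samples).
    std::vector<float> splits(numCascades + 1);
    for (int i = 0; i <= numCascades; ++i) {
        float t = float(i) / float(numCascades);
        splits[i] = minZ * std::pow(maxZ / minZ, t);  // log spacing
    }
    return splits;
}
```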

I'm not completely pooh-poohing the idea - it works in some cases. Just don't get the impression that the hardware really solves any of the fundamental problems that affect the software approach. You basically have to ignore projective aliasing, at which point you end up with quality that is similar to SDSM, iso-performance.
 
Pure ASM might become practical (before UAV it wasn't really even doable). No optimization needed to pick local shadow map resolution, the quadtree is just an index structure to find the covered samples with their exact coordinates to be used for intersection testing. Only performance is unstable, not the shadows :)
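Roughly the idea, as a minimal made-up sketch (the exact coverage/depth test is passed in as a callback, and a software rasterizer or similar still has to feed it occluder triangles):

```cpp
// Sketch of "the quadtree is just an index": screen-space receiver samples
// are projected into light space and indexed; occluder triangles then query
// the tree and run an exact test at each covered sample's own coordinates.
// Hypothetical names, not from any particular implementation.
#include <array>
#include <functional>
#include <memory>
#include <vector>

struct LightSample {
    float x, y;       // light-space position of a screen-space receiver sample
    float depth;      // receiver depth along the light direction
    bool  shadowed = false;
};

// Quadtree node over the light-space domain; leaves index the sample array.
struct QuadNode {
    float minX, minY, maxX, maxY;
    std::array<std::unique_ptr<QuadNode>, 4> children;  // empty for leaves
    std::vector<int> sampleIndices;
};

struct Box { float minX, minY, maxX, maxY; };  // light-space triangle bounds

static bool overlaps(const QuadNode& n, const Box& b) {
    return !(b.maxX < n.minX || b.minX > n.maxX ||
             b.maxY < n.minY || b.minY > n.maxY);
}

// "Rasterize" one occluder triangle against the sample index: walk the tree,
// skip subtrees the triangle cannot touch, and run the exact point-in-triangle
// + depth comparison (passed in as `hits`) at each covered sample. Note that
// no shadow-map resolution is ever chosen anywhere.
void shadeTriangle(const QuadNode& node, const Box& triBounds,
                   std::vector<LightSample>& samples,
                   const std::function<bool(const LightSample&)>& hits) {
    if (!overlaps(node, triBounds)) return;
    if (node.children[0]) {                       // interior node: recurse
        for (const auto& child : node.children)
            shadeTriangle(*child, triBounds, samples, hits);
        return;
    }
    for (int idx : node.sampleIndices)            // leaf: test exact samples
        if (hits(samples[idx]))
            samples[idx].shadowed = true;
}
```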
 
Right, but ASM has some crippling problems in that you can't actually determine the required resolution by image processing on some frequency of sampling of the shadow image. This is discussed a bit in the RMSM paper, but it's serious enough to make it borderline unusable in practice.

And ASM still has the rendering problem. As soon as you break up your space into discontinuous tiles you're going to run into that issue with anything short of a quad-tree rasterizer or ray tracer. Of course you can try to mitigate it with caching (i.e. only update some of the pages per frame or something, same with RMSM), but that only goes so far. With shadows, as soon as you give up dynamic updates for any reasonable fraction of the shadow map, using shadow maps in general starts to be questionable.
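(The caching mitigation mentioned above is basically a per-frame page budget; a toy sketch with hypothetical structures:)

```cpp
// Toy sketch of "only update a budgeted number of shadow pages per frame".
// Everything not re-rendered this frame keeps last frame's possibly stale
// contents, which is exactly where the trade-off discussed above comes from.
#include <cstdint>
#include <deque>
#include <unordered_set>

struct ShadowPageCache {
    std::deque<uint32_t> dirtyQueue;      // pages whose contents are stale
    std::unordered_set<uint32_t> queued;  // avoid queueing a page twice

    void markDirty(uint32_t pageId) {
        if (queued.insert(pageId).second)
            dirtyQueue.push_back(pageId);
    }

    // Re-render at most `budget` pages this frame via the supplied callback.
    template <typename RenderPageFn>
    void update(int budget, RenderPageFn renderPage) {
        while (budget-- > 0 && !dirtyQueue.empty()) {
            uint32_t page = dirtyQueue.front();
            dirtyQueue.pop_front();
            queued.erase(page);
            renderPage(page);
        }
    }
};
```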
 
Sorry, I meant irregular shadow mapping (and yes, you will need a software rasterizer, not a ray tracer, though).
 
Ah yes, that makes more sense.

That said, irregular shadow maps were part of the research scope during the SDSM work as well. Ultimately the problem with them was unpredictable/unbounded performance constraints and no efficient filtering method. They sort of end up being "a faster single-ray/stencil shadow", but that's usually not good enough these days.

Anyways, fun times ahead I'm sure, just a little bit more skeptical that even hardware-implemented RMSM or similar is a silver bullet.
 
https://developer.nvidia.com/opengl-driver

These new ARB extensions are provided:

For GeForce 6xx and above capable hardware:

ARB_bindless_texture
ARB_seamless_cubemap_per_texture

For OpenGL 4 capable hardware:

ARB_compute_variable_group_size
ARB_indirect_parameters
ARB_shader_draw_parameters
ARB_shader_group_vote
ARB_sparse_texture
So apparently all nVidia DX11/OpenGL 4.x GPUs support sparse textures after all? That's interesting considering earlier discussion in this thread seemed to suggest that DX11.2 Tiled Resources Tier 1 might require bindless textures, hence Kepler support but no Fermi support, while Tier 2 requires sparse textures, hence why it seems to require GCN. Yet it seems that both Fermi and Kepler support sparse textures in OpenGL, which isn't reflected in DirectX. I suppose the OpenGL and DirectX approaches are different.
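(For comparison with the GL extension list above: on the DX11.2 side the tier is just something the runtime reports per device. A small sketch, assuming the Windows 8.1 / d3d11_2.h headers:)

```cpp
// Query which tiled-resources tier (if any) a D3D11.2 device reports.
#include <d3d11_2.h>

D3D11_TILED_RESOURCES_TIER QueryTiledTier(ID3D11Device* device) {
    D3D11_FEATURE_DATA_D3D11_OPTIONS1 opts1 = {};
    if (SUCCEEDED(device->CheckFeatureSupport(D3D11_FEATURE_D3D11_OPTIONS1,
                                              &opts1, sizeof(opts1)))) {
        // D3D11_TILED_RESOURCES_NOT_SUPPORTED, _TIER_1 or _TIER_2
        return opts1.TiledResourcesTier;
    }
    return D3D11_TILED_RESOURCES_NOT_SUPPORTED;
}
```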
 
It gets complex. Among other things, ARB_sparse_texture doesn't require support for sparse allocation for compressed textures, which is one of the big things in AMD's original implementation. The ARB notes that "in all likelihood, implementations will support some or all compressed texture formats", which means compressed textures aren't precluded, but you can't count on them being there. Given the vast use of compressed textures, that may limit the adoption of this iteration of sparse textures.
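(For anyone curious what the GL path looks like, here's a rough sketch against an uncompressed format, since compressed-format support isn't guaranteed; the page size used here is an assumption and should really be queried with glGetInternalformativ:)

```cpp
// Rough ARB_sparse_texture sketch: reserve a large virtual texture, then
// commit physical pages only where needed. Assumes a GL 4.x context with the
// extension available and a loader exposing the ARB entry points.
#include <GL/glew.h>  // or any loader exposing glTexPageCommitmentARB

void commitOnePage(const void* pixels /* pageW*pageH RGBA8 texels */)
{
    const GLsizei pageW = 256, pageH = 256;   // assumption; query
                                              // GL_VIRTUAL_PAGE_SIZE_X/Y_ARB
    GLuint tex;
    glGenTextures(1, &tex);
    glBindTexture(GL_TEXTURE_2D, tex);

    // Mark the texture sparse *before* allocating its (virtual) storage.
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_SPARSE_ARB, GL_TRUE);
    glTexParameteri(GL_TEXTURE_2D, GL_VIRTUAL_PAGE_SIZE_INDEX_ARB, 0);

    // Reserve 16K x 16K of address space; no physical memory is committed yet.
    glTexStorage2D(GL_TEXTURE_2D, 1, GL_RGBA8, 16384, 16384);

    // Commit physical pages for one region of mip 0, then upload into it.
    glTexPageCommitmentARB(GL_TEXTURE_2D, 0,
                           0, 0, 0,            // x/y/z offset
                           pageW, pageH, 1,    // region size (page multiple)
                           GL_TRUE);           // GL_TRUE commits, GL_FALSE releases
    glTexSubImage2D(GL_TEXTURE_2D, 0, 0, 0, pageW, pageH,
                    GL_RGBA, GL_UNSIGNED_BYTE, pixels);
}
```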
 
I think the main difference here is that sparse textures can be supported through the API (or in software, if you like).
Most likely Nvidia could support it on DX11.x through the API, the same way they support some other DX11.1 feature-level features. The result may just be a question of performance rather than a problem of compatibility.

Sparse textures are essentially AMD's PRT (Partially Resident Textures), but AMD has hardware support for it while Nvidia doesn't. If it is a requirement for the next DX11.2, Nvidia will likely include hardware support in its next GPUs; the way it is handled in OpenGL is just there (maybe with limited possibilities) to keep compatibility with older GPUs.

@Ryan, since you're here: excellent article, thanks for covering the Khronos announcement; looking forward to the rest of your SIGGRAPH coverage.
 
[Attached benchmark charts: Lux.png, Bit.png]

Twice as fast as GK107 on the same driver revision!? Better OpenCL drivers, or why does it suck so much less?
 
It just wasn't so apparent when Titan launched, e.g. http://www.anandtech.com/show/6774/nvidias-geforce-gtx-titan-part-2-titans-performance-unveiled/3
But yeah, I see that LuxMark (which conveniently didn't work on Titan when it launched) benefits a lot: http://ht4u.net/reviews/2013/55_directx11_grafikkarten_im_test/index26.php (Sala scene; the above is the simple LuxBall scene, not sure what AnandTech is using: http://images.anandtech.com/graphs/graph6994/55179.png ). I guess mostly from the extra registers.
 
GK110 (and I guess GK208 as well) can do 64 shifts per clock instead of 32 (per SMX), which surely should help _a lot_ with bitmining. IIRC it also has a rotate instruction now. Obviously it's not enough to catch GCN there (I think GCN can do full-rate shifts, that is 64 per CU), but at least it looks less appalling now. Not sure what the story is with LuxMark.
It actually looks like a decent improvement; dropping half the TMUs/ROPs hardly seems to make a difference (though I'm still not really convinced that it has half the TMUs: some sites, as well as GPU-Z, still claim this isn't the case, and none of the reviews I've seen included texture fillrate results).
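(For context on why shift/rotate rate matters so much there: Bitcoin hashing is double SHA-256, and the round and message-schedule functions are basically all 32-bit rotates and shifts. Standard definitions, plain C++ for illustration:)

```cpp
// Standard SHA-256 helper functions; on hardware without a native rotate,
// each rotr() costs two shifts plus an OR, so shift throughput dominates.
#include <cstdint>

static inline uint32_t rotr(uint32_t x, int n) {
    return (x >> n) | (x << (32 - n));
}

static inline uint32_t big_sigma0(uint32_t x) {   // Sigma0 (round function)
    return rotr(x, 2) ^ rotr(x, 13) ^ rotr(x, 22);
}
static inline uint32_t big_sigma1(uint32_t x) {   // Sigma1 (round function)
    return rotr(x, 6) ^ rotr(x, 11) ^ rotr(x, 25);
}
static inline uint32_t small_sigma0(uint32_t x) { // sigma0 (message schedule)
    return rotr(x, 7) ^ rotr(x, 18) ^ (x >> 3);
}
static inline uint32_t small_sigma1(uint32_t x) { // sigma1 (message schedule)
    return rotr(x, 17) ^ rotr(x, 19) ^ (x >> 10);
}
```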
 
I only see a real advantage in BioShock at 720p; otherwise they are purely equal, but more games should be thrown at it, and maybe a bit of AA (2x-4x), to get a better idea (this card comes with only a 64-bit bus).

What is interesting is the TDP there; that's really good for 49 W.

Yeah I mostly meant the perf/Watt and perf/area lead it seems to have.
 
I don't really see an advantage in perf/area either. While the 7730 indeed seems to use Cape Verde for reasons I don't understand, the specs are exactly the same as Mars, which isn't any larger (plus at that clock it is sort of underclocked).
Power consumption seems rather nice indeed, though I wonder how it would compare to Mars. As that isn't available as a desktop card, however, the GK208 GT 640 indeed seems like a new perf/W champion in that performance (price) bracket (well, unless you're looking at compute performance, of course).
 

Hainan is probably the closest competitor at this point.
 