NVIDIA Kepler speculation thread

http://forum.beyond3d.com/showthread.php?p=1661603

Tom's has the card slightly slower than even the 7870; TPU has it slightly slower than the 7950 with the old BIOS, both at 1920xXXXX and reference clocks.

Tom's and HT4U both, though HT4U tested over 17 games and the Ti still couldn't win.

I think TPU might have a couple of outliers, i.e. WoW is so massively better on Nvidia that it's an automatic 2% swing to them in the overall performance totals, even though all cards are well over 100 fps in that game, so it shouldn't matter.
 
About the Showdown thingie, Damage says the following in the comments section at TechReport:
I omitted the Showdown results from our overall index for several related reasons. First, because AMD told us themselves that they worked directly with CodeMasters to implement a new lighting path in this game engine. That lighting path happens to work very poorly on GPUs produced by AMD's competitor--so poorly, in fact, that the GeForce results for that game are *half* the speed of the Radeons in the 99th percentile frame times. That's true despite the fact that these Radeons and GeForces perform comparably in every other scenario tested. Also, the size of the performance gap in Showdown skews the overall results sufficiently that it offers a very different picture than we see in the other five games.

Thus, baking in the Showdown results to our overall index didn't seem fair.

There is precedent here, too. We omitted HAWX 2 results from the overall index when it skewed strongly toward Nvidia.

We did explain the logic behind our decision, and we published the Showdown data for the world to see. If you prefer a different solution, feel free to create your own index that includes that data.

You are free to disagree with us.
I expect that those who agreed the most with the decision to exclude HAWX 2 will also agree with this one. ;)

That said: good job by AMD to be able to convince a game maker to do this. It doesn't happen a lot.
 

Scott probably made the right decision, but there's one small difference: HAWX 2 ran slowly on Radeons because it used silly amounts of tessellation whose purpose was apparently to improve NVIDIA's relative standing more than image quality, at the expense of some frames per second even on NVIDIA GPUs.

I don't know what Showdown looks like or how good its quality-to-performance trade-off is, but if it's good, then it's different from HAWX 2.
 
Honestly, on TechPowerUp you have one ancient game, World of Warcraft, contributing about 2% or so in favour of Nvidia in every single card's overall score. That's over 17 games as well, and most of the cards being benchmarked are so far over 100 fps that it should be utterly irrelevant. Why doesn't TPU remove that from its overall totals?

I disagree with Scott here. In BF3, the 670 AMP is 18% faster. In Dirt Showdown the MSI 7950 OC is 22% faster.

Overall the 670 AMP is 5% faster. Without Dirt Showdown the 670 AMP is 9% faster. Without BF3 the 670 AMP is 1% faster.

Is it really fair to single out Dirt Showdown as an outlier when BF3 is also giving so much to the 670's overall performance? Where exactly do you draw the line?
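
As a rough illustration of how much one title can swing a summary number, here's a quick back-of-the-envelope sketch in Python. I'm assuming a geometric mean over per-game performance ratios, which may not be exactly how TPU weights its totals, and the filler figures are invented near-parity numbers:

```python
# Back-of-the-envelope look at how one title swings an overall index.
# Assumption: the index is a geometric mean of per-game performance ratios
# (670 AMP relative to MSI 7950 OC). BF3 and Showdown use the figures from
# the post above; the remaining entries are invented near-parity filler.
from math import prod

games = {
    "BF3": 1.18,                 # 670 AMP 18% faster
    "Dirt Showdown": 1 / 1.22,   # MSI 7950 OC 22% faster
    "filler A": 1.06,
    "filler B": 1.03,
    "filler C": 1.00,
}

def geomean(values):
    return prod(values) ** (1 / len(values))

overall = geomean(list(games.values()))
without_showdown = geomean([v for k, v in games.items() if k != "Dirt Showdown"])
without_bf3 = geomean([v for k, v in games.items() if k != "BF3"])

# Each exclusion moves the headline number by several percent in this toy set;
# with 17+ games the swing shrinks, but a big enough outlier still shows up.
print(f"overall: {overall:.3f}")
print(f"without Showdown: {without_showdown:.3f}")
print(f"without BF3: {without_bf3:.3f}")
```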
 
Scott probably made the right decision, but there's one small difference: HAWX 2 ran slowly on Radeons because it used silly amounts of tessellation whose purpose was apparently to improve NVIDIA's relative standing more than image quality, at the expense of some frames per second even on NVIDIA GPUs.

I don't know what Showdown looks like or how good its quality-to-performance trade-off is, but if it's good, then it's different from HAWX 2.

I'll quote the AMD blog:
Advanced Lighting
DiRT Showdown™ implements an all new rendering system, similar to that demonstrated in AMD’s “Leo” demo (also known as Forward+). In short this allows all lights in the scene to be truly dynamic lights, rather than just the age old hack of rendering 2D glows. This is achieved by building global lists of all lights in the scene, and then using DirectCompute to produce a culled light list for tiled regions of the screen. During the actual Pixel Shader lighting phase, only the culled light list for a given pixel is processed. This makes it possible to have thousands of dynamic lights in a scene and still achieve playable frame rates.


Global Illumination

In addition to the “Advanced Lighting” render path implemented in this title, Codemasters has gone one step further, and added support for Global Illumination. In essence this is an extension of what is made possible by the new way lights are handled by the engine. To achieve this effect the engine renders a Reflective Shadow Map (RSM) of the scene, taken from the point of view of the Sun. The RSM contains Position, Normal and Diffuse Color. The engine then uses DirectCompute to spawn a list of Virtual Point Lights (VPLs), for all of the positions contained in the RSM, and uses the Normal and Diffuse Color, to simulate light being reflected from all surfaces in the scene. This list of VPLs is then passed through the engine along with all the other dynamic lights. The result is a stunning improvement in visual quality.

Shader Model 5.0 Contact Hardening Shadows
This is a very high quality shadow filtering technique that computes the average distance of a shadow pixel from its casting blocker. Using this distance the effect accordingly decides how hard or soft the shadow pixel should be. The nearer a pixel is to its blocker, the harder the shadow. This technique much more closely mimics shadows from the real world. In DiRT Showdown™ this technique has been significantly improved in both performance and quality.

DirectCompute Accelerated High Definition Ambient Occlusion
DiRT Showdown™ implements a new and improved version of HDAO that uses full 3D camera space position data to detect valleys in the scene that should be shaded darker, and attenuates the lighting based on valley angle. Since this effect is expensive, it is computed at half screen resolution. In order for the half resolution HDAO buffer to be re-matched with the main color scene, a DirectCompute accelerated bilateral dilate and blur is performed to ensure that AO properly meets objects from the full screen resolution scene. In DiRT Showdown™ this technique has been significantly improved in both performance and quality.
For screenshots and image quality comparisons:
http://blogs.amd.com/play/2012/07/03/dirt-showdown-amd-benchmark-guide/
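
To make the Forward+ description above concrete, here's a minimal CPU-side sketch in Python. The tile size, the screen-space circle test, and every name are my assumptions; the real pass runs in DirectCompute against per-tile depth bounds, and the GI pass would simply append its VPLs to the same global light list:

```python
# Minimal sketch of tiled light culling (Forward+), as described in the quote:
# build a global light list, cull it per screen tile with a compute pass,
# then have the pixel-shading phase walk only its tile's short list.
TILE = 16  # pixels per tile side; a common choice, assumed here

def light_touches_tile(light, tx, ty):
    """Conservative 2D overlap test between a light's projected circle and a
    tile. light = (screen_x, screen_y, screen_radius); the real shader also
    tests the tile's min/max depth."""
    x, y, r = light
    cx = min(max(x, tx), tx + TILE)  # clamp centre to the tile rectangle
    cy = min(max(y, ty), ty + TILE)
    return (x - cx) ** 2 + (y - cy) ** 2 <= r * r

def build_culled_lists(lights, width, height):
    """The DirectCompute pass: one culled light list per tile. VPLs spawned
    by the GI pass would just be more entries in `lights`."""
    return {(tx, ty): [i for i, l in enumerate(lights)
                       if light_touches_tile(l, tx, ty)]
            for ty in range(0, height, TILE)
            for tx in range(0, width, TILE)}

def shade_pixel(px, py, culled):
    """Pixel-shader phase: loop only over this tile's culled list, which is
    what keeps thousands of dynamic lights playable."""
    tile = (px // TILE * TILE, py // TILE * TILE)
    return sum(1.0 for _ in culled[tile])  # stand-in for real light accumulation

lights = [(100, 100, 30), (500, 240, 80), (310, 60, 12)]
culled = build_culled_lists(lights, width=640, height=360)
print(shade_pixel(505, 250, culled))  # only the nearby light contributes here
```

And the contact hardening shadows quoted above boil down to a PCSS-style penumbra estimate from the average blocker distance; again a hedged sketch, with the light-size constant made up:

```python
def penumbra_width(receiver_depth, avg_blocker_depth, light_size=1.0):
    """Nearer blockers give harder shadows: the penumbra grows with the gap
    between the shaded point and the average blocker occluding it."""
    return light_size * (receiver_depth - avg_blocker_depth) / avg_blocker_depth

print(penumbra_width(10.0, 9.5))  # blocker close to receiver: hard shadow
print(penumbra_width(10.0, 2.0))  # distant blocker: wide, soft shadow
```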

I don't know exactly which "special" method causes the problem on Nvidia hardware; it could just be that they need to work on it. In reality, looking at the difference in fps, I have no problem with the reviewers' decision. They could also simply disable one of those settings on Nvidia cards and just say so.

Anyway, while searching for this I was looking at the Gamescom AMD session. I'm impressed by the list of future games that directly support HD3D and/or Eyefinity in their settings; it looks like the AMD team has started working a bit more closely with developers than we were used to in the past.
 
That said: good job by AMD to be able to convince a game maker to do this. It doesn't happen a lot.
Using DirectCompute for rendering is not exactly a stretch; DC has been implemented in many titles for one reason or another, and the Forward+ rendering mechanism hasn't been without interest from many quarters. Global illumination is also something that has long been looked forward to for realtime rendering, yet here's a great-looking (and great fun) game that's already delivering it in a playable manner. I'm not sure that there needed to be that much convincing.

Now, when the work started, who would have known that compute performance wasn't the top priority of Kepler?
 

It's funny: when I read Nvidia's presentation about Samaritan, it doesn't look like it's the global illumination in Showdown that should give them problems. So that leaves the compute-based HDAO or the forward rendering in the advanced lighting.
 
The difference isn't all that big at Tom's (not really an outlier there), and they mention some problems with an earlier version of the game: http://www.tomshardware.com/reviews/geforce-gtx-660-ti-benchmark-review,3279-6.html

That's because they don't use the Ultra preset, due to low fps; they use the "High" preset, which doesn't enable HDAO, contact hardening shadows, global illumination, advanced lighting, etc. (or at least not at their highest levels). Basically, without those settings enabled you're close to Dirt 3's graphics. So we're back to the starting point: when those performance-taxing settings are enabled, Nvidia's performance drops more.
Sadly I don't have the game, so I can't check memory usage etc. to see whether that has an impact too.
 
Dave Baumann said:
Using DirectCompute for rendering is not exactly a stretch; DC has been implemented in many titles for one reason or another, and the Forward+ rendering mechanism hasn't been without interest from many quarters. Global illumination is also something that has long been looked forward to for realtime rendering, yet here's a great-looking (and great fun) game that's already delivering it in a playable manner. I'm not sure that there needed to be that much convincing.

Now, when the work started, who would have known that compute performance wasn't the top priority of Kepler?
You're right that GI has been pushed by Nvidia, so you'd think they're not totally bad at it. Is UE4 equally lacking on a Kepler GPU compared to AMD? A 40+% difference is not the kind of difference we're used to seeing between similar cards, and you'd think that even with GI enabled there's still a lot of non-DC work going on.
 
It's probably a combination of an architectural weakness and bad drivers.

Obviously, with card releases and other stuff to do, some things go undone. Even with superior driver team resources, they can't do it all, and my guess is that fixing their Dirt Showdown woes is very far down their to-do list. If you pay enough attention, you'd probably conclude that AMD's driver team has similar choices to make and often appears to be even more stretched.

If anyone has been following Guild Wars 2, they'd know how badly Nvidia has been screwing up the drivers for that game recently as well, btw. They literally came out and admitted they had no good driver (then blamed ArenaNet).

[attached image: GW2ErrorMessage.png]


http://www.guildwars2guru.com/topic/48375-best-nvidia-geforce-driver-for-gw2/

2nd post is from an Nvidia employee (game launches next Saturday).
 
You're more relying on ANET to deliver well optimized and threaded graphics engine code than you are nVidia to deliver a driver tailored for GW2.

-----------
My issue with my GTX280 is that even on max graphics I'm only running at 50% gpu usage with 15-25fps. Will this driver help?
-----------

Generally speaking, that side of the issue is more bound to ANET and not to nVidia.

That's blaming ArenaNet?
 
About the memory bus in the 660 Ti: is the interpretation I read somewhere correct, that the 660 Ti actually has full bandwidth only to 1.5GB of its memory, and if it has to use the last 512MB too, there's only 1/3rd of the bandwidth available for it?
 
ComputerBase tests "reference clocked" cards with both 2GB and 3GB (I suppose the 3GB version is regular 192-bit all over), without any perceived difference: http://www.computerbase.de/artikel/grafikkarten/2012/test-nvidia-geforce-gtx-660-ti/7/

But of course we can't really know whether the two cards are boosting the same, etc. And for such a test we should also make sure it's not using more than 2GB, but generally that's not the case even at 2560. (Dunno if we need to use more than 1.5GB to start using the last 512MB.)
 

From a review I've just read (something on HardOCP), it's 1GB + the last 1GB: they say it has 1024MB on 128-bit and 1024MB on 64-bit.
 
That's most probably not true. It all comes down to how the memory space is interleaved between the three 64-bit memory controllers. It shouldn't be a problem to set it up so that the first 1.5GB are accessed in 64-byte chunks (or whatever the cache line size is) interleaved between the three memory controllers (usable bandwidth equal to a 192-bit connection). Only for the last 512MB would there be no interleaving (i.e., that would equal the bandwidth of a single 64-bit memory controller). Alternatively, the interleaving scheme could be set up so that the usable bandwidth is constant over the full 2GB (1,3,2,3,1,3,2,3 ...), but this would reduce the bandwidth to the equivalent of a 128-bit memory controller, so I would doubt it.
Most probably it is the first version: 1.5GB can be accessed at full speed and the last 512MB is slower (but still a lot faster than PCI-Express). The driver probably tries to allocate as much as possible in the first 1.5GB and uses the last 512MB only to avoid swapping something over PCI-Express. One could check this by comparing the 2GB and 3GB versions in some tests with very high memory utilization (crossing the 1.5GB limit).
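
A tiny sketch of that first scheme, just to make the arithmetic concrete; the 64-byte granularity follows the assumption above, and none of this reflects confirmed GK104 controller details:

```python
# Asymmetric interleaving sketch: cache lines in the first 1.5GB rotate
# across all three 64-bit controllers; lines in the last 512MB all land on
# one controller. Granularity (64 bytes) is assumed, as above.
LINE = 64
FAST_REGION = 1536 * 2**20  # first 1.5GB, striped over three controllers

def controller_for(addr):
    line = addr // LINE
    if addr < FAST_REGION:
        return line % 3  # lines alternate 0,1,2 -> 192-bit effective width
    return 2             # tail: every line on a single 64-bit controller

# Streaming the fast region touches all three controllers (full bandwidth);
# streaming the tail hits only one (roughly a third of it).
print([controller_for(a) for a in range(0, 6 * LINE, LINE)])  # [0, 1, 2, 0, 1, 2]
print(controller_for(FAST_REGION + 12345))                    # 2
```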
 