NVIDIA GF100 & Friends speculation

Yes, actually (all in DX11 mode, on my C2D E8500 @ 3.8 GHz, Win7 x64 and an HD 5870 with Catalyst 10.3 preview drivers):
Nice work, though shouldn't you have generated some ad income with a nice article to wrap this data?

Are they meant to be 2560x1600 results, not 1500? I'm assuming they are, though it doesn't really make much difference. Also, with AF off? Though I guess that won't make any real difference either.

Tessellation off, Shaders: High:
640x400 149.7
1280x800 104.9
2560x1600 40.4

[...]

Now we switch shadows to low and turn off the other options in Heaven (Refraction, Reflection and Volumetric).
Tessellation off, Shaders/Shadows: Low, Refr./Refl./Vol.: Off
640x400 150.7 (huge gains to be had here ;))
1280x800 127.3
2560x1600 59.8 (well, we're touching the magical 60 fps barrier)
Woah, the engine is suffering from a severe pre-pixel bottleneck. 2560x1600 has 4x the pixels of 1280x800, but 1280x800 is not 4x faster :oops: DX9-style draw call overheads?
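To put a number on that, here's a rough sketch using my own simple model (nothing from the Heaven engine itself): treat frame time as a fixed per-frame cost plus a per-pixel cost, and fit it through the 1280x800 and 2560x1600 results for tessellation off, Shaders: High (the 2560 line taken as 2560x1600, per the later correction).

```python
# Rough model (an assumption, not the engine's real cost structure):
#   frame_ms = fixed_ms + per_pixel_ms * pixels
# fitted through two measured points; a large fixed term means much of the
# frame cost does not scale with pixel count.

RESULTS_FPS = {  # tessellation off, Shaders: High
    (640, 400): 149.7,
    (1280, 800): 104.9,
    (2560, 1600): 40.4,
}

def frame_ms(fps: float) -> float:
    """Convert frames per second into milliseconds per frame."""
    return 1000.0 / fps

def fit_two_point(res_a, res_b, results):
    """Solve frame_ms = fixed + per_pixel * pixels through two resolutions."""
    px_a = res_a[0] * res_a[1]
    px_b = res_b[0] * res_b[1]
    t_a = frame_ms(results[res_a])
    t_b = frame_ms(results[res_b])
    per_pixel_ms = (t_b - t_a) / (px_b - px_a)
    fixed_ms = t_a - per_pixel_ms * px_a
    return fixed_ms, per_pixel_ms

fixed, per_px = fit_two_point((1280, 800), (2560, 1600), RESULTS_FPS)
print(f"pixel ratio: {2560 * 1600 / (1280 * 800):.1f}x")
print(f"fps ratio: {RESULTS_FPS[(1280, 800)] / RESULTS_FPS[(2560, 1600)]:.2f}x")
print(f"fixed ~{fixed:.2f} ms/frame, per pixel ~{per_px * 1e6:.2f} ns")

# Check the fit against the third measurement.
pred_fps = 1000.0 / (fixed + per_px * 640 * 400)
print(f"640x400: predicted {pred_fps:.1f} fps, measured {RESULTS_FPS[(640, 400)]}")
```

On these numbers the fit gives roughly 4.5 ms of fixed cost per frame and about 5 ns per pixel, and it overshoots the 640x400 result (about 175 fps predicted vs. 149.7 measured), so something else apparently limits the low-resolution case too.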

With such poor scaling, most of the rest of a bottleneck analysis is pointless. The only get-out clause would be that this terrible scaling is the ATI driver's fault, or that the GPU architecture has a monster bottleneck.

Scaling with GF100's going to be interesting, lots of nice bits and pieces that should reduce architectural bottlenecks...

Jawed
 
Nice work, though shouldn't you have generated some ad income with a nice article to wrap this data?

Are they meant to be 2560x1600 results, not 1500? I'm assuming they are, though it doesn't really make much difference. Also, with AF off? Though I guess that won't make any real difference either.
Thanks, but what I do in my free time is not necessarily going to be commercialized :) And yes, of course they should be 2560x1600. AF was off, i.e. set to 1x.

Woah, the engine is suffering from a severe pre-pixel bottleneck. 2560x1600 has 4x the pixels of 1280x800, but 1280x800 is not 4x faster :oops: DX9-style draw call overheads?
It certainly doesn't look like a very pixel-centric approach. If I remember right, Heaven didn't stress more than a single core much either, so it'd be pretty pointless to run it at the office on a 4 GHz Lynnfield, but I'll try anyway.

With such poor scaling, most of the rest of a bottleneck analysis is pointless. The only get-out clause would be that this terrible scaling is the ATI driver's fault, or that the GPU architecture has a monster bottleneck.
Yeah, that's why those numbers didn't leave my little red book until now. I was hoping to see some interesting results from some scaling benchmarks and such, but now you see what this "afternoon in Heaven" actually produced. :)

Nevertheless, ATI hasn't had any real reason yet to optimize for Heaven; basically any number they came up with would be against non-existent competition and would thus look good on its own. Maybe the release version of Cat 10.3 or 10.4 has some nice little surprises for us.

Scaling with GF100's going to be interesting, lots of nice bits and pieces that should reduce architectural bottlenecks...
Jawed
I'm looking forward to seeing it in action (the cooler sure looks nice), no matter whether I end up laughing, smiling or just crying. ;)
 
It certainly doesn't look like a very pixel-centric approach. If I remember right, Heaven didn't stress more than a single core much either, so it'd be pretty pointless to run it at the office on a 4 GHz Lynnfield, but I'll try anyway.

I reached 100% on core one and 60% on core two at around 140 fps in 640x480 windowed, at 22:00 in-game time (no shadows; it's night with only street lamps). The same scene at 12:00 (strong sun, shadows) gives 70% on core one and 30% on core two at 70 fps.

The strangest thing: when I look at the plain sky (just sky, no objects) at night I get around 280 fps, but when I switch to day I get around 140 fps looking at the same plain sky :!: (same 640x480, max details, no AA/AF, DX10, HD 4850)
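Turning those readings into CPU time per frame is a quick back-of-envelope calculation. This sketch assumes the per-core utilization and the fps were read over the same stretch of the run; on that assumption the day scene actually costs the CPU a bit more per frame, even though it shows lower utilization.

```python
# Back-of-envelope: CPU core-milliseconds spent per frame, from per-core
# utilization (as fractions) and the frame rate observed at the same time.
# Assumes utilization and fps were sampled over the same interval.

def core_ms_per_frame(core_utilizations, fps):
    """Sum the busy fraction of every core, then convert to core-ms per frame."""
    busy_cores = sum(core_utilizations)  # e.g. 1.0 + 0.6 = 1.6 cores busy
    return busy_cores * 1000.0 / fps

night = core_ms_per_frame([1.00, 0.60], fps=140)  # 22:00, street lamps only
day = core_ms_per_frame([0.70, 0.30], fps=70)     # 12:00, sun and shadows
print(f"night: {night:.1f} core-ms/frame, day: {day:.1f} core-ms/frame")
```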
 
I think I got something like 60-75% scaling going from an E8500 to an i7 860 on the highs, but the lows were more like 40-50% during heavy tessellation scenes.
 
That seemingly depends on whether you use tessellation or not.

Here we go:
Green is with tessellation enabled, blue without additional geometry. Both graphs are from my E8500, shown as one CPU, so 50% basically equals one core maxed out.

[Graph (heavencpu.png): E8500 CPU utilization over a Heaven run, tessellation on (green) vs. off (blue)]


Note that I've edited the graphs ever so slightly to overlay correctly at the beginning and at the end. There seemed to be a different runtime of about 1-2 seconds. Both graphs were taken with 640x480, no AA/AF, max details.

Edit:
Re-run on a Core i7-860: with low graphics load (640x480, no tessellation), CPU utilization never went above 25%, which equals two threads.
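Both figures are percentages of the whole CPU, so converting them into busy threads is just a multiplication by the number of logical CPUs the OS sees: 2 on the E8500, and 8 on the i7-860 assuming Hyper-Threading was enabled (which the "two threads" reading implies).

```python
# Convert whole-CPU utilization (as Task Manager reports it) into the
# equivalent number of fully busy hardware threads.

def busy_threads(total_util_percent, logical_cpus):
    return total_util_percent / 100.0 * logical_cpus

print(busy_threads(50, logical_cpus=2))  # E8500, 2 cores, no SMT -> 1.0
print(busy_threads(25, logical_cpus=8))  # i7-860, 4 cores / 8 threads -> 2.0
```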
 
That seemingly depends on whether you use tessellation or not.
I think you're just seeing the effect of tessellation increasing the GPU bottleneck and thus driving up frame times (and relatively driving down the amount of work the CPU is doing to generate frames for the GPU to process). If you modified the resolutions or similar so that the two runs were running at similar frame rates I imagine you would see the lines close together regardless of the tessellation setting.
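A toy model of that argument: if the CPU needs a fixed amount of time per frame and the GPU is the limit, CPU utilization simply tracks the frame rate. The per-frame CPU cost below is a made-up illustrative value, not a measurement; the frame rates are the earlier 640x400, Shaders: High results.

```python
# Toy model: CPU work per frame is constant, the GPU sets the frame rate,
# so CPU utilization is proportional to fps (capped at one full core).
CPU_MS_PER_FRAME = 3.0  # hypothetical CPU cost to prepare one frame

def cpu_utilization(fps, cpu_ms_per_frame=CPU_MS_PER_FRAME):
    """Fraction of one core kept busy when frames are produced at `fps`."""
    return min(1.0, cpu_ms_per_frame * fps / 1000.0)

for label, fps in [("tessellation off", 149.7), ("tessellation on", 68.2)]:
    print(f"{label}: {fps} fps -> {cpu_utilization(fps):.0%} of one core")
```

Same CPU work per frame in both runs, yet the tessellated run shows less than half the utilization purely because the GPU delivers fewer frames.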
 
If you modified the resolutions or similar so that the two runs were running at similar frame rates I imagine you would see the lines close together regardless of the tessellation setting.

If they ran at similar frame rates then you can conclude performance is similar too.

Not sure what you mean? Why would tessellation depend on resolution?
 
Tessellation off, Shaders: High:
640x400 149.7
1280x800 104.9
2560x1600 40.4

Tessellation on, Shaders: High:
640x400 68.2
1280x800 52.8
2560x1600 27.2 (about two thirds the perf w/o tessellation)

fps ratio, tessellation off/on and on/off:
640x400 -> 2.19/0.45 :oops:
1280x800 -> 1.98/0.50 :???:
2560x1600 -> 1.48/0.67
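A quick sketch to recompute those ratios and also express the tessellation cost as added milliseconds per frame. The ratios above look truncated rather than rounded (149.7/68.2 is 2.195), so three decimals are printed here.

```python
# Recompute tessellation off/on ratios and the extra frame time that
# tessellation adds at each resolution (Shaders: High results above).

RESULTS = {  # resolution: (fps tessellation off, fps tessellation on)
    "640x400": (149.7, 68.2),
    "1280x800": (104.9, 52.8),
    "2560x1600": (40.4, 27.2),
}

def compare(fps_off, fps_on):
    speedup = fps_off / fps_on                         # how much faster with tessellation off
    retained = fps_on / fps_off                        # fraction of performance kept with it on
    extra_ms = 1000.0 / fps_on - 1000.0 / fps_off      # frame time added by tessellation
    return speedup, retained, extra_ms

for res, (off, on) in RESULTS.items():
    s, r, ms = compare(off, on)
    print(f"{res}: off/on {s:.3f}, on/off {r:.3f}, +{ms:.1f} ms/frame")
```

In absolute terms the added frame time grows from about 8 ms at 640x400 to about 12 ms at 2560x1600, so on these numbers the tessellation cost is not a pure fixed per-frame cost either.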

I think it's a serious sub-pixel polygon size issue caused by tessellation. I don't know which hardware units are responsible for that, though.
Also, MSAA should become more and more equivalent to SSAA as polygon size approaches 1 pixel. Thus I suggest comparing MSAA performance against SSAA performance from high to low resolutions.
I wonder whether NVIDIA uses the trick of not antialiasing the polygon edges produced by tessellation. If one assumes triangles get close to 1 pixel in size, AA is no longer visible, so one could implement a selective AA threshold.
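Purely as an illustration of what such a selective threshold could look like (hypothetical; nothing here reflects how NVIDIA or AMD hardware actually behaves), one could compute each triangle's screen-space area with the shoelace formula and skip edge AA below an arbitrary cut-off.

```python
# Hypothetical selective-AA test: skip edge antialiasing for triangles whose
# screen-space area falls below a made-up threshold in pixels.

AA_AREA_THRESHOLD_PX = 1.0  # hypothetical cut-off, in pixels

def screen_area(v0, v1, v2):
    """Triangle area in pixels from 2D screen-space vertices (shoelace formula)."""
    (x0, y0), (x1, y1), (x2, y2) = v0, v1, v2
    return abs((x1 - x0) * (y2 - y0) - (x2 - x0) * (y1 - y0)) / 2.0

def wants_edge_aa(v0, v1, v2, threshold=AA_AREA_THRESHOLD_PX):
    """True if the triangle is large enough for edge AA to matter."""
    return screen_area(v0, v1, v2) >= threshold

print(wants_edge_aa((0, 0), (10, 0), (0, 10)))    # 50 px triangle -> True
print(wants_edge_aa((0, 0), (0.8, 0), (0, 0.8)))  # 0.32 px triangle -> False
```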
 
Will the engine/hardware actually tessellate surfaces to the same degree at 640 rez as at 2600 rez? :oops:

That would be kinda stupid, IMHO...
 
The fact that we haven't seen a single 480 doesn't instill much confidence in me. The GF100 launch smells too much like R600, the only difference being that NVIDIA might have a competitive top dog, though only on paper. Again, that's my opinion based purely on what has leaked so far. :p
 
Also, what is it with all these leaks so far? Is AMD going to spoil Fermi's launch by announcing a rebranded driver (Catalyst -> Catalytic)? :rolleyes:
 
Will the engine/hardware actually tessellate surfaces to the same degree at 640 rez as at 2600 rez? :oops:

That would be kinda stupid, IMHO...

http://msdn.microsoft.com/en-us/library/ee417841(VS.85).aspx#Tessellator_Stage

Direct3D specifies the maximum tessellation factor to be 64. When that limit is hit, a fixed surface will not be tessellated further. So if something is fully tessellated, increasing resolution will not change the degree of tessellation.

Also, the programmer is free to specify these factors, and may himself limit them for efficiency reasons.
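As a sketch of how a programmer might make tessellation resolution-dependent (a common screen-space adaptive scheme, assumed here; this thread doesn't say what Heaven actually does): pick each edge factor so a generated segment spans roughly a target number of pixels, then clamp to Direct3D 11's [1, 64] range. In a real renderer this would live in the hull shader's patch-constant function; the target length is a made-up value.

```python
# Screen-space adaptive tessellation factor, clamped to the D3D11 range.
D3D11_MAX_TESS_FACTOR = 64.0
TARGET_PX = 8.0  # hypothetical on-screen length per tessellated segment

def edge_tess_factor(edge_len_px, target_px=TARGET_PX):
    """Edge factor so each segment covers ~target_px pixels, clamped to [1, 64]."""
    factor = edge_len_px / target_px
    return max(1.0, min(D3D11_MAX_TESS_FACTOR, factor))

# An edge spanning 1/8 of the screen width, at three resolutions.
for width in (640, 1280, 2560):
    print(width, edge_tess_factor(width / 8))

# A very large on-screen edge hits the cap of 64.
print(edge_tess_factor(2000.0))
```

With a scheme like this, the same scene gets tessellated more finely at 2560 than at 640 until the cap is reached; with fixed factors (or everything already at 64), resolution changes nothing.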
 
Not sure what you mean? Why would tessellation depend on resolution?
No, the point is that tessellation on/off should not affect the amount of CPU work done. The graph on the previous page was misinterpreted based on just shifting the bottleneck further to the GPU.
 
Only 4 partners will launch Fermi (this might be for Europe only) and XFX won't be one.
Cards won't ship until one or two weeks after launch, though partner availability in May will be much larger.

NV hopes to get the speeds from the B1 chip that everyone is expecting from the launch chips. So we're a couple of months away from seeing true GF100 performance.
 
Only 4 partners will launch Fermi (this might be for Europe only) and XFX won't be one.
Cards won't ship until one or two weeks after launch, though partner availability in May will be much larger.

NV hopes to get the speeds from the B1 chip that everyone is expecting from the launch chips. So we're a couple of months away from seeing true GF100 performance.

You think NVIDIA will release the GTX 480 and then, a few months later, introduce an even faster card?
 
Only 4 partners will launch Fermi (this might be for Europe only) and XFX won't be one.
Cards won't ship until one or two weeks after launch, though partner availability in May will be much larger.

NV hopes to get the speeds from the B1 chip that everyone is expecting from the launch chips. So we're a couple of months away from seeing true GF100 performance.

That's if B1 even becomes production silicon. While it's not impossible, cases where NV went into production with A1/B1 silicon should be extremely rare.

You think NVIDIA will release the GTX 480 and then, a few months later, introduce an even faster card?

Just an example (not necessarily relevant): before they released G80, they claimed a hot clock of up to 1.5 GHz was possible. The A2 8800 GTX arrived at 1.35 GHz, and later, with the A3 metal spin, we saw the ultra-rare 8800 Ultra at 1.5 GHz.
 