NVIDIA GF100 & Friends speculation

Tessellation units themselves, but what about a real world scenario?

Pure tessellation is useless; what makes it powerful are domain and hull shaders, which are not tied to setup rate. On top of that, there are all the other shaders which work with all the data thrown at them by the tessellation stage, so the higher the tessellation factor, the higher the pressure on the ALUs.

I really think something went wrong here.

I'm talking about a real game situation. I've been working with tessellation since the 9700, very familiar with its limitations.

Also, if you look at Crysis and a few other games, you will see the same thing: these games tend to get triangle-setup limited even without tessellation in many instances. Now figure that tessellation will multiply the setup load to 4x or 5x the original polygon count, and that's where the hurt factor is.
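To put rough numbers on that point, here's a back-of-envelope sketch (the clock, polygon count, and amplification factor below are all made-up illustrative values, not measurements):

```python
# Back-of-envelope: how a fixed 1 tri/clk setup rate caps framerate.
# All numbers are illustrative assumptions, not measured figures.

core_clock_hz = 850e6          # assumed core clock
setup_rate_tris_per_clk = 1    # the classic fixed-function limit

base_tris_per_frame = 2e6      # assumed scene polygon count
tess_amplification = 5         # the 4-5x triangle increase mentioned above

def setup_limited_fps(tris_per_frame):
    """FPS ceiling if triangle setup were the only bottleneck."""
    return (core_clock_hz * setup_rate_tris_per_clk) / tris_per_frame

print(setup_limited_fps(base_tris_per_frame))                       # 425 fps
print(setup_limited_fps(base_tris_per_frame * tess_amplification))  # 85 fps
```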
 
Tessellation units themselves, but what about a real world scenario?

Pure tessellation is useless; what makes it powerful are domain and hull shaders, which are not tied to setup rate. On top of that, there are all the other shaders which work with all the data thrown at them by the tessellation stage, so the higher the tessellation factor, the higher the pressure on the ALUs.

I really think something went wrong here.

The higher the tessellation factor, the higher the load on the setup units. When you look at the fact that setup has been stuck at 1 tri/clk for more than a decade, and the way ALUs have taken off, ALUs begin to look pretty much free, considering that games will soon be seeing an order-of-magnitude, step-function jump in the amount of geometry rendered.

This is an important breakthrough, one whose time has come. However, even making it 4 tri/clk is only a transition phase. I am expecting similar jumps in setup and tessellation in future GPUs, unlike the last decade, where they have been pretty much stuck.
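A crude way to visualize that divergence (the FLOPS figures are loose, order-of-magnitude assumptions on my part; only the contrast matters):

```python
# Illustrative only: ALU throughput vs. setup rate over roughly a decade.
# Both FLOPS figures are rough assumptions, not quoted specs.

old_flops, old_setup = 10e9, 1      # assumed ~2000-era GPU: ~10 GFLOPS, 1 tri/clk
new_flops, new_setup = 2700e9, 1    # assumed ~2010-era GPU: ~2.7 TFLOPS, 1 tri/clk

print(f"ALU growth:   {new_flops / old_flops:.0f}x")   # ~270x
print(f"Setup growth: {new_setup / old_setup:.0f}x")   # 1x - stuck
```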
 
You'd think they would have demoed Crysis then.


Only in certain scenes, and it might be a specific viewing angle as well; it doesn't happen all the time in Crysis. There are areas that are more CPU bottlenecked, some areas that are pixel-shader bottlenecked, etc. The game pushes the computer in different ways depending on what is being rendered.
 
It kinda reminds me of Conroe, when people were fed with info on its horsepower (in terms of GFLOPS) and how it stacked up against NetBurst, some benchmarks saying how wonderfully it did against NetBurst, and then AMD had a flop (performance-wise)...
You're trying too hard, really.

With the slight difference that benchmarks were all over the place back then.

I'm not trying anything, really. Especially since I know there's nothing one could do against your unshakeable faith ;). If Fermi is indeed what it's being pictured to be, then great, I might buy one! I'm just voicing my doubts, that's all.
 
So according to the HardwareCanucks benchmarks this GF100 is 24%-28% faster than an HD 5870 in Far Cry 2?

If we take that as the best-case scenario it doesn't look all that great; actually, it's last year all over again. Is there any reason to assume this isn't the best-case scenario, i.e. are nVidia known for demonstrating underpowered parts and choosing unflattering benchmarks?



Huh? I get 31% and 39%.
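For what it's worth, the gap between the two sets of numbers may just be a matter of which way the division was done. A quick sketch with stand-in fps values (these are not the actual HardwareCanucks figures, just numbers that reproduce the discrepancy):

```python
# How the relative advantage is computed; fps values are hypothetical
# stand-ins chosen to illustrate the two possible readings.
fps_gf100 = 84.0
fps_5870 = 64.0

speedup = (fps_gf100 / fps_5870 - 1) * 100    # GF100 relative to HD 5870
print(f"GF100 is {speedup:.0f}% faster")      # ~31% with these inputs

# The reading that understates it: dividing the other way round.
other = (1 - fps_5870 / fps_gf100) * 100
print(f"Misread as {other:.0f}%")             # ~24% with the same inputs
```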
 
I'm talking about a real game situation. I've been working with tessellation since the 9700, very familiar with its limitations.

If you are talking Radeon 9700, then that had no hardware tessellation. R200 had N-Patch support, then there was nothing in hardware for R300, R400 and R500. Tessellation reappeared in Xenos, which translated through to R600 and R700, and Evergreen has had considerable changes for DX11 support.

You'd think they would have demoed Crysis then.

Our modelling suggests that, at best, increasing the primitive rate buys about 2% on current apps. We spent the area elsewhere.
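An Amdahl's-law-style sanity check makes a figure like that plausible. A minimal sketch, assuming setup is only a small slice of frame time (the slice below is my own assumption, not Dave's number):

```python
# Amdahl-style sanity check of the "~2% at best" claim.
# setup_fraction is an ASSUMED share of frame time spent setup-bound.

def frame_speedup(setup_fraction, setup_speedup):
    """Whole-frame speedup when only the setup portion gets faster."""
    return 1 / ((1 - setup_fraction) + setup_fraction / setup_speedup)

# If ~4% of frame time is setup-bound, doubling setup rate buys ~2%,
# and even infinitely fast setup only buys ~4%:
print(frame_speedup(0.04, 2))             # ~1.02 -> about 2%
print(frame_speedup(0.04, float("inf")))  # ~1.04 -> the ceiling
```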
 
I think Anand misread that. It specifically says a texture unit can do 1 texture address and fetch 4 texture samples. But you need 4 texture samples for one bilinearly filtered texture fetch, hence that's really the same rate as they always had. The difference now, though, is that they can take 4 individual samples for Gather4 and return those, something older chips (from NVIDIA - AMD could do that for a long time already) couldn't do. It is also possible efficiency was boosted otherwise; IIRC all NVIDIA chips (G80 and up) quite failed to reach their peak texture rate. Still, 64 units doesn't look like a lot - though if you put it in terms of ALU:TEX ratio, it is quite comparable to what AMD has.


Wasn't G80 16/64?
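To spell out the counting argument from the quoted post (a minimal sketch of the arithmetic, not actual hardware behavior):

```python
# One bilinear fetch consumes 4 texel samples, so "4 samples/clock" per
# unit is the same bilinear rate as before. Gather4 instead returns the
# 4 raw texels of one channel, unfiltered.

def bilinear(texels, fx, fy):
    """One filtered result built from 4 texel samples (t00..t11)."""
    t00, t10, t01, t11 = texels
    top = t00 * (1 - fx) + t10 * fx
    bot = t01 * (1 - fx) + t11 * fx
    return top * (1 - fy) + bot * fy

def gather4(texels):
    """Same 4 samples, but returned individually, no filtering applied."""
    return tuple(texels)

quad = (0.0, 1.0, 0.0, 1.0)
print(bilinear(quad, 0.5, 0.5))  # one value: 0.5
print(gather4(quad))             # four values: (0.0, 1.0, 0.0, 1.0)
```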
 
If you are talking Radeon 9700, then that had no hardware tessellation. R200 had N-Patch support, then there was nothing in hardware for R300, R400 and R500. Tessellation reappeared in Xenos, which translated through to R600 and R700, and Evergreen has had considerable changes for DX11 support.


Yes, I am aware the R300 has a software pipeline for tessellation and not a hardware one, but the performance penalty for tessellation hasn't changed much over the generations, and specifically it is consistent with triangle setup rates depending on the generation of hardware. The Unigine DX11 demo actually gives a higher performance drop than what I have stated.
 
Huh? I get 31% and 39%.

And you are correct, I have no idea how I came by my numbers.

So it's a bit more impressive, yes, but still looking at the best case? I can't see a refresh getting close to it, but this particular card isn't going to beat the 5970 either.
 
Certainly not including (tessellated) Heaven or Grid2 here, I guess?
Heaven shows too much of a drop with tessellation active; I think something is broken in the process, and that's why NV claims a ~500% advantage on the "Heaven draw call" (which, in fact, could mean just about anything).

After all, Heaven is v1.0 and tessellation is almost a first attempt in it, aside from the fact that there are way too many triangles used for very basic objects like plain, rectangular doors.
 
Heaven shows too much of a drop with tessellation active; I think something is broken in the process, and that's why NV claims a ~500% advantage on the "Heaven draw call" (which, in fact, could mean just about anything).

After all, Heaven is v1.0 and tessellation is almost a first attempt in it, aside from the fact that there are way too many triangles used for very basic objects like plain, rectangular doors.


Heaven is not broken; you will see different amounts of drop in a game situation because the bottleneck shifts depending on the situation. A draw call is actually the drawing of the polygons.
 
Our modelling suggests that, at best, increasing the primitive rate buys about 2% on current apps. We spent the area elsewhere.

On what kind of data were your models based? I mean, it surely makes a difference if your basis is the raw performance of a Redwood instead of a Cypress (or even Hemlock).
 
Heaven shows too much of a drop with tessellation active; I think something is broken in the process, and that's why NV claims a ~500% advantage on the "Heaven draw call" (which, in fact, could mean just about anything).

After all, Heaven is v1.0 and tessellation is almost a first attempt in it, aside from the fact that there are way too many triangles used for very basic objects like plain, rectangular doors.

Funny how fast the momentum of tessellation is sinking.
 
Certainly not including (tessellated) Heaven or Grid2 here, I guess?
According to nVidia slides, GF100 is 59% faster than HD5870 in the tested scene.

According to several reviews, the HD5970 is 53-61% faster than the HD5870 in the Heaven benchmark. Even though the test and scene were chosen by nVidia, Fermi doesn't perform faster than the HD5970. Fermi is presented as a tessellation monster, but in the most tessellation-intensive benchmark it doesn't perform better than the HD5970.

I think this pretty much corresponds to Dave's words - nVidia spent a lot of research and silicon on a feature which has quite a low impact on real-world performance... Reminds me of R5xx and its dynamic branching...
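Putting the relative numbers from this post side by side (HD5870 normalized to 1.0; the figures are the ones quoted above, nothing new):

```python
# The comparison being made, in arithmetic form.
hd5870 = 1.00
gf100 = hd5870 * 1.59                                    # nVidia's claimed +59%
hd5970_low, hd5970_high = hd5870 * 1.53, hd5870 * 1.61   # review range

print(f"GF100: {gf100:.2f}, HD 5970: {hd5970_low:.2f}-{hd5970_high:.2f}")
# GF100 lands inside the HD 5970 range, i.e. no faster than the dual GPU.
```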
 
And you are correct, I have no idea how I came by my numbers.
So it's a bit more impressive, yes, but still looking at the best case? I can't see a refresh getting close to it, but this particular card isn't going to beat the 5970 either.
It could be because they are still optimizing drivers, and Far Cry 2 gave them optimum scalability early on.
 
According to nVidia slides, GF100 is 59% faster than HD5870 in the tested scene.

According to several reviews, the HD5970 is 53-61% faster than the HD5870 in the Heaven benchmark. Even though the test and scene were chosen by nVidia, Fermi doesn't perform faster than the HD5970. Fermi is presented as a tessellation monster, but in the most tessellation-intensive benchmark it doesn't perform better than the HD5970.

I think this pretty much corresponds to Dave's words - nVidia spent a lot of research and silicon on a feature which has quite a low impact on real-world performance... Reminds me of R5xx and its dynamic branching...


Of course the 5970 does well; it is now doing 2 triangle setups per clock.
 
Heaven is not broken; you will see different amounts of drop in a game situation because the bottleneck shifts depending on the situation.
Sorry, but there has to be something.

It's not ALU bound, not texturing bound, and there are not enough triangles to explain such a low framerate in heavily tessellated scenes (which are even heavier than the one selected by NV).

NV didn't show their settings either, IIRC, and AA shows a big loss too on this and other scenes.

Edit: by the way, after trying some combinations with Juniper and Cypress, I can say it's clearly not 100% setup bound, as a half-clock Cypress is worse than Juniper but not 50% slower, and it has nothing to do with tessellation, as that's visible even on non-tessellated scenes. It's "frequency bound", but I can't figure out where... my assumption is it's a shared memory crossbar latency/throughput issue.
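For clarity, the logic of that clock-scaling experiment as a sketch (the fps values are hypothetical placeholders, not my actual measurements; the decision logic is the point):

```python
# Bottleneck isolation by clock scaling: if a workload were purely
# frequency bound, fps would scale 1:1 with core clock.
# Numbers below are hypothetical placeholders.

full_clock_fps = 60.0    # assumed Cypress at full clock
half_clock_fps = 40.0    # assumed Cypress at half clock

scaling = half_clock_fps / full_clock_fps
if scaling <= 0.55:      # near the 0.5 a pure clock bound predicts
    print("Close to 100% frequency bound")
elif scaling >= 0.95:
    print("Not clock sensitive at all")
else:
    print(f"Partially clock bound: {scaling:.0%} of full-clock performance")
```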
 