Why Does ATI Get Beat Down on OGL

zeckensack said:
And lastly, some people coming off the NV_vertex_array_range path just don't get it. NVIDIA supports some rather peculiar usage models of VBO (allocate buffer object, lock it, fill it, render it once and throw it away again; might as well use immediate mode instead), probably because it is their VAR legacy model. ATI drivers don't support such stuff all that well and rather go for a more pure VBO model (fat storage objects are for reuse).
This is partly right. There was a significant difference in behaviour between the ATI driver and other vendors' drivers that did have an impact if the game made use of ARB_vbo for dynamic objects.

In the initial implementation it was predicted that creating and deleting VBOs was likely to be a 'resource create time' process, and in order to support dynamic vertex buffers efficiently a fair bit of effort went into making updates (that don't reallocate) fast and synchronous. In contrast, other vendors were preaching that synchronised updates were expensive, and that it was better to recreate the buffer instead.

That 'issue' was largely resolved over a year ago - although I'd still say that updates are better than incurring memory management overhead and the inevitable fragmentation that accompanies it.

It's not significantly less efficient to use a VBO for dynamic data than it is to use a standard vertex array. The major difference is that the application has to manage the memory instead of the driver doing it.
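For anyone who hasn't run into the two usage models being contrasted here, a rough sketch of both, using the ARB_vertex_buffer_object entry points (an extension loader such as GLEW is assumed; the function names and parameters are just placeholders for illustration):

Code:
#include <GL/glew.h>

/* Usage model A: the VAR-style pattern. Allocate a buffer object,
   fill it, draw from it once, then throw it away again. */
void submit_throwaway(const float *verts, GLsizeiptrARB bytes)
{
    GLuint vbo;
    glGenBuffersARB(1, &vbo);
    glBindBufferARB(GL_ARRAY_BUFFER_ARB, vbo);
    glBufferDataARB(GL_ARRAY_BUFFER_ARB, bytes, verts, GL_STREAM_DRAW_ARB);
    /* ... set pointers and draw here ... */
    glDeleteBuffersARB(1, &vbo);
}

/* Usage model B: a long-lived "fat storage object" that is updated
   in place, with no reallocation and no memory management churn. */
void submit_update(GLuint persistent_vbo, const float *verts, GLsizeiptrARB bytes)
{
    glBindBufferARB(GL_ARRAY_BUFFER_ARB, persistent_vbo);
    glBufferSubDataARB(GL_ARRAY_BUFFER_ARB, 0, bytes, verts);
    /* ... set pointers and draw here ... */
}

Model A leans on the driver to allocate and free storage every time; model B is the reuse-oriented pattern described above.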
 
Chalnoth said:
I don't think that's the case. Take Doom 3 as an example: the "default" renderer uses no NV extensions, and in fact uses the ARB_fragment_program extension, which was originally written by ATI.

I don't think ATI wrote the standard all by themselves. Although I think you are correct about the NV extensions.
 
rwolf said:
I don't think ATI wrote the standard all by themselves. Although I think you are correct about the NV extensions.
The original writeup of ARB_fragment_program was written by ATI, but it was certainly modified by the ARB before being accepted.
 
Chalnoth said:
All of these games are quite old, and thus much less interesting.

Have you even been reading this thread?

And they are losing out everywhere in benchmarks that were done, oh, this year with reasonably recent games. Every OpenGL benchmark that people have thrown at the X1x00 cards has put these cards 20%-40% behind where one would expect them based on their Direct3D performance.
Yes I have read this thread. So you're maintaining that NVIDIA beats ATI everywhere in OpenGL, and there are more examples for this than just Doom 3. There are the Bioware games, there are some flight sims and there's Riddick.

When I name other games that don't fit your argument you dismiss them as being "quite old" and narrow down the claim to "reasonably recent games".

Couldn't we just have a look at what features the slow games use, what features the fast games use, and perhaps come to a meaningful conclusion? Like "The GeForce FX family sucks at pixel shading" instead of "... sucks at Direct3D"?

Why, what features are so slow that they explain these results? It's not ARB_fragment_program. It's not texture memory management. It's not plain old glDrawRangeElements from app memory, which is still a great way to submit dynamic geometry.
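(By way of illustration, the client-side array path in question, where the driver copies vertex data out of application memory at draw time with no buffer objects involved; a minimal sketch with placeholder names:)

Code:
#include <GL/glew.h>

/* Plain client-side vertex arrays: glDrawRangeElements pulls vertex data
   straight from application memory at draw time, no buffer objects. */
void draw_dynamic(const float *verts, const unsigned short *indices,
                  GLuint vertex_count, GLsizei index_count)
{
    glEnableClientState(GL_VERTEX_ARRAY);
    glVertexPointer(3, GL_FLOAT, 0, verts);
    glDrawRangeElements(GL_TRIANGLES, 0, vertex_count - 1, index_count,
                        GL_UNSIGNED_SHORT, indices);
    glDisableClientState(GL_VERTEX_ARRAY);
}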

So what do you think? VBOs, render-to-texture, multi-context support ...?

Or do all "reasonably recent games" use the more advanced asynchronous features of NVIDIA drivers (NV_fence, PBOs for data uploads) and all the advantages come from here?
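(To make that last point concrete: "PBOs for data uploads" refers to staging pixel data through a pixel buffer object so the transfer to the chip can be scheduled asynchronously rather than blocking on a copy from application memory. A rough sketch, assuming ARB_pixel_buffer_object and a loader such as GLEW; names and formats are placeholders:)

Code:
#include <GL/glew.h>

/* Upload a texture level through a PBO: glTexSubImage2D sources its data
   from the bound unpack buffer (offset 0) instead of client memory, so the
   driver is free to perform the copy asynchronously. */
void upload_via_pbo(GLuint pbo, GLuint tex, const void *pixels,
                    GLsizei width, GLsizei height, GLsizeiptrARB bytes)
{
    glBindBufferARB(GL_PIXEL_UNPACK_BUFFER_ARB, pbo);
    glBufferDataARB(GL_PIXEL_UNPACK_BUFFER_ARB, bytes, pixels, GL_STREAM_DRAW_ARB);
    glBindTexture(GL_TEXTURE_2D, tex);
    glTexSubImage2D(GL_TEXTURE_2D, 0, 0, 0, width, height,
                    GL_RGBA, GL_UNSIGNED_BYTE, (const void *)0);
    glBindBufferARB(GL_PIXEL_UNPACK_BUFFER_ARB, 0);
}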

After posing all these questions I'd imagine you really don't care. That's fine. But there are some people who are not satisfied with the summary headline and rather like to know what's going on, not least so we can bother ATI specifically. It's much more effective to yell "fix this feature" than it is to yell "fix everything".
Chalnoth said:
Since Riddick doesn't use shadow buffers, it doesn't use PCF. And besides, nobody benchmarks with that rendering mode enabled anyway.
I only have the demo, and when I played it, it used stencil shadows, and in this case it's an instant victory for NVIDIA. I know there are more advanced rendering modes, and I took "soft shadows" to be some application of depth textures.
Okay, sorry if that was false. Anyway, NVIDIA hardware has helpful extra transistors for both flavors of shadow rendering.

Chalnoth said:
And if games are using it, why isn't ATI optimizing for this case?
It interferes with getting the maximum bang out of another VBO usage model. Not much, but small performance deltas tend to matter a lot in the driver realm. In a nutshell, you could do "proto-VBOs" closer to the CPU, i.e. VBOs that are cheap to access (and discard) but slower to render from, and you promote them to "real VBOs" once you detect a reuse. You lose out on vertex transfer speed to the chip on the first use if you do so.
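(Purely as an illustration of that kind of heuristic, and not a claim about what ATI's driver actually does internally, a hypothetical sketch; all names here are made up:)

Code:
#include <stddef.h>

/* Hypothetical "proto-VBO" promotion: a fresh buffer lives in cheap
   CPU-side storage first and is only promoted to real GPU storage once a
   second draw proves it gets reused. */
typedef struct {
    void  *cpu_copy;      /* cheap to fill and discard, slower to draw from */
    int    gpu_resident;  /* promoted to a "real VBO" yet? */
    int    draw_count;
} proto_vbo;

static void proto_vbo_draw(proto_vbo *buf, size_t bytes)
{
    buf->draw_count++;
    if (!buf->gpu_resident && buf->draw_count >= 2) {
        /* Reuse detected: pay the upload cost once, render fast afterwards.
           upload_to_vram() is a made-up placeholder, not a real call. */
        /* upload_to_vram(buf->cpu_copy, bytes); */
        buf->gpu_resident = 1;
    }
    /* The first draw streams straight out of cpu_copy, which is where the
       vertex transfer speed is lost on first use. */
    (void)bytes;
}

Under the throwaway pattern there is never a second draw, so such a buffer would never be promoted.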

VBOs can be used in different ways, and probably ATI didn't anticipate that OpenGL developers at large would pick that way. Then the shit hit the fan before they had time to react.

Dio said it works much better now.
Chalnoth said:
One would expect this to happen every once in a while. But until it happens in games, will anybody care?
Few.
 
Bruce said:
Sounds great. At least this shows that they are still working on the OGL driver performance ;)

I think it has more to do with tweaking the memory controller than the OGL driver itself. There hasn't been any mention of performance increases without AA enabled for example.
 
Actually, he hints at AF performance being improved as well, but more in speculative mode than "we've tested it" mode. Still, there are few better to lean on even in speculative mode re how their parts will react.

I got the sense he was actually a little embarrassed or annoyed. Kind of ubergeek in "Duh!" mode. That kind of thing adds that little extra incentive in making resources available to wring out what there is to wring out on a go-forward basis.
 
geo said:
I got the sense he was actually a little embarrassed or annoyed. Kind of ubergeek in "Duh!" mode. That kind of thing adds that little extra incentive in making resources available to wring out what there is to wring out on a go-forward basis.

Can't blame him, considering these "discoveries" would probably have had a significant impact on those week-old R520 launch reviews, especially the X1600XT which could certainly use the performance boost. At least they'll be able to show better numbers for partner boards launching next month.

Bodes even better for R580/RV560, however, which should benefit from lots of fine tuning over the fall. :)

Edit: Speaking of which, should we expect the 128-bit bus of RV530 to benefit (relatively) even more from these bandwidth-saving memory controller tweaks?
 
Well, the silver lining is that what they didn't know themselves couldn't leak... and there will be another round of XT benchies soon, prompted by the NV 512MB board.
 
Mordenkainen said:
So, in an effort to prove ATi's OGL drivers really are good you have to benchmark with a game that's suited towards ATi's hardware?
You're completely twisting my statement with a dumbass's interpretation of propositional logic.

Are there any D3D games that use stencil shadows and Carmack's reverse algorithm? No, so you can't compare it. In any scientific experiment, you have to control your variables. I guarantee you that if Doom3 was written in D3D, there would be a very similar performance deficit. In fact, I have done some stencil work in D3D and found a big speedup on NVidia's hardware. For that game, it has nothing to do with OGL. This new "fix" by ATI only brings the X1K's AA %age drop in-line with D3D games - it won't let ATI beat NVidia.
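(For readers who haven't seen it, a minimal sketch of the depth-fail, i.e. "Carmack's reverse", stencil pass being discussed, assuming EXT_stencil_wrap; draw_shadow_volumes() stands in for the application's volume geometry:)

Code:
#include <GL/glew.h>

/* Depth-fail shadow volume counting: bump the stencil wherever a volume
   face fails the depth test, so pixels with a non-zero count are in shadow. */
void stencil_shadow_pass(void (*draw_shadow_volumes)(void))
{
    glColorMask(GL_FALSE, GL_FALSE, GL_FALSE, GL_FALSE);
    glDepthMask(GL_FALSE);
    glEnable(GL_DEPTH_TEST);
    glDepthFunc(GL_LESS);
    glEnable(GL_STENCIL_TEST);
    glStencilFunc(GL_ALWAYS, 0, ~0u);
    glEnable(GL_CULL_FACE);

    glCullFace(GL_FRONT);                             /* back faces */
    glStencilOp(GL_KEEP, GL_INCR_WRAP_EXT, GL_KEEP);  /* increment on depth fail */
    draw_shadow_volumes();

    glCullFace(GL_BACK);                              /* front faces */
    glStencilOp(GL_KEEP, GL_DECR_WRAP_EXT, GL_KEEP);  /* decrement on depth fail */
    draw_shadow_volumes();

    glColorMask(GL_TRUE, GL_TRUE, GL_TRUE, GL_TRUE);
    glDepthMask(GL_TRUE);
    glDisable(GL_STENCIL_TEST);
}

This pass is essentially pure stencil and depth traffic, which is where the Hi-Z issue mentioned below bites.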

Chalnoth's example of the UT2003 OpenGL/D3D renderer is a controlled environment. You have the same workload going through different APIs. We don't have any data, but that's a good experiment.

What I'm saying is that if you race a black dog and white dog on a track, and then race them again on grass while breaking the white dog's leg, you can't say white dogs suck on grass. In Doom3 and Riddick, ATI is maimed by Hi-Z (that's hierarchical Z; the rest of HyperZ is okay) not working. The deficit has little to do with the API.

The question is why ATI slows down in some other OGL games.
 
I think most sites should spend the day with the AA fix for OGL and redo their benchmarks; suddenly all arguments for bad OGL on R520 are swept off the table.
 
Mintmaster said:
You're completely twisting my statement with a dumbass's interpretation of propositional logic.

Are there any D3D games that use stencil shadows and Carmack's reverse algorithm? No, so you can't compare it. In any scientific experiment, you have to control your variables. I guarantee you that if Doom3 was written in D3D, there would be a very similar performance deficit. In fact, I have done some stencil work in D3D and found a big speedup on NVidia's hardware. For that game, it has nothing to do with OGL. This new "fix" by ATI only brings the X1K's AA %age drop in-line with D3D games - it won't let ATI beat NVidia.

Chalnoth's example of the UT2003 OpenGL/D3D renderer is a controlled environment. You have the same workload going through different APIs. We don't have any data, but that's a good experiment.

What I'm saying is that if you race a black dog and white dog on a track, and then race them again on grass while breaking the white dog's leg, you can't say white dogs suck on grass. In Doom3 and Riddick, ATI is maimed by Hi-Z (that's hierarchical Z; the rest of HyperZ is okay) not working. The deficit has little to do with the API.

The question is why ATI slows down in some other OGL games.
What do you mean it won't let ATI beat NVidia?
They are faster in Doom 3 now and closer in Riddick, according to Hexus.
 
neliz said:
I think most sites should spend the day with the AA fix for OGL and redo their benchmarks; suddenly all arguments for bad OGL on R520 are swept off the table.

Well, to be fair, ATI needs to release that public beta of the 5.11 drivers with the fix built into it...then I would expect most sites to revisit some benchmarks.
 
zeckensack said:
People look too hard for generalizations, and while they make life less complex they are not always useful. ATI not being competitive in Doom 3 does not mean that they stink in OpenGL.
I've been saying the same thing for some time. People are confusing ATI's weakness with Carmack's algorithm with an OGL weakness, because the two games that use this algorithm are OGL.

However, it's results like IL-2 and Pacific Fighters that are weird and debunk my theory, because they don't have any technical disadvantage, and the deficit scales with higher resolutions, meaning that vertex buffer usage is not the issue. Maybe this AA fix is all they need. For other games, like Lock-On, we can definitely see CPU limitations, which your explanation describes well.

If the explanation is indeed something like memory organization or texture compression optimizations, then I think we can blame ATI, because both IHVs do such optimizations in D3D. It is a bit of laziness that ATI doesn't do the same for OGL. However, that is still an 'if' right now.

In any case, I agree that people love jumping to conclusions without examining causes. Sort of like Bush's "they hate us for our freedom" BS. Yeah, of course that's it.
 
The big thing to keep in mind is that just because they found a very, very nice optimization for their new memory controller, it does not mean that they do not have additional work to do on the rest of their OpenGL drivers. I very much hope that ATI continues to work on the rewrite of the OpenGL codebase and additionally applies this new memory management technique to create even better/faster drivers.

Nite_Hawk
 
radeonic2 said:
What do you mean it won't let ATI beat NVidia?
They are faster in Doom 3 now and closer in Riddick, according to Hexus.
Okay, I should rephrase that.

It won't let ATI beat NVidia by the same margin as they do in D3D. In many D3D titles, the X1800XT is about equal to the 7800GTX without AA/AF, then jumps notably ahead with them enabled. In Doom3, they start well behind without AA/AF, then overtake by a bit with AA/AF.

That relative deficit will still exist for architectural reasons.
 
Mintmaster said:
Okay, I should rephrase that.

It won't let ATI beat NVidia by the same margin as they do in D3D. In many D3D titles, the X1800XT is about equal to the 7800GTX without AA/AF, then jumps notably ahead with them enabled. In Doom3, they start well behind without AA/AF, then overtake by a bit with AA/AF.

That relative deficit will still exist for architectural reasons.
That's much better:)
I wonder what effect this tool will have on the supposed OGL rewrite?
 