X1800/7800gt AA comparisons

Chalnoth said:
I didn't see it. I see the benchmarks now, and it is an improvement on ATI's side. But since the card still has a rather large memory bandwidth advantage over the 7800 GTX, it still isn't that impressive, and it's still behind the lead that the XT typically has in Direct3D games.

If you read Sireric's posts you'll also see that he promises future improvements; this 30% is just the first step in the right direction, and things will improve even further once app-specific MC profiles start to appear.
 
Oh, I read those posts, but I don't ever take such claims as being indicative of what we'll see when the drivers are actually released.
 
neliz said:
If you read Sireric's posts you'll also see that he promises future improvements; this 30% is just the first step in the right direction, and things will improve even further once app-specific MC profiles start to appear.

Well, it's quite interesting that this is only in the AA department and only OpenGL. So what we are seeing here is no improvement in base OpenGL performance. Have these optimisations been used in DX's AA algorithms in the past, or are they still being implemented?
 
I'm not sure it's the AA algorithm that's being optimized either; just the MC handling and the prioritizing of sample patterns?

Maybe they have more room to work with in OpenGL vs. DX?

Anyway, it'll be interesting to see what ATI AND developers could do with the MC.
 
sireric said:
The MC requires the clients to have lots of latency tolerance so that it can establish a huge number of outstanding requests and pick and choose the best ones to maximize memory bandwidth (massive simplification).
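As a concrete (and heavily simplified) illustration of that request-picking idea: a sketch in which the MC prefers outstanding requests that hit an already-open DRAM page and falls back to the oldest request so nothing starves. All names (`MemRequest`, `pickNext`) and the bank count are invented for illustration; this is a guess at the flavour of the scheme, not ATI's implementation.

```cpp
#include <cstdint>
#include <deque>
#include <optional>

// Hypothetical illustration only: one outstanding read/write request from a
// latency-tolerant client, tagged with the DRAM bank and row (page) it hits.
struct MemRequest {
    int      client;   // which client issued it (RBE, texture unit, ...)
    uint32_t bank;
    uint32_t row;
    uint64_t age;      // issue time, so nothing starves forever
};

class MemController {
public:
    void submit(const MemRequest& r) { pending_.push_back(r); }

    // Pick the "best" outstanding request: prefer one that hits the row
    // already open in its bank (no precharge/activate cost); otherwise fall
    // back to the oldest request so every client still makes progress.
    std::optional<MemRequest> pickNext() {
        if (pending_.empty()) return std::nullopt;
        auto best = pending_.end();
        for (auto it = pending_.begin(); it != pending_.end(); ++it) {
            const bool pageHit = (openRow_[it->bank % kBanks] == it->row);
            if (pageHit && (best == pending_.end() || it->age < best->age))
                best = it;
        }
        if (best == pending_.end()) {          // no page hit: take the oldest
            best = pending_.begin();
            for (auto it = pending_.begin(); it != pending_.end(); ++it)
                if (it->age < best->age) best = it;
        }
        MemRequest chosen = *best;
        openRow_[chosen.bank % kBanks] = chosen.row;  // that page is now open
        pending_.erase(best);
        return chosen;
    }

private:
    static constexpr int kBanks = 8;           // arbitrary for the sketch
    uint64_t openRow_[kBanks] = {};
    std::deque<MemRequest> pending_;
};
```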
One feature that hasn't got much attention so far is the Color Buffer Cache (buffer and cache?).

Presumably the render back-end has its own scheduler built-in, to take a list of incoming colour/z/stencil values and render them into the back-buffer. I'm guessing that this scheduler will gather the writes into "blocks" and then ask the MC to retrieve the corresponding areas of frame-buffer into CBC, so that the RBE only directly accesses the CBC - it never directly accesses VRAM.

Whilst the RBE is waiting for the MC to deliver the requested block into CBC, it should have received other blocks and be able to perform colour-writes/AA compares etc.

Similarly, presumably, the scheduler also has task types associated with z and stencil queries, again requiring "blocks" of back-buffer to be read into the CBC. The diagram for R520 implies that the z/stencil buffer cache sits outside the RBE, but it is nevertheless utilised by the RBE.

Finally, of course, the scheduler must deal with purging CBC back into VRAM to make way for other blocks.
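To make that guessed flow concrete: a sketch of a back-end task list that fetches blocks into the CBC, processes whatever is already resident while earlier fetches are in flight, and purges blocks back to VRAM when space is needed. Every name here (`CbcScheduler`, `BlockTask`) is invented; this is speculation in code form, not R520 internals.

```cpp
#include <cstdint>
#include <queue>
#include <unordered_map>

// Illustrative guess at the flow described above, not actual R520 behaviour.
enum class TaskKind { ColourWrite, ZTest, StencilTest };

struct BlockTask {
    uint32_t blockId;   // which screen-space block of the back-buffer
    TaskKind kind;
};

enum class BlockState { Missing, Fetching, Resident };

class CbcScheduler {
public:
    void submit(const BlockTask& t) { tasks_.push(t); }

    // One scheduler step: issue fetches for blocks not yet in the CBC, and
    // process any task whose block is already resident, so the RBE keeps
    // busy while the MC services earlier fetch requests (latency hiding).
    void step() {
        std::queue<BlockTask> stillWaiting;
        while (!tasks_.empty()) {
            BlockTask t = tasks_.front(); tasks_.pop();
            BlockState& st = cbc_[t.blockId];
            if (st == BlockState::Missing) {
                requestFromMc(t.blockId);      // ask MC to load this block
                st = BlockState::Fetching;
                stillWaiting.push(t);
            } else if (st == BlockState::Fetching) {
                stillWaiting.push(t);          // not delivered yet, try later
            } else {
                process(t);                    // block resident: do the work
            }
        }
        tasks_ = std::move(stillWaiting);
        evictIfFull();                         // purge dirty blocks to VRAM
    }

    void onMcDelivered(uint32_t blockId) { cbc_[blockId] = BlockState::Resident; }

private:
    void requestFromMc(uint32_t) { /* enqueue a read with the MC */ }
    void process(const BlockTask&) { /* colour write / z / stencil work */ }
    void evictIfFull() { /* write least-recently-used blocks back to VRAM */ }

    std::queue<BlockTask> tasks_;
    std::unordered_map<uint32_t, BlockState> cbc_;  // block -> residency state
};
```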

Is it right to assume that each of R520's "pixel units" integrates the texture and shader engines with the RBE, so that there are four separate RBEs in the X1800 XT, each with a localised CBC?

It seems to me that if the GPU splits the back-buffer into "screen tiles", e.g. of 16x16 pixels, then each RBE has guaranteed ownership of "blocks" in the back-buffer - so avoiding any risk of contention by multiple RBEs over individual pixels in the back-buffer.

The only remaining problem is to ensure that colour-write operations that are dependent on write order are processed in write order - so the scheduler needs to be able to differentiate between un-ordered writes and in-order writes, when it schedules RBE tasks.
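In the same speculative spirit, the tile-ownership rule and the ordering distinction could be as simple as the sketch below. The 16x16 tile size, the 2x2 round-robin mapping, and the `inOrder` flag are all assumptions made purely for illustration.

```cpp
#include <cstdint>

// Assumed, not documented: 16x16 screen tiles and four RBEs, assigned in a
// 2x2 pattern so each RBE has exclusive ownership of the pixels in its tiles.
constexpr int kTileSize = 16;
constexpr int kRbeCount = 4;
static_assert(kRbeCount == 4, "the mapping below assumes a 2x2 pattern");

int owningRbe(int x, int y) {
    int tx = x / kTileSize;
    int ty = y / kTileSize;
    return ((ty & 1) << 1) | (tx & 1);   // 0..3, no two RBEs share a pixel
}

// A write task carries an ordering flag so the scheduler can tell which
// colour writes must retire in submission order (e.g. blended transparency)
// and which may complete whenever their CBC block happens to be resident.
struct RbeWrite {
    int      x, y;
    uint32_t colour;
    bool     inOrder;   // true: retire in submission order
};
```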

Jawed
 
Razor1 said:
Well, it's quite interesting that this is only in the AA department and only OpenGL. So what we are seeing here is no improvement in base OpenGL performance. Have these optimisations been used in DX's AA algorithms in the past, or are they still being implemented?
These optimisations are for OpenGL and AA, probably because that is where they had memory-controller-addressable inefficiencies.
 
Hubert said:
These optimisations are for OpenGL and AA, probably because that is where they had memory-controller-addressable inefficiencies.

Well, the performance boost varies depending on the application, and seems to stick out with D3 for the most part. Why didn't the shader replacement get them closer to the G70 in this case?
 
BRiT said:
Once again, 3dfx had that first.


Not only that, of course, but nVidia's official public position on FSAA when 3dfx launched it was that "gamers don't want FSAA; gamers want high-resolution gaming," as if the two were mutually exclusive...;) nVidia's first copy-cat FSAA attempts were so poor compared to 3dfx's efforts that it was true--FSAA did suck royally--but only on nVidia hardware...;)
 
I think Razor1 means the Humus tweak to replace a texture lookup with math ops to improve performance.

And the reason that alone isn't good enough to take that class of hardware higher than the equivalent NVIDIA board should be glaringly obvious and has been covered here dozens of times in the past, to death.
 
Chalnoth said:
Except these two things do not follow. Firstly, I really don't see how you can characterize ATI as doing more "forward thinking." It was, after all, nVidia that was the first to implement a large number of the technologies that we take for granted in 3D graphics now, including anisotropic filtering, FSAA, MSAA, programmable shaders, and hardware geometry processing.

You've already been corrected on your sadly revisionist FSAA views so that doesn't bear further comment. I'd also like to point out that prior to nV40 the only thinking nVidia was doing was backwards, as embodied most dramatically by nV30 and all of the tedious public commentary we had to listen to from nVidia as to why "DX9 is not the future of 3d gaming." I've said it before and think it bears repeating: if not for R3x0 we'd never have seen nV4x from nVidia, let alone anything since. R3x0 was the "catalyst" which gave nVidia direction--no doubt about it...;)
 
WaltC said:
Not only that, of course, but nVidia's official public position on FSAA when 3dfx launched it was that "gamers don't want FSAA; gamers want high-resolution gaming," as if the two were mutually exclusive...;)
At that time they were, kind of. Supersampling is costly.
But unless you're referring to the Quantum 3D drivers no-X mentioned, NVidia had FSAA before the V5, with its superior RGSS, was available.


BRiT, 3dfx never had MSAA in a released product.
 
Rys said:
I think Razor1 means the Humus tweak to replace math ops with something else to improve performance.

And the reason that alone isn't good enough to take that class of hardware higher than the equivalent NVIDIA board should be glaringly obvious and has been covered here dozens of times in the past, to death.

I understand that, but still, it's like something is really holding back the R520s; even that didn't help it get much closer. Before these drivers, the X1800 XT was performing around the level of the 6800 NU/6800 GT, wasn't it? That's just really poor, seeing that the X850 XT was on par with or just under the 6800 Ultra with the tweak.
 
WaltC said:
You've already been corrected on your sadly revisionist FSAA views so that doesn't bear further comment. I'd also like to point out that prior to nV40 the only thinking nVidia was doing was backwards, as embodied most dramatically by nV30 and all of the tedious public commentary we had to listen to from nVidia as to why "DX9 is not the future of 3d gaming." I've said it before and think it bears repeating: if not for R3x0 we'd never have seen nV4x from nVidia, let alone anything since. R3x0 was the "catalyst" which gave nVidia direction--no doubt about it...;)
I bet good money that you can't avoid writing about nVidia in any way for the next 50 constructive posts you make on these forums. It's all you talk about, bringing the same stuff up over and over again. A thousand posts making the same tired old points that we've all read a thousand times, that we fully get and understand, but which you persist in bringing up time after time.

Yet another thread derailed by ATI vs NVIDIA chatter, eventually dropping to the depths of days gone by and more derivative and highly boring posts by you about NV30 being shit at DX9, rather than any meaningful discussion on technology of the present and what's going on with current hardware. The topic says X1800 and 7800, not GeForce FX 5800 and two year old stale pissings which you must bring up again and again.

If you must continue on your merry NV30 and NVIDIA-hating way, do it in 3D Graphics Companies and Industry, if anywhere, and keep it out of 3D Technology and Hardware. This forum section is not somewhere for you to post stuff like that.

But really, please just stop it entirely.
 
Rys said:
I think Razor1 means the Humus tweak to replace math ops with something else to improve performance.

And the reason that alone isn't good enough to take that class of hardware higher than the equivalent NVIDIA board should be glaringly obvious and has been covered here dozens of times in the past, to death.
Actually it was replacing a texture op with math ops. The texture op was very expensive with anisotropic filtering enabled on ATI hardware.
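For anyone who hasn't seen the tweak: the idea is to swap a dependent lookup-texture fetch for arithmetic evaluated per pixel. The sketch below is plain C++ rather than the actual ARB/GLSL shader code, and the exponent is purely illustrative; it only shows the shape of the change, not the exact math Humus used.

```cpp
#include <cmath>

// Before: the specular falloff comes from sampling a small lookup texture,
// which becomes expensive once anisotropic filtering applies to that fetch.
float specularFromLookup(float nDotH, const float* lookupTex, int texWidth) {
    int texel = static_cast<int>(nDotH * (texWidth - 1) + 0.5f);
    if (texel < 0) texel = 0;
    if (texel >= texWidth) texel = texWidth - 1;
    return lookupTex[texel];
}

// After (the Humus-style replacement): compute an equivalent falloff with
// math ops, so no texture fetch is needed. The exponent here is illustrative.
float specularFromMath(float nDotH) {
    return std::pow(nDotH > 0.0f ? nDotH : 0.0f, 16.0f);
}
```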
 
Guys, we've had the unique opportunity to correspond in great detail with one of ATI's brightest minds... Please do not turn this into a pissing match over who invented what technology, or who had the better antialiasing methods back when 3dfx was around. There are thousands of threads in which you guys can argue about who implemented what features. Hell, you can make a new one. Don't ruin mine.

Jawed:

I'm still trying to process everything in your post... Why exactly is the color buffer cache needed? Couldn't the RBE directly request blocks from the MC and remove the CBC layer? I think the missing piece of information for me is how often the same data gets requested from the CBC over again. If we have 4 RBEs, does each of them only contact one CBC? If so, I could see why this is important...

Thanks,
Nite_Hawk
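One hedged way to get at the reuse question above: count how often successive pixel writes land in a block that has already been touched. The toy counter below (the 16x16 block size and all names are assumptions) would show that rasterisation order tends to revisit the same block many times before moving on, which is the locality a small cache like the CBC can exploit.

```cpp
#include <cstdint>
#include <unordered_set>

// Toy reuse estimate: walk a stream of pixel writes and count how many land
// in a block that was already touched. Block size is an assumption made for
// illustration only; this measures locality, not actual hardware behaviour.
struct ReuseCounter {
    static constexpr int kBlock = 16;
    std::unordered_set<uint64_t> seen;
    long hits = 0, total = 0;

    void write(int x, int y) {
        uint64_t block = (uint64_t(y / kBlock) << 32) | uint32_t(x / kBlock);
        ++total;
        if (!seen.insert(block).second) ++hits;   // already seen: a reuse
    }

    double hitRate() const { return total ? double(hits) / total : 0.0; }
};
```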
 
Wow, excellent thread! It's great to read so much low-level info about the GPUs straight from the guys involved in making them.

sireric said:
I'm not saying replace the gfx APIs -- just trying to limit the proliferation of new ones. What if the physics API doesn't allow for all physical phenomena to be done? Do you create a new API for that? What if signal processing wants to be done and you only have collision hooks?

In the end, I fear the same thing regarding low level of detail. But I fear the extreme work in having lots of new specialized APIs too. I'd like a reasonably low-level API that allows more "to the metal" performance, but that abstracts some of the quirks of programming a given architecture. I don't really know the answer either. It's a new space that we are continuing to explore, but we are listening and talking to that community.

A low-level API would be really great to use instead of abusing the graphics API for non-graphics tasks. Right now GPUs are evolving at a great pace, so would it not be reasonable to not enforce backward compatibility of the low-level API? That way the GPU architecture is free to evolve in whatever way it wants, each time supplying a changed API. If the guys doing non-graphics things want their code to run unmodified on the new architecture too, then they should stick to abusing the graphics API, possibly trading off some performance and features. If more APIs are to be provided at all, they should be implemented as libraries on top of the low-level API.
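A hypothetical sketch of that layering: a thin, vendor-specific low-level interface that is free to change with each architecture, and a domain library (physics here, purely as an example) built on top of it, so only the library has to be ported when the low-level API changes. Every name below is invented for illustration.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical low-level, "to the metal" interface. It may change with every
// new GPU architecture; no backward compatibility is promised.
class LowLevelGpu {
public:
    virtual ~LowLevelGpu() = default;
    virtual int  uploadBuffer(const float* data, std::size_t count) = 0;
    virtual void dispatch(int programId, int bufferId, std::size_t elements) = 0;
    virtual void readBack(int bufferId, float* out, std::size_t count) = 0;
};

// Higher-level domain APIs live as libraries on top, so only this library
// (not every application) has to be ported when the low-level API changes.
class PhysicsLibrary {
public:
    explicit PhysicsLibrary(LowLevelGpu& gpu) : gpu_(gpu) {}

    void integrate(std::vector<float>& positions, int integratorProgram) {
        int buf = gpu_.uploadBuffer(positions.data(), positions.size());
        gpu_.dispatch(integratorProgram, buf, positions.size());
        gpu_.readBack(buf, positions.data(), positions.size());
    }

private:
    LowLevelGpu& gpu_;
};
```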

But I am very happy just to learn that you are at least thinking of doing something about it.
 