AMD: R7xx Speculation

No, in the initial R600 reviews, turning on AA caused a massive performance drop. I thought everybody on B3D would know that...

For example, sometimes it would compete well with the 8800 GTX, but turn AA/AF on and it was toast, whereas the GTX only took a tiny hit. And face it, most sites are going to bench high-end cards with AA/AF, and rightfully so.

It seemed like the 3870 got a lot better about this, probably (maybe entirely) because of drivers...

I still don't see why you'd want to sap shader power for AA under any conditions, though...

Edit: Oh, I get it, you're claiming it was AF rather than AA that caused the hit... I do remember something like that, but I don't recall it ever being settled that AF was the problem.

There are a number of factors at play. AA performance did get considerably better compared to how it was when R600 was released (drivers helped significantly). You also have to consider that the R6xx architecture is limited to 2 AA samples per clock, as opposed to the 4 that G8x does, that it's underpowered in texture filtering compared to G8x, and that most tests include 16x AF as well.
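Just to make the samples-per-clock point concrete, here's a trivial back-of-envelope sketch in C (clock speeds and ROP counts deliberately left out; the 2 and 4 samples/clock figures are simply the ones quoted above):

```c
#include <stdio.h>

/* Clocks a single ROP needs to emit one 4xAA pixel, given how many
 * AA samples it can handle per clock (figures quoted above). */
static double clocks_per_4xaa_pixel(double samples_per_clock)
{
    return 4.0 / samples_per_clock;
}

int main(void)
{
    printf("R6xx-style ROP (2 samples/clk): %.1f clocks per 4xAA pixel\n",
           clocks_per_4xaa_pixel(2.0));
    printf("G8x-style ROP  (4 samples/clk): %.1f clocks per 4xAA pixel\n",
           clocks_per_4xaa_pixel(4.0));
    return 0;
}
```

So at equal ROP count and clock, R6xx needs twice as long per 4xAA pixel at the ROPs, before texture filtering even enters the picture.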
 
I started doubting that theory when I saw many benchmarks in which R600 pulled ahead of the 8800 GTS and drew level with the 8800 GTX once higher levels of AA were enabled (e.g. 8 samples).
I think that's due to the higher initial cost you pay for shader-based resolve. I picture shader resolve as having a high latency (per pixel), but also a high throughput (in terms of number of samples).

This should obviously improve as you increase the number of shaders. But OTOH, given Arun's estimate of 6-something percent of die space for Nvidia's ROPs, I wonder whether it wouldn't be better to also have some capable ones of those.
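To illustrate the latency-vs-throughput intuition, here's a toy sketch in plain C (not actual shader or driver code; the "fixed per-pixel setup cost" is just a stand-in for whatever overhead the shader-resolve pass incurs):

```c
#include <stdio.h>

/* Toy model of shader-based MSAA resolve for one pixel: a box filter
 * over the colour samples.  The fixed per-pixel overhead (launching the
 * resolve pass, fetching sample positions, etc.) is what hurts at low
 * AA levels; the per-sample loop below is what scales, and going from
 * 4x to 8x only adds four more iterations to it. */
static float resolve_pixel(const float *samples, int num_samples)
{
    float sum = 0.0f;
    for (int i = 0; i < num_samples; ++i)   /* per-sample work */
        sum += samples[i];
    return sum / (float)num_samples;         /* plain average (box resolve) */
}

int main(void)
{
    /* Made-up colour samples (single channel) for one 8xAA pixel. */
    const float samples[8] = { 0.20f, 0.25f, 0.30f, 0.20f,
                               0.22f, 0.28f, 0.26f, 0.24f };

    printf("4xAA resolve: %.3f\n", resolve_pixel(samples, 4));
    printf("8xAA resolve: %.3f\n", resolve_pixel(samples, 8));
    return 0;
}
```

Under that picture, the marginal cost of the extra samples is small next to the fixed cost, which would fit the observation that 8xAA narrows the gap rather than widening it.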
 
I think that's due to the higher initial cost you pay for shader-based resolve. I picture shader resolve as having a high latency (per pixel), but also a high throughput (in terms of number of samples).

This should obviously improve as you increase the number of shaders. But OTOH, given Arun's estimate of 6-something percent of die space for Nvidia's ROPs, I wonder whether it wouldn't be better to also have some capable ones of those.
Ah, that makes a lot of sense.

Does anyone know the basics of how NVIDIA can devote so little die space to ROPs compared to ATI's approach? I would imagine that NVIDIA's method of coupling ROPs to the memory controllers has an influence...
 
It was a feature.
There is also nothing wrong with it; it should actually prove to be very beneficial down the road.

I've heard that often already, but many things never made it down the road ;)

I personally still think it was defective, whatever they say. No one in their right mind would have wanted a slower AA implementation in a new chip.
 
I've heard that often already, but many things never made it down the road ;)

I personally still think it was defective, whatever they say. No one in their right mind would have wanted a slower AA implementation in a new chip.
It's required for DX10.1, no?
I wonder if NV chose to stay at 10.0 partially because of this - i.e. they keep it simpler and faster for themselves and lack 10.1, but since they dominate the market they can easily make it look like another PS1.4 vs PS1.3 situation until DX11.
 
Edit: Oh, I get it, you're claiming it was AF rather than AA that caused the hit... I do remember something like that, but I don't recall it ever being settled that AF was the problem.

It (almost) is. You missed out on all the discussion with the ComputerBase graphs and all. ;)
 
Something about MSAA resolve:
http://forum.beyond3d.com/showthread.php?p=1005255#post1005255
Rys said:
Hardware resolve is actually done in the ROP on R600, but only for fully compressed tiles. I write that in the article. So I don't need to state that it was the plan to use the ROP for downsampling, because that's actually what's happening (unless you argue that reading just one value doesn't count, because there was no math involved to weight other samples) for one case.

And I also say that I lean towards the case that the hardware is broken because they have to downsample on the shader core for non fully compressed tiles, even if they can pass the decompressed samples back with a fast path.

So Damien and I say pretty much the exact same thing, just with different language.
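As a rough sketch of the split Rys describes - a ROP fast path when a tile is fully compressed (one stored value covers every sample) and a shader-core fallback that has to average the decompressed samples - here's a C toy; the struct, the "one pixel stands in for a tile" simplification, and all names are made up for illustration:

```c
#include <stdbool.h>
#include <stdio.h>

/* Made-up, pixel-sized stand-in for an MSAA tile: either fully
 * compressed (all samples share one colour) or carrying per-sample
 * data that has to be weighted/averaged. */
struct msaa_tile {
    bool  fully_compressed;
    float single_value;   /* valid when fully_compressed          */
    float samples[4];     /* valid otherwise (4xAA assumed here)  */
};

static float resolve(const struct msaa_tile *t)
{
    if (t->fully_compressed)
        return t->single_value;   /* "ROP" path: read one value, no math */

    float sum = 0.0f;             /* "shader core" path: average samples */
    for (int i = 0; i < 4; ++i)
        sum += t->samples[i];
    return sum / 4.0f;
}

int main(void)
{
    struct msaa_tile interior = { .fully_compressed = true,  .single_value = 0.5f };
    struct msaa_tile edge     = { .fully_compressed = false,
                                  .samples = { 0.1f, 0.9f, 0.5f, 0.5f } };

    printf("fully compressed tile: %.2f\n", resolve(&interior));
    printf("edge tile:             %.2f\n", resolve(&edge));
    return 0;
}
```

In this picture, only the tiles that aren't fully compressed (typically the ones touching triangle edges) take the expensive shader-core path.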
 
It's required for DX10.1, no?
Per-sample access to the depth buffer from the shader core? Yes.
I wonder if NV chose to stay at 10.0 partially because of this - i.e. they keep it simpler and faster for themselves and lack 10.1, but since they dominate the market they can easily make it look like another PS1.4 vs PS1.3 situation until DX11.
I think that's actually one of the things they can do with current GeForces - just look at MSAA being possible in Unreal Engine 3 (and in the DX10 version of BioShock).

Though obviously, since MS is very strict about the DX10 family (and rightfully so!), they cannot claim DX10.1 if anything else is amiss.
 
I think that's actually one of the things they can do with current GeForces - just look at MSAA being possible in Unreal Engine 3 (and in the DX10 version of BioShock).
DX10.1 isn't required to get AA working in UE3, as long as you don't mind some artifacts.
 
So my assumption that per-sample access to a depth buffer might be used for that is also incorrect?

MS is pretty strict with DX10.0 and up. If you don't support all of the required features of a version of DX10+, then no features of that version will be enabled or usable by your card. Some view this as MS being overly restrictive. I view it as MS trying to make things easier for customers who just want a video card that works.

Nvidia certainly can enable shader-based AA (it's an optional DX10.0 feature, I believe); however, the hit they take for it is greater than the hit taken by ATI hardware. I believe there was much discussion about this around the time the DX10 Call of Juarez (CoJ) demo was released, which disabled fixed-function AA. Nvidia wasn't too happy about that.

And as far as I know, UE3 doesn't currently have a DX10.1 path. And all current methods of AA in it (both NV and ATI) are hacks with resulting artifacts. Then again, while I don't have nearly as much time as I'd like to keep up with this forum, I have even less time to track current developments with UE3. :p

Regards,
SB
 
Silent_Buddha,
Yes, I am aware of that. But neither MS nor DX10 prevents you from using parts of an API through a hack. So even if a game technically doesn't support DX10.1, that would not mean - AFAIK - that it cannot use this.

But my question was whether or not both UE3 hacks do in fact use this per-sample access to the depth buffer to enable AA.
 
Just for kicks, and since I'm currently reading a good book, I have just done a quick bunch of 3DMark06 runs on my 3870 with various AA/AF configs.
E6600 @stock, 3870 @stock, Cat 8.5

no AF / no AA: 9883
16x AF / 8x AA: 8098
8x AF / no AA: 8552
16x AF / no AA: 8164
no AF / 4x AA: 9456
no AF / 8x AA: 9449
no AF / 24x AA: 8903

This was just one run of each config, and I have stuff open in the background, so it's not by any means scientific, but there seems to be a pretty clear pattern (rough percentage drops worked out below): MSAA alone barely dents the score, while AF costs considerably more.

Edit: 3670@stock -> 3870@stock :eek:
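Working the posted scores into drops relative to the no-AF/no-AA baseline (just arithmetic on the numbers above, nothing measured beyond them):

```c
#include <stdio.h>

int main(void)
{
    /* Scores from the post above; baseline is the no AF / no AA run. */
    const double baseline = 9883.0;
    const struct { const char *config; double score; } runs[] = {
        { "16x AF / 8x AA", 8098.0 },
        { "8x AF / no AA",  8552.0 },
        { "16x AF / no AA", 8164.0 },
        { "no AF / 4x AA",  9456.0 },
        { "no AF / 8x AA",  9449.0 },
        { "no AF / 24x AA", 8903.0 },
    };

    for (size_t i = 0; i < sizeof runs / sizeof runs[0]; ++i)
        printf("%-16s %4.1f%% drop\n", runs[i].config,
               100.0 * (baseline - runs[i].score) / baseline);
    return 0;
}
```

That comes out at roughly 4-5% for 4x/8x MSAA alone, ~10% for 24x, and 13-18% once 8x/16x AF is in the mix - consistent with the earlier point that filtering, rather than the MSAA resolve, is where RV670 pays most.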
 
Be my guest, find a review where RV670 is slower than R580 with 4xMSAA. I can't find one.

Jawed

Since you don't specify: does that include HD3850 vs. X1950 XTX, and a setting with aniso also applied, not only 4x MSAA?
 