AMD: R7xx Speculation

Only DX10.1 cards, it seems. These results don't surprise me too much. Maybe a little larger than expected, but nothing unbelievable. DX10.1 adds access to multisampled depth buffers, and I think this is either the entire performance increase or at least the majority of it. If you're going to combine antialiasing with access to the depth buffer, on DX10 you need to have a separate render target to store your depth values into, which multiplies the bandwidth needs several times for the pre-Z pass (keep in mind that the regular depth buffer can utilize Z-compression, whereas a color render target cannot).
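As a rough sketch of what that D3D10.1 capability looks like at the API level (illustrative only - the function and variable names are made up, and real code would check return values):

    #include <d3d10_1.h>

    // Placeholder helper, not from any shipping title: create a 4x MSAA depth buffer
    // that a SM4.1 shader can read directly.  On a plain D3D10.0 device a multisampled
    // depth resource can't be given a shader resource view, which is why the depth
    // values otherwise have to be duplicated into a separate, uncompressed colour target.
    void CreateReadableMsaaDepth(ID3D10Device1* device1, UINT width, UINT height,
                                 ID3D10Texture2D** tex,
                                 ID3D10DepthStencilView** dsv,
                                 ID3D10ShaderResourceView1** srv)
    {
        D3D10_TEXTURE2D_DESC td = {};
        td.Width = width;
        td.Height = height;
        td.MipLevels = 1;
        td.ArraySize = 1;
        td.Format = DXGI_FORMAT_R24G8_TYPELESS;   // typeless so it can be viewed two ways
        td.SampleDesc.Count = 4;                  // 4x MSAA
        td.BindFlags = D3D10_BIND_DEPTH_STENCIL | D3D10_BIND_SHADER_RESOURCE;
        device1->CreateTexture2D(&td, nullptr, tex);

        D3D10_DEPTH_STENCIL_VIEW_DESC dd = {};
        dd.Format = DXGI_FORMAT_D24_UNORM_S8_UINT;
        dd.ViewDimension = D3D10_DSV_DIMENSION_TEXTURE2DMS;
        device1->CreateDepthStencilView(*tex, &dd, dsv);

        // This view is the 10.1 part: a pixel shader can declare a Texture2DMS and
        // Load() individual depth samples, so no second copy of Z is needed.
        D3D10_SHADER_RESOURCE_VIEW_DESC1 sd = {};
        sd.Format = DXGI_FORMAT_R24_UNORM_X8_TYPELESS;
        sd.ViewDimension = D3D10_1_SRV_DIMENSION_TEXTURE2DMS;
        device1->CreateShaderResourceView1(*tex, &sd, srv);
    }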
Seems this was even more of a performance benefit than I suspected:

http://forum.beyond3d.com/showthread.php?t=45548

:D

Jawed
 
...How difficult is it to modify a DX10 game for DX 10.1?

That is the million-dollar question, IMO. Will ATI send people out to other game developers and promote DX10.1 use, or will they showcase it in just one game, hoping to lure developers in?

Source

DX10.1 brings the ability to read those sub-samples to the party via MSBRW. To the end user, this means that once DX10.1 hits, you can click the AA button on your shiny new game and have it actually do something. This is hugely important.

The first reaction most people have is that if a game is written for DX10, then new 10.1 features won't do anything; AA awareness needs to be coded in the engine. That would be correct, but we are told it is quite patchable, i.e. you will probably see upgrades like the famous 'Chuck patch' for Oblivion. Nothing is guaranteed, but there is a very good chance that most engines will have an upgrade available.

If they get people out there to promote DX10.1 in other games before the release of RV770 and R700, that could be the one-two punch ATI was looking for, IMO.

Side note:
Wait a minute, is DX10.1 something new, or was it always part of DX10? If it was always part of DX10 (DX10 minus the .1), was it delayed due to Nvidia not being able to properly support virtualization of memory?
Source

Or is there another reason?
 
How would it be anything but app-specific? You need to have the 10.1 shader(s) taking advantage of the full access the new DX provides; if you've got only DX9/10 code, how would that happen?

Yes, but it's not a requirement to be able to sample it as a texture. That's the feature DX10.1 brings to the table, and the app needs to write DX10.1 code to use it.

Yeah, my bad, I was off track there. At first I thought the gains were only with AA, hence my confusion as to why it would be app-specific.

Though there's still something I'm not clear on when it comes to plain vanilla AA in DX10 on R6xx. Does it use the workaround or sample the depth buffer directly? If the hardware functionality is there, can't the driver use it no matter what API it's running, given that AA is a bit of a black box?
 
The R600 can't allow direct access to the depth buffers; it's a DX10-only part, so it's either doing the workaround Humus suggested or something else. This is highlighted by the fact that it does not benefit from using SP1/10.1 (the differences there are within the margin of error, considering I used Fraps), and that it has inferior AA quality (just like the X2 does when 10.1 isn't installed).
 
Is AA in R600 actually implemented as a Direct3D shader? I still don't get it....ATI is in full control of the hardware so why not just run a custom AA shader that uses the hardware functionality and has nothing to do with the DirectX API? This isn't something that has to be exposed to developers via the API...it's something the driver is doing behind the scenes.

I'm not really talking about allowing access to anything.....just what happens when you turn on regular old AA in the control panel for any regular old game. Why does the driver/hardware have to jump through hoops there just because the API doesn't expose the functionality? The way I see it...this is analogous to transparency AA or CSAA - stuff that the hardware / driver can do behind the scenes without the app or API explicitly requesting it.
 
Well, there's always the possibility that R600 "could" support those specific features in DX 10.1 that allow for increased speed.

However, Microsoft's stance (and I mostly agree with it) is that if you don't support ALL features then you won't be certified or enabled to support ANY features.

So it's possible it's been there all the time with R600; however, it will never get to use it, as exposing it requires DX10.1 and R600 doesn't support ALL DX10.1 features, and thus it can never take advantage of this.

As someone else said, it's a shame that some features in DX10.1 were originally slated for DX10.0 but were in the end delayed. And it appears that ATI suffered more from this than did Nvidia.

Regards,
SB
 
Is AA in R600 actually implemented as a Direct3D shader? I still don't get it....ATI is in full control of the hardware so why not just run a custom AA shader that uses the hardware functionality and has nothing to do with the DirectX API? This isn't something that has to be exposed to developers via the API...it's something the driver is doing behind the scenes.

I'm not really talking about allowing access to anything.....just what happens when you turn on regular old AA in the control panel for any regular old game. Why does the driver/hardware have to jump through hoops there just because the API doesn't expose the functionality? The way I see it...this is analogous to transparency AA or CSAA - stuff that the hardware / driver can do behind the scenes without the app or API explicitly requesting it.

I'm not sure I get your question. How do you think this works?

In the case of AC, it's very probable that Ubi is doing at least the resolve, and probably more, through their own shaders, in order to get HDR-correct AA and AA that plays nice with other effects like DOF etc. (they don't detail what they're doing anywhere). One or more of these shaders, when switched to 10.1, provides the performance improvement noticed, probably due to the reasons Humus suggested. Forced AA doesn't even work in AC, btw.
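As a purely conceptual illustration of why a hand-written resolve matters for HDR (CPU-side toy code with a made-up tone-mapping curve, nothing to do with Ubi's actual shaders): a fixed-function resolve averages the HDR samples before tone mapping, whereas a custom resolve can tone map each sample first, which changes the result at high-contrast edges.

    #include <vector>

    // Simple Reinhard-style curve, for illustration only.
    static float ToneMap(float hdr) { return hdr / (1.0f + hdr); }

    // Fixed-function style: average the HDR samples first, then tone map.
    float ResolveThenToneMap(const std::vector<float>& samples)
    {
        float sum = 0.0f;
        for (float s : samples) sum += s;
        return ToneMap(sum / samples.size());
    }

    // Custom resolve: tone map each sample, then average (HDR-correct edges).
    float ToneMapThenResolve(const std::vector<float>& samples)
    {
        float sum = 0.0f;
        for (float s : samples) sum += ToneMap(s);
        return sum / samples.size();
    }
    // e.g. for samples {0.1, 8.0} the two functions return ~0.80 vs ~0.49.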

The 2900 doesn't even have the HW capability to implement the features of 10.1 that are probably taken advantage of here; it's strictly SM4.0. Or am I getting you wrong?
 
Is AA in R600 actually implemented as a Direct3D shader?
I doubt it, if you're talking about the general case of resolving an MSAA'd render target.

I still don't get it....ATI is in full control of the hardware so why not just run a custom AA shader that uses the hardware functionality and has nothing to do with the DirectX API? This isn't something that has to be exposed to developers via the API...it's something the driver is doing behind the scenes.

I'm not really talking about allowing access to anything.....just what happens when you turn on regular old AA in the control panel for any regular old game. Why does the driver/hardware have to jump through hoops there just because the API doesn't expose the functionality? The way I see it...this is analogous to transparency AA or CSAA - stuff that the hardware / driver can do behind the scenes without the app or API explicitly requesting it.
In AC, which seems to use deferred rendering, there appear to be two versions of the D3D10.x code:
  1. 10.0 - creates a set of render targets (G-buffer), which includes Z data (relatively slow because of the lack of Z compression). In the tone-mapping + AA-resolve pass the Z data is lower quality per sample (within each pixel the Z gradient across samples belonging to each triangle isn't available).
  2. 10.1 - creates a set of render targets, but there is no need to explicitly include Z in these, as this is automatic (saving a pile of bandwidth). The tone-mapping + AA-resolve pass has full-quality Z data, as though a conventional forward renderer had been used, not a deferred renderer.
Note that both versions of the code use multiple colour samples per pixel. The difference centres on the quality of the Z data recorded per pixel and the bandwidth overhead incurred in D3D10 because Z compression isn't as good. Z is actually written twice per pixel during G-buffer creation :oops: once so that the GPU can determine depth for visibility of each new pixel (this copy uses Z compression), and again for the deferred rendering algorithm to consume (this copy uses MSAA's colour compression for Z data, as the G-buffer pretends that Z is a colour).
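To make the two paths concrete, here's a rough, hypothetical sketch of the extra G-buffer plumbing the 10.0 path needs - the format, sample count and function name are made up for illustration, this is not AC's actual code:

    #include <d3d10.h>

    // Illustrative only: the D3D10.0 G-buffer needs one extra MSAA render target
    // that carries Z "as colour", because the real depth buffer can't be sampled.
    // The D3D10.1 path simply omits this target and gives the depth buffer a
    // shader resource view instead (see the earlier depth-SRV sketch).
    ID3D10Texture2D* CreateDepthAsColourTarget(ID3D10Device* device, UINT w, UINT h)
    {
        D3D10_TEXTURE2D_DESC td = {};
        td.Width = w;
        td.Height = h;
        td.MipLevels = 1;
        td.ArraySize = 1;
        td.Format = DXGI_FORMAT_R32_FLOAT;   // depth written a second time, uncompressed
        td.SampleDesc.Count = 4;             // matches the MSAA colour targets
        td.BindFlags = D3D10_BIND_RENDER_TARGET | D3D10_BIND_SHADER_RESOURCE;
        ID3D10Texture2D* tex = nullptr;
        device->CreateTexture2D(&td, nullptr, &tex);
        return tex;
    }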

So the problem for R600 is that the application is explicitly coded to use a different kind of G-buffer (with extra Z data) - for the driver to "force" R600 to work like RV670, it would have to intercede in both the creation of the G-buffer and in the tonemap+resolve shader.

For what it's worth I suspect R600 could be made to do this. But the level of driver interference is much higher than it would be for HDR+AA (R5xx) or UE3+AA scenarios. Also, ahem, that doesn't sell HD3xxx...

See that thread I linked for the slides and discussion over the performance and quality issues that separate D3D10 and D3D10.1 when performing deferred rendering MSAA.

Jawed
 
That is along the lines of what I thought as well (there are also some noises suggesting it might be based on an evolved version of the GRAW engine). There's a minor point of contention though, as AC offers AA through DX9 as well, which is something that shouldn't be possible with an entirely deferred renderer.

I think they're only using a deferred approach to shadowing (something along the lines of what UE3 does), as there's a significant delta in shadow quality between DX10/10.1 and DX9, so it might be that in DX9 they're using a simplified, non-deferred shadowing model. It is an interesting beast though.
 
GRAW is fully deferred I think, though I'm not sure about GRAW2. Is the AA in the DX9 version of AC MSAA or something else (like the selective blurring of Stalker)?

If the performance benefit comes solely from shadowing then that's a pretty big deal.

Jawed
 
No, it's MSAA. Its quality is equal to what you see with DX10, but inferior to what you get with 10.1. It might be more than shadows, though; it's just that the shadows are the most visible. I haven't played with DX9 enough yet.
 
Hmm, well just because DX9 and 10 have the same quality of AA (for everything other than shadows, I presume that's what you're saying) doesn't actually mean they are both MSAA.

Hopefully someone who knows what's going on will post some detailed information.

Jawed
 
It's not per-edge blur as with Stalker, nor is it SSAA (the performance hit isn't consistent with what SSAA would mean). Shadows are rendered differently in DX9 vs DX10/10.1. Poke around the exe and you'll see a bit of what they're doing ;).
 
One or more of these shaders, when switched to 10.1, provides the performance improvement noticed, probably due to the reasons Humus suggested. Forced AA doesn't even work in AC, btw.

The 2900 doesn't even have the HW capability to implement the features of 10.1 that are probably taken advantage of here; it's strictly SM4.0. Or am I getting you wrong?

Yeah I'm not asking about custom AA or depth buffer access by the application. I know that's the case with AC but I'm asking a more general question. I'm asking about what happens when you go into CCC and force AA on RV670 for a game like Doom3 or HL2....an older game that just does regular AA. What's stopping the driver from using the hardware's capability to access the multi-sampled depth buffer directly?

It's clear that DX10.1 accelerates application access to the depth buffer. Humus answered that a few posts back. But I was curious whether the same functionality is already used behind the scenes for regular forced AA. It would help explain why RV670 is competitive with R600 even with much less bandwidth.
 
At least that'd be a very clever way of utilizing one's HW resources. If they really do that - kudos to the folks at AMD.
 
So it's possible it's been there all the time with R600; however, it will never get to use it, as exposing it requires DX10.1 and R600 doesn't support ALL DX10.1 features, and thus it can never take advantage of this.
I'm wondering, what features of DX10.1 could R600 actually support (and which not?)
 
Yeah I'm not asking about custom AA or depth buffer access by the application. I know that's the case with AC but I'm asking a more general question. I'm asking about what happens when you go into CCC and force AA on RV670 for a game like Doom3 or HL2....an older game that just does regular AA. What's stopping the driver from using the hardware's capability to access the multi-sampled depth buffer directly?

Nothing's stopping the driver from using such functionality, but it doesn't help either, because you don't resolve the depth buffer for antialiasing. Only the color buffer is resolved.
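For reference, a plain resolve at the API level looks roughly like this (the function and resource names are placeholders); the depth buffer never appears, which is why faster depth reads don't help forced AA in older titles:

    #include <d3d10.h>

    // Illustrative only: a conventional MSAA resolve.  Only the colour render target
    // is downsampled to the single-sample destination; depth is never resolved.
    void ResolveColourOnly(ID3D10Device* device,
                           ID3D10Texture2D* backBuffer,   // single-sample destination
                           ID3D10Texture2D* msaaColour)   // 4x MSAA source
    {
        device->ResolveSubresource(backBuffer, 0, msaaColour, 0,
                                   DXGI_FORMAT_R8G8B8A8_UNORM);
    }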
 
I'm wondering, what features of DX10.1 could R600 actually support (and which not?)

Well, I haven't been keeping up with things lately. Things have been a bit busy around here. But I believe tessellation, for example, was originally scheduled for DX 10.0. Not sure if it even made it into 10.1. Like I said, I haven't had the time to really keep up with things this past year. So many games sitting on my shelf uninstalled. :(

I remember there was noise around the release of DX 10 and the R600 about features that didn't make it into 10.0 that R600 could have supported. Not sure if that was rumor, smoke and mirrors, or just wishful thinking.

Regards,
SB
 
Semi-related to this, the performance gains in Assassin's Creed from DX10.1 were mostly due to the fact that they could remove one pass when rendering the post-processing effects.

The DX10.1 support, however, will be removed in the next patch, but it will probably be reimplemented later.

”In addition to addressing reported glitches, the patch will remove support for DX10.1, since we need to rework its implementation. The performance gains seen by players who are currently playing Assassin’s Creed with a DX10.1 graphics card are in large part due to the fact that our implementation removes a render pass during post-effect which is costly.”

http://forums.ubi.com/eve/forums/a/tpc/f/5251069024/m/6571038256
 