Sir Eric Demers on AMD R600

Well, all physical aspects depend on the technology. This includes the elements that make up ASICs (stdcell, memories, macros, etc...). We need to re-characterize and update all libraries and models associated with those. This can mean new synthesis, new P&R rules, new macro designs, etc... This affects everything from the netlist on down, but it can even have ramifications up through the architecture, as you might need to change some functionality, re-pipeline, change your memory usage, for example. It's a lot of work.

Somehow this reminds me of an answer Dave O. gave in a conf call during the summer of 2005 re the R520 delay. Basically he said that the tools had let you guys down, and that you'd do what you could to improve the tool capabilities so that it wouldn't happen again.

Well, I suppose to some degree that was boilerplate and best intentions. Anytime there are tooth marks in your heinie you do what you can to not have that happen again, of course. Just curious though, were there any specific toolset improvements that flowed from that experience?
 

It wasn't, per se, the tools -- It was a library failure, and we did not catch it using the standard flow of the tools. We needed better library designs (those aren't internal), better library reviews, and a tool flow in place to catch this kind of thing. It must have taken us 5~6 months to find the problem; a real b**tch of a problem...
 

Speaking about better library designs, do you think it would benefit you to move to AMD's tools/libraries/processes later on? Someone I am inclined to believe told me that the in-house manufacturing processes used by the CPU manufacturers are substantially better than foundries like TSMC, but I don't know how true that is.
 

I won't comment on future projects, but that is now a real possibility.
 
Nope, it's generally simply being late to the game. By then, the channel is full and the OEMs have their deals. Getting deals is simply not possible (most OEMs for mainstream and below) or very difficult (above mainstream, you need to show a huge delta). It's not a competition when you are late. R5xx was fine and price/performance was quite excellent. It was, at least, 4 months late.
Come on, do you really think NVidia's huge margins had nothing to do with them having a much smaller die size for the same or even superior performance? What about 7600 vs. X1600? You came out before NVidia there, but it was way faster and deals were no problem there.

Anyway, no need to respond. You're right in that we should keep it all technical here.

32 total for Z (with or without AA). But HiZ can make that appear larger, as do some of the tile Z operations.
Okay.

What about with colour and Z both enabled? Or is that limited by Z, like previous generations? I was kind of hoping that with a 512-bit bus you'd try to maintain 16 pixels per clock with 4xAA, especially with NVidia pushing 96 samples per clock, but I guess I was just dreaming.

The last ROP question I have is whether you allow AA with 32-bit per channel rendering. It may be excessive for HDR, but it's nice for variance shadow mapping.
 
First off I just want to say thanks sireric for the candid answers given in the Q&A they make for interesting reading.

One question I did want to ask, and might be asking more than you can answer, was over the design choice of doing totally shader based AA with the R600. Clearly this is where AMD/ATi think all DX10 anti-aliasing is going to go and I certainly see the logic of dropping the fixed function hardware support based on that premise. The danger is that nVidia is still supporting it in their hardware and I can see the potential for a nasty code base fight where developers will tune for nVidia hardware and use hardware MSAA to then leverage the 'free' shader resources to do more than an AMD equivalent could.

Given that, running in a non-AA mode, the two architectures seem pretty equivalent (indeed I'd give a slight edge to the 2900XT over the GTX with most benchmarks I've seen) this would seem to put the R600 a bit behind the eight ball when AA, especially straightforward MSAA enters the mix.

My question then is do you now think, with the benefit of hindsight, that perhaps removing the hardware AA resolve was a mistake? Or do you think that most DX10 titles are going to become shader based AA simply for the visual fidelity it can give over MSAA? (As Techland are saying with their Call of Juarez DX10 demo.)
 
I'm quite certain (though Mr. Demers can contradict me... I'd be really happy if he were to, actually :) ) that it's not the shader-based resolve holding AA performance back. If you look at the R600's fillrate numbers compared with the G80's, the fillrate is simply not there; only at about 8x AA does it equal the GTS (not the GTX).
 
Fillrate is not particularly important in AA resolve, as to produce a single pixel you potentially need to read and average many samples.
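
The point above can be sketched with a rough traffic model. All the figures here are illustrative assumptions, not actual R600 or G80 specifics, and the model ignores compression (which reduces the read side in practice):

```python
# Rough model of the memory traffic needed to resolve an MSAA buffer:
# every sample of every pixel is read, one averaged pixel is written.

def resolve_traffic_bytes(width, height, samples, bytes_per_sample=4):
    """Bytes read and written to resolve one multisampled color buffer."""
    pixels = width * height
    reads = pixels * samples * bytes_per_sample   # fetch all samples
    writes = pixels * bytes_per_sample            # one averaged pixel out
    return reads + writes

# Resolving a 1600x1200 4xAA surface touches far more memory than the
# single-sampled case, even though the same number of output pixels is
# produced -- hence bandwidth, not raw fillrate, dominates the resolve.
no_aa = resolve_traffic_bytes(1600, 1200, 1)   # 15,360,000 bytes
aa4x  = resolve_traffic_bytes(1600, 1200, 4)   # 38,400,000 bytes
print(no_aa, aa4x)
```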
 
Which is why I said that it's not the resolve step holding the R600 back? Or am I misunderstanding you?
 
That's a good question -- Off hand, I don't have the BW and latencies involved in the HT vs. PCIe numbers. Though I would guess that it would benefit from being directly on the HT (lower latency and probably something like ~50% higher BW). It would make hooking up a southbridge harder though.
Would it be possible (or worthwhile?) to include southbridge-functionality on either the GPU or the CPU so that this kind of connection would be feasible?
 
 
 
 
1/2 nodes are a little easier, as most things can be simply optically shrunk -- You have to design with that in mind, but it can be easier. Famous last words.
ATI seems to have had a miserable time with 110nm (X700XT) and 80nm...

Jawed
 
<snip>
Anyway, no need to respond. You're right in that we should keep it all technical here.

Okay.

What about with colour and Z both enabled? Or is that limited by Z, like previous generations? I was kind of hoping that with a 512-bit bus you'd try to maintain 16 pixels per clock with 4xAA, especially with NVidia pushing 96 samples per clock, but I guess I was just dreaming.

The Z is 32 frags per cycle, regardless of AA mode. That's 2x what it was before, in non-AA. I don't disagree that I would have liked to have 4xAA for free, but it's 2xAA for "free".
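
The "2xAA for free" arithmetic works out like this. The 32 Z samples per clock is from the answer above; the division into pixels per clock is just the sample budget shared among each pixel's AA samples:

```python
# Worked example of the "2xAA for free" arithmetic: a fixed Z-sample
# budget per clock, divided by the samples each pixel needs.

Z_SAMPLES_PER_CLOCK = 32  # figure quoted in the answer above

def z_pixels_per_clock(aa_samples):
    """Pixels of Z throughput per clock at a given AA level."""
    return Z_SAMPLES_PER_CLOCK // aa_samples

assert z_pixels_per_clock(1) == 32  # no AA: 2x the older 16-pixel parts
assert z_pixels_per_clock(2) == 16  # 2xAA matches the old non-AA rate,
                                    # hence "2xAA for free"
assert z_pixels_per_clock(4) == 8   # 4xAA halves it again
```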

The last ROP question I have is whether you allow AA with 32-bit per channel rendering. It may be excessive for HDR, but it's nice for variance shadow mapping.

Yes.
 
First of all: Thanks to you Eric for answering all those questions! (I somehow feel the urge to use multiple exclamation marks...)

I've got another question about compression:
We've tested some cases where AA-compression should really shine (full-screen quad with solid color) and different AA modes. But somehow the fillrates dropped substantially with added AA samples. Any explanation for this? I came to the conclusion that AA-compression must be disabled in the driver, be defective, or simply be very picky about what it compresses and what not.

If necessary, I can provide further details about the tests being used (they were short shaders mostly).

You should contact devrel -- They should be able to help. It's hard to speculate without seeing exactly what's going on. Our Devrel & Apps guys could analyze the code and find the issue, I'm sure. It could be a path where things are suboptimal -- Not sure.

Note: Make sure you clear the surface first.
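
The note about clearing first fits a simple model of sample compression. The actual R600 scheme isn't documented here; this toy version just shows the principle that a tile whose samples all match can be stored once, while a surface left with stale garbage forces the hardware to touch every sample:

```python
# Toy model of why clearing matters for AA color compression: a tile
# compresses only when all of its samples hold the same value.

def tile_is_compressible(tile_samples):
    """True if every sample in the tile matches the first one."""
    first = tile_samples[0]
    return all(s == first for s in tile_samples)

# A freshly cleared tile: every sample identical -> compressible.
cleared = [0x00000000] * 16
# A tile left with stale, unrelated data -> incompressible, so the
# hardware falls back to reading and writing every individual sample.
stale = list(range(16))

print(tile_is_compressible(cleared))  # True
print(tile_is_compressible(stale))    # False
```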
 
I'm quite certain (though Mr. Demers can contradict me... I'd be really happy if he were to, actually :) ) that it's not the shader-based resolve holding AA performance back. If you look at the R600's fillrate numbers compared with the G80's, the fillrate is simply not there; only at about 8x AA does it equal the GTS (not the GTX).

The shader AA resolve holds performance back at very high frame rates (where 1ms makes a difference). Below that, it's less and less of an issue, in general.

But it's correct that G80 has higher fillrates.
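
The "1ms matters at very high frame rates" point is easy to see with a little arithmetic. The 1ms figure is from the post above; the frame rates below are illustrative:

```python
# A fixed per-frame resolve cost matters far more when the frame
# itself is short: adding 1 ms to a 2 ms frame is a 50% hit, adding
# it to a 16.7 ms frame is barely measurable.

def fps_with_overhead(base_fps, overhead_ms):
    """Frame rate after adding a fixed per-frame overhead."""
    frame_ms = 1000.0 / base_fps
    return 1000.0 / (frame_ms + overhead_ms)

# At 500 fps, a 1 ms resolve drops the rate by a third...
print(round(fps_with_overhead(500, 1.0)))  # 333
# ...while at 60 fps the same cost is nearly invisible.
print(round(fps_with_overhead(60, 1.0)))   # 57
```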
 
I enjoyed reading that a lot, must have taken quite a long time to write all those answers. Appreciated.

It certainly took longer to write than for me to read this thread, as I skipped pages 3 to 5, so this question might have been answered:

Not very sexy, but there was talk on release of the 2900XT that the card would be quieter in the future, either with better spin-up decisions or quieter coolers. I assume that this is still on the cards (pardon the pun)?

Well, case temperature has a huge effect on the fan speed and, consequently, fan noise. We fixed up the drivers to drop the sound level in 2D to pretty quiet, assuming a reasonably cool case. But I agree that the fan speed is a little higher than our X1950XTX's. It's nowhere near that bad either, though. New boards will have different cooling solutions, but HD2900XT boards are the way they are.

I have to confess that initial impressions of the 2900XT were not great, but considering the slowly dropping price, potential increase in performance through drivers, lower sound levels, and perhaps good DX10 performance, I do wonder if this card is one of the few slow burners that might be worth purchasing in the future.

The price of $399 is a great price, I believe. As for it "slowly dropping", I think that's simply market movement -- Supply and demand. We are supplying a lot of boards every month, but it's new and cool, so people eat up the supply. Wait a little while, and prices should drop further. As far as increases in performance from the drivers, we have shown quite a bit and there is more to come -- But, again, no strict promises can be made here. In terms of DX10 performance, yes, I think our stuff is significantly better. All GS testing shows that we give full shader core resources to GS, and it outperforms the competition significantly. As more DX10 titles come out and we tune our new drivers, I expect to shine. But, again, I cannot *promise* that.

I noticed the other day that I had Crysis, Bioshock, Half Life 2 Episode 2 and COD 4 all down for future purchase, and coupled with 24 inch monitors finally heaving into view price-wise, it might very well fit in with upgrading the GPU at that time.

Yes, SW titles have been a little bit on the light side recently, while the fall promises to be great. I've been having fun with Stalker, but I'm looking forward to all your above titles.
 
Eric, what's the trend for thread size? We have trends for bandwidth and ALU:TEX ratio, etc., and I'm curious what's going to happen with thread size.

R520 set a nice standard with 16 pixels, but since then it's gotten "worse". Are we looking down the barrel of an ever-increasing thread size?

Does it matter much?

Jawed
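
One reason the thread (batch) size question matters is branching: if any pixel in a batch takes a branch that others skip, the whole batch pays for both paths, and bigger batches diverge more often. This is purely a probabilistic toy model, not a claim about any specific part:

```python
# Probability that a batch of independent pixels contains BOTH branch
# outcomes (and therefore must execute both paths), as batch size grows.

def p_batch_diverges(p_pixel_takes_branch, batch_size):
    """1 minus the chance that all pixels agree on the branch."""
    p = p_pixel_takes_branch
    all_taken = p ** batch_size
    none_taken = (1 - p) ** batch_size
    return 1 - all_taken - none_taken

# With 10% of pixels taking a branch, a 16-pixel batch (the R520-style
# size mentioned above) diverges far less often than a 64-pixel one.
print(round(p_batch_diverges(0.1, 16), 3))  # 0.815
print(round(p_batch_diverges(0.1, 64), 3))  # 0.999
```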
 
Kind of a low-level detail I was curious about.

Going from the CTM spec, divergent branches require that both paths in the program be executed, with the invalid results masked off.

Does R600 actually execute each instruction and then mask the results, or does it resolve predicated false instructions to NOPs?

It can do either. We can predicate out instructions, or take full separate branches and mask out. The predication must be specified in the shader; otherwise you'll get the full branches.
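
The two schemes in that answer can be sketched over a SIMD batch of lanes. This is a conceptual model in Python, not actual R600 ISA; both produce the same results, the difference is how much work the false lanes do:

```python
# Two ways to handle a divergent if/else across a batch of SIMD lanes.

def run_if_else(cond, then_fn, else_fn):
    """Branch-and-mask: execute BOTH paths for every lane, then
    select per lane using the condition mask."""
    then_results = [then_fn(i) for i in range(len(cond))]  # all lanes run
    else_results = [else_fn(i) for i in range(len(cond))]  # all lanes run
    return [t if c else e
            for c, t, e in zip(cond, then_results, else_results)]

def run_predicated(cond, then_fn, else_fn):
    """Predication: each instruction carries a predicate bit, and a
    lane whose predicate is false does no work for that instruction
    (conceptually a NOP)."""
    return [then_fn(i) if c else else_fn(i) for i, c in enumerate(cond)]

mask = [True, False, True, True]
out = run_predicated(mask, lambda i: i * 2, lambda i: -i)
print(out)  # [0, -1, 4, 6]
assert out == run_if_else(mask, lambda i: i * 2, lambda i: -i)
```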
 