Sir Eric Demers on AMD R600

I personally think that was the biggest mistake of R5xx. The ultra-threaded nature required so much more die space compared to NV4x/G7x that it couldn't really compete in the other price categories. It's great for a developer who wants to mess with dynamic branching, but it went 95% wasted for the consumer because few if any games took advantage of it.

I disagree. The main issue was being late...

 
Yeah, I want to see some nekkid die shots too. Top-level clock trees aren't too interesting :cry:


@Sireric
Are any parts of current GPUs fully custom designed (as in, all the way down to hand layouts)? If not, what is the lowest level where custom design work is done, and how much of it is done? Or is it all just behavioral HDL code run through a compiler/synthesizer, etc.? Do you see future GPUs using more custom design work for important parts such as ALUs?


Yes. We certainly work at the transistor level -- all pads and some of the internal elements are custom. But most of the design is still written in Verilog and goes through a synthesis / place-and-route system. More custom is certainly possible; it gives better speed and work/mm^2, but it comes at the cost of needing a huge amount of work before being ready. Also, there are lots of levels of what people call "custom". Some are easier than others.
 
My own particular back door needed serious overhaul before I looked at this, and yes it needed a refresh, but make sure it has value first. Don't overhaul dead plants and kill the fruit-giving trees.
 
I disagree. The main issue was being late...
That's only because you targeted 90nm, and I wasn't referring to R520 anyway. I meant competitiveness when NVidia went 90nm and was on equal footing. Fortunately they decided to reap profits from crazy margins instead of making ATI bleed.

Look, I'm not a big fan of NV4x/G7x or anything. If anything I think they copped out of the most important part of SM3.0. If you added FP blending to R4xx, they would have been exposed as such.

However, you can't deny that R5xx/RV5xx are a perfect example of what you were talking about with features taking die space and going unused. RV5xx has the same transistor count as the X800XT, and the latter was much faster in games, most shader tests, and even GPGPU from what I've seen.

Anyway, I'm looking forward to you showing us what R600 can do.

---------------------------

It's very cool that sireric (and ATI by extension) is so forthcoming. I had no idea he was the lead on R600; he still finds time to hang out in the forums with us plebs :)

But some of those answers are really puzzling to me. The implication is that R600's design and hardware attributes are fantastic and the drivers are poo-pooing all over performance. Aren't the two inextricably linked? A lot of what is being said seems to have no basis in the benchmark results we've seen so far.

I did not mean to imply that. It's a good and reasonable design. The drivers are very stable and deliver the expected performance in many cases. There are certainly quite a few issues left to address. If someone is expecting a 2x boost in all apps, well, I'm sorry, but it won't happen. There are a couple of places where that kind of delta is possible, and a lot where 10~30% is possible too. But we are shipping a good product at an awesome price. People should buy based on current expectations (or near current), with the possibility of future improvements. From a functional aspect, there's more to do too (such as more custom filters, fixing some of the video stuff, etc. -- we have many features coming down the pipe). But we did ship a good product.

The following answer was especially curious as a response to "Does it make their architecture more balanced for that target market, or just less future-proof?"

"I’m not sure about what the competition used or the restrictions in their architecture that could force sub-optimal solution."

Where has R6xx demonstrated that its ALU:TEX ratios are more optimal than the competition's? Or am I misreading what is being said here?

In yesterday's games, and even a lot of today's games, the ALU:TEX ratio selected is probably overkill. But when I look at some of the newest games, or some coming down the pipe, or even some of our demos, a ratio of 4:1 seems pale -- shaders are coming with 10:1, 20:1 ratios. Even with the most advanced filters, those apps are still ALU bound. That is more what we shot for.
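To make the ratio argument concrete, here's a back-of-envelope sketch (illustrative Python, not anything from ATI's tools): a shader whose ALU:TEX instruction ratio exceeds the hardware's unit ratio ends up limited by the ALUs, and vice versa.

```python
# Hypothetical bottleneck check: compare a shader's ALU:TEX instruction
# ratio to the hardware's unit ratio. Illustrative only.

def bottleneck(alu_ops: int, tex_ops: int, hw_alu_tex_ratio: float) -> str:
    """If the shader needs more ALU work per fetch than the hardware
    provides, the ALUs become the limiter; otherwise the TMUs do."""
    if tex_ops == 0:
        return "ALU-bound"
    shader_ratio = alu_ops / tex_ops
    return "ALU-bound" if shader_ratio > hw_alu_tex_ratio else "TEX-bound"

# A 20:1 shader on 4:1 hardware is ALU-bound, as sireric suggests:
print(bottleneck(alu_ops=20, tex_ops=1, hw_alu_tex_ratio=4.0))  # ALU-bound
# A 2:1 shader on the same hardware leaves the ALUs waiting on fetches:
print(bottleneck(alu_ops=2, tex_ops=1, hw_alu_tex_ratio=4.0))   # TEX-bound
```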
 
That's only because you targeted 90nm, and I wasn't referring to R520 anyway...

Nope, it's generally just being late to the game. By then, the channel is full and the OEMs have their deals. Getting deals is simply not possible (with most OEMs, for mainstream and below) or very difficult (above mainstream, you need to show a huge delta). It's not a competition when you are late. R5xx was fine and its price/performance was quite excellent. It was, at least, 4 months late.

---------------------------

Back to R600: how many samples does the ROP process per clock when doing AA? Did R580 do 4 per clock per ROP (i.e. 64 total) or 2? Some of the B3D test results were a bit odd.

32 total for Z (with or without AA). But HiZ can make that appear larger, as do some of the tile Z operations.
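For reference, a minimal sketch of the arithmetic behind that answer. The ~742 MHz engine clock (HD 2900 XT) is my assumption, and the HiZ rejection fraction is purely illustrative:

```python
# 32 Z samples per clock, per sireric's answer, at an assumed core clock.
core_clock_hz = 742e6          # assumed HD 2900 XT engine clock
z_samples_per_clock = 32

raw_rate = core_clock_hz * z_samples_per_clock
print(f"Raw Z rate: {raw_rate / 1e9:.1f} Gsamples/s")   # ~23.7

# If HiZ rejects, say, half the incoming samples in coarse tiles before
# they ever reach the ROPs, the pipeline *appears* to process twice as many:
hiz_reject_fraction = 0.5      # purely illustrative
apparent_rate = raw_rate / (1 - hiz_reject_fraction)
print(f"Apparent rate with HiZ: {apparent_rate / 1e9:.1f} Gsamples/s")
```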
 
haha, I'm pretty sure every engineer has something he/she regrets on a project. But is there something specific in R600 that you would have liked to improve the most?

Actually, yes, there is a lot. As I said, I've never had a project without regrets. Making those regrets public would only tarnish things without adding value.
 
Ok, nice interview with interesting comments, especially the one about the 512-bit memory interface. This is clearly one advantage of the R600 compared to the competition, but my question is: what will all this bandwidth be used for? Before the release, speculation was that there was some new form of AA or IQ-increasing feature that would use all this bandwidth. Was it a checkbox feature? Or is there something in the future that we can expect from AMD/ATi that is going to surprise us?
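For context, the headline bandwidth figure falls out of simple arithmetic; a minimal sketch, assuming the HD 2900 XT's 828 MHz GDDR3 (1656 MT/s effective):

```python
# 512-bit bus, double data rate: bandwidth = width_in_bytes * transfer_rate.
bus_width_bits = 512
effective_transfer_rate = 2 * 828e6   # DDR: two transfers per memory clock

bandwidth_bytes = bus_width_bits / 8 * effective_transfer_rate
print(f"{bandwidth_bytes / 1e9:.1f} GB/s")  # ~106.0 GB/s
```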

Moving on to a more serious matter, picking up my two round objects packed in a sack in the process, I want to ask you about a subject that's so delicate I've seen threads burn to smithereens.

Now, I'm not sure if this is going to get answered or not, but I can't resist asking the lead of the R600 project directly on this one (and that's you, Sir Eric).

When Lindholm and his team produced what we now know as G80, what were your initial impressions? What was the reaction from you guys? Was it one of those "...congrats to the other camp..." kind of reactions, or did all hell break loose on your side of the camp? :D To be honest, what did ATi expect from NVIDIA? Did you guys have any idea that G80 would be a unified shader architecture?

Looking at the benchmarks, the 2900XT is very competitive with the 8800GTS 640MB, winning more often than losing. However, everybody on this board knows that the R600 wasn't aimed at the GTS, but rather at the full-fledged G80, i.e. the 8800GTX/Ultra. The R600 doesn't consistently outperform the last gen by 2x. The interview makes R600 all nice and dandy, but real-life concrete benchmarks/performance don't quite illustrate it.

Maybe I'll stop there, but I hope it gets answered.

Although it's not flawless, well done to you guys! Competition is great.
 
Yes.

It only has 16 texture filtering units.

I'd think turning on AF hurts the architecture far more than turning on AA. A lot of people seem to think there's something wrong with R600's AA, when actually it's just that review sites turn on AF and AA together, and the performance hit comes from the AF.
Check these numbers. The AF performance drop is similar to R580's, but the MSAA performance drop is much higher. I think most people expected similar AF drops, but nobody expected that MSAA would hurt performance so much (compared to R580: 2x bus width, 2x MSAA samples/clock, 2x (not sure) better compression... but worse results).
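(For anyone following along, the "performance drop" being compared here is just the percentage of frame rate lost when the feature is switched on; the fps numbers below are made up for illustration, not measured results:)

```python
# Percent performance lost when enabling a feature (AA or AF).
def drop_pct(fps_off: float, fps_on: float) -> float:
    return (fps_off - fps_on) / fps_off * 100

# Hypothetical numbers, purely for illustration:
print(f"AF drop: {drop_pct(100, 85):.0f}%")   # 15% hit from AF alone
print(f"AA drop: {drop_pct(100, 70):.0f}%")   # 30% hit from AA alone
```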
 
Check these numbers. The AF performance drop is similar to R580's, but the MSAA performance drop is much higher...

Yes, that is really strange given the factors you mentioned, but maybe the shader AA resolve is hurting more than expected, or there are also a lot of memory controller optimizations still to do (I remember that on R520/R580 there were good jumps in OpenGL performance in Doom3 and Quake4 just from tuning the MC, at least by ATI's words).
 
As for the die shot:
AARRGGHH!!
They had a picture of the metal layer?
The problem, of course, is that you can't take a picture of stuff below metal without first removing it... Which means you'd have to scrape away those metal layers first. That's perfectly doable with a bunch of tools and acids, but there's usually little reason to do so, except to do certain yield and process quality investigations.

[Edit: Or maybe it's just a matter of adjusting the polarizer of the camera a little bit?]

That's it, I issue a challenge to ATI to provide a high-res transistor layer die shot for me to ogle, or the terrorists have won.

Oh, definitely!
 
Check these numbers. The AF performance drop is similar to R580's, but the MSAA performance drop is much higher...

The AF drop is very bad compared to the competition's 8800GTX for BOTH parts (R580 also suffered from a texture hardware deficiency).

In 2 out of 3 games the AF drop on the HD2900 is much larger than on the X1950; in 1 game it is similar. I don't think that bears out your statement. One tie and two severe losses.

The AA results are troubling, though the sample size is small.

http://www.tweaktown.com/reviews/1115/1/page_1_introduction/index.html

This newer TweakTown review, in which for some reason they did not turn on AA/AF, is interesting and bears out what we are saying: without AA and AF the 2900 performs very well, often beating the 8800GTX, and at least placing much closer to the 8800GTX than to the 8800GTS.
 
The interview makes R600 all nice and dandy, but real-life concrete benchmarks/performance don't quite illustrate it.

You are right, enough of the talk, AMD -- show the users something.
I don't understand why AMD is so optimistic about their design decisions. AMD's relationships with game devs are zero (except Valve), and after reading Tim Sweeney's interview about the Unreal 3 engine, it looks like R600 took the wrong route.
 
Eric,

why does the MSAA algorithm cost so many frames? The memory interface is much wider than R580's, the compression algorithms are better, and the ROPs can do more color/Z compares than R580's.

The downsampling in the shader cannot be the only reason for that.

In OGL games we saw some examples where MSAA wasn't so expensive. Is there some real hope to improve the MSAA hit in general?
 
Eric, why does the MSAA algorithm cost so many frames? ...

The actual number of units working on AA isn't substantially different from R580's -- though the new ones are focused on 64b pixels. There are cases where moving the resolve to the shader has hurt performance -- that's typically where the frame rate is very high (i.e. > 100 fps). Those are cases we were aware of, and they only really show up in benchmarks, not so much in real game play. It sort of caps the max fps to 200~300 or so, depending on resolution. Again, not something we felt would seriously impact real game play. It can also be mitigated by selecting the proper AA modes.

As for workloads in various apps, yes, it can be improved in some cases. Without specifics, it's hard to say. Overall, I expect the *scaling* of AA to be similar to slightly worse than R580's for "regular" pixels, but still higher than R580 in absolute terms.

But anytime 64b pixels will be invoked (i.e. HDR games), I expect performance of R600 to be substantially higher than R580.
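A rough sketch of why a fixed per-frame resolve cost caps benchmark-style frame rates but barely touches real gameplay; the millisecond figures are assumptions for illustration, not measured R600 numbers:

```python
# If the shader resolve adds a roughly fixed cost per frame, it bounds
# fps even when rendering is nearly free, but vanishes into the noise
# at normal frame times.

def fps_with_resolve(render_ms: float, resolve_ms: float) -> float:
    """Frame rate when a fixed resolve pass is appended to each frame."""
    return 1000.0 / (render_ms + resolve_ms)

# Benchmark case: rendering takes ~1 ms, so a ~3 ms resolve caps fps
# near the 200~300 range sireric mentions:
print(f"{fps_with_resolve(1.0, 3.0):.0f} fps")    # ~250

# Gameplay case: at ~20 ms per frame the same resolve costs little:
print(f"{fps_with_resolve(20.0, 3.0):.0f} fps")   # ~43, vs 50 without it
```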
 
The downsampling in the shader cannot be the only reason for that.
AA resolve should be very fast on R600 given its 512-bit bus and its 64 processors.
At the same time, I wonder how subsamples are fetched/decompressed and passed to the shader core; if your AA resolve shader is just using the TMUs to fetch that data, I can see why it could be held back by the ALU:TMU ratio (an AA resolve shader needs an ALU:TMU ratio close to 1).
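A minimal CPU-side sketch of what a plain box-filter resolve does per pixel, which is where the ~1:1 ALU:TMU intuition comes from (illustrative Python, not the actual resolve shader):

```python
# Box-filter MSAA resolve: each subsample is one texture fetch, and
# averaging costs roughly one ALU op per fetch, so ALU:TEX sits near 1.

def resolve_pixel(subsamples: list[float]) -> float:
    """Average N subsamples: N fetches, N-1 adds + 1 multiply."""
    return sum(subsamples) / len(subsamples)

# 4xAA: 4 fetches vs ~4 ALU ops -> ratio ~1:1, so a 4:1 ALU:TEX design
# leaves most of its ALUs idle during the resolve pass.
print(resolve_pixel([0.20, 0.25, 0.30, 0.25]))  # 0.25
```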
 
Since scaling with AA seems to be worse ATM compared to the R580, is it safe to assume that the raw drivers are the culprit?
 