Sir Eric Demers on AMD R600

3dilettante · Jun 15, 2007

When we selected 80HS, we selected it based on faster transistors, better density and also the fact that 65nm would not be available for production for R600. 65nm did get pulled in, so the schedule aspect was probably a little off. However, the transistors are faster and leakier, and we were aware of this. We ended up setting the final clock based on TDP for the worst-case scenarios. That usually means that there's a lot of headroom left for overclocking. Most boards seem to get engines over 800 MHz without any problems. With a good bottle of liquid nitrogen, you can get over 1 GHz ;-)

Was the final clock based on worst-case TDP in part due to an unexpected amount of variation in leakage between parts?

For CPUs in all but the highest of high end segments, power draw has been characterized as a zero-sum game.

I think it was someone at ATI (Dave Orton?) who felt that the same rule did not apply to GPUs: that there wasn't such a ceiling.

Has this been reevaluated, since it seems clocks are more thermally-limited than timing-limited for R600?

nAo · Jun 15, 2007

Tim Murray said:
Was anyone expecting G80 to be a unified architecture, much less a scalar one?

I was not expecting it to be a UA, but the scalar approach is something I was expecting sooner rather than later (I think you can find some VERY old posts of mine advocating it, because it makes so much sense...).
I'm also sure that sireric and his team have a better perspective on their competition than the average (or not so average) B3Der...

Tim Murray · Jun 15, 2007

Here's one I forgot to ask because I am dumb.

Crossfire certainly seems to be scaling much better with R600 than with R5x0. What's the reason for that?

Aerows · Jun 15, 2007

sireric said:
Couple of comments:

1) I did let the weasels review the docs; but I also let various other engineering teams review too.

Does that mean "no", we can't get naked money shots?

Sound_Card · Jun 15, 2007

trinibwoy said:
Yeah but have we actually seen R580 go head to head against R520 with everything constant but the shader count? R580 has become synonymous with the X1950XTX and it has a significant bandwidth advantage over R520 based parts. The only place I could find a direct comparison was on Tom's VGA Charts and R580's 3x shader power doesn't seem to be doing much at all when comparing the X1900XTX to the X1800XT. I would really like to see a head-to-head of these two parts in today's games but as expected they aren't included in benchmark suites any more.

http://www.anandtech.com/printarticle.aspx?i=2679

I have another review saved somewhere... I will have to take a dig.

But this one shows the advantages of R580's shaders over R520.

Razor1 · Jun 15, 2007

trinibwoy said:
Not bad, not bad :smile: What does shadowing stress the most anyway?

FEAR's shadows were volumetric, so pixel fillrate would probably be the most affected.

But concerning R580 vs R520, we still get a lot less than the theoretical 3x increase in shader ALUs. Yes, ALU usage will grow as new games come out, but even FEAR, Oblivion, and a few other new games still need a decent amount of texture filtering power and pixel fillrate compared to older games. It's not like keeping the same or a similar number (or throughput) of ROPs and TMUs was a "smart" choice for future games. I think fillrate and texture throughput were underestimated, or ALU usage overestimated, with R600.
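Why R580 gets well under 3x can be illustrated with a crude bottleneck model: per clock, a shader is limited by whichever of the ALU or texture units it saturates first. This is a simplified sketch, not how either chip actually schedules work. The per-pixel instruction mixes are hypothetical; only the unit counts (16 pixel-shader ALUs and 16 texture units on R520, 48 and 16 on R580) come from the chips' public specs.

```python
def pixels_per_clock(alu_ops, tex_ops, alu_units, tex_units):
    """Peak pixels per clock for a shader needing alu_ops ALU instructions and
    tex_ops texture fetches per pixel. Simplified: assumes perfect overlap and
    one op per unit per clock, so the slower unit alone sets the rate."""
    alu_clocks = alu_ops / alu_units
    tex_clocks = tex_ops / tex_units
    bottleneck = max(alu_clocks, tex_clocks)
    if bottleneck == 0:
        raise ValueError("shader must do at least one ALU op or texture fetch")
    return 1.0 / bottleneck


# Pixel-shader unit counts per clock.
R520 = {"alu_units": 16, "tex_units": 16}
R580 = {"alu_units": 48, "tex_units": 16}

# Hypothetical per-pixel instruction mixes: (ALU ops, texture fetches).
for alu_ops, tex_ops in [(2, 2), (4, 2), (12, 2)]:
    r520 = pixels_per_clock(alu_ops, tex_ops, **R520)
    r580 = pixels_per_clock(alu_ops, tex_ops, **R580)
    print(f"ALU:TEX {alu_ops}:{tex_ops} -> R580 speedup {r580 / r520:.2f}x")
```

In this model R580 only reaches its full 3x once a shader is ALU-bound on both chips (the 12:2 mix); texture-heavy mixes see little or no gain because the 16 texture units did not grow.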

AlexV · Jun 15, 2007

A quick (and primitive) explanation would be that they've now implemented a less aggressive AFR as the primary path for Crossfire instead of SuperTiling, as it was before, with more aggressive AFR relying on game profiling by ATi. So instead of SuperTiling's almost zilch scaling, even in cases where there are no game profiles you get something that tends to scale OK. But there's probably more to it than that.
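The two Crossfire modes contrasted here differ in what gets split between GPUs. A minimal sketch, assuming two GPUs and a hypothetical 32x32-pixel tile size for SuperTiling: AFR hands out whole frames, while SuperTiling splits each frame into a checkerboard of screen tiles. One commonly given reason SuperTiling scales poorly is that both GPUs still process all of each frame's geometry; the sketch shows only the assignment rule, not actual driver behavior.

```python
def afr_gpu(frame_index: int, num_gpus: int = 2) -> int:
    """Alternate Frame Rendering: whole frames are dealt out round-robin,
    so each GPU does all the work (geometry and pixels) for its frames."""
    return frame_index % num_gpus


def supertile_gpu(x: int, y: int, tile: int = 32) -> int:
    """SuperTiling for two GPUs: the screen is a checkerboard of tile x tile
    squares (the tile size is an assumption), alternating between GPU 0 and 1.
    Every GPU still sees every triangle; only the pixels are divided."""
    return ((x // tile) + (y // tile)) % 2


if __name__ == "__main__":
    print([afr_gpu(f) for f in range(4)])
    print([supertile_gpu(x, 0) for x in (0, 31, 32, 64)])
```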

Geo · Jun 15, 2007

Morgoth the Dark Enemy said:
A quick (and primitive) explanation would be that they've now implemented a less aggressive AFR as the primary path for Crossfire instead of SuperTiling, as it was before, with more aggressive AFR relying on game profiling by ATi. So instead of SuperTiling's almost zilch scaling, even in cases where there are no game profiles you get something that tends to scale OK. But there's probably more to it than that.

Well, I was just looking at some recent benchies elsewhere and noticing R600 CrossFire goes toe-to-toe with GTX SLI for the most part, so I'd think it's more than that.

Sound_Card · Jun 15, 2007

Razor1 said:
FEAR's shadows were volumetric, so pixel fillrate would probably be the most affected.

Hmm... I thought it was volumetric lighting. Is there such a thing as volumetric shadows?

Razor1 · Jun 15, 2007

Sound_Card said:
Hmm... I thought it was volumetric lighting. Is there such a thing as volumetric shadows?

Shadow volumes (polygonal shadows) are volumetric shadows, the same stuff as Doom 3. Well, I should say similar; I don't know if FEAR caps its shadows or extrudes them to a point.
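For readers following the shadow-volume discussion, here is a sketch of the stencil counting behind it, reduced to a single pixel's view ray. It assumes each shadow-volume polygon covering the pixel is given as a (depth, is_front_facing) pair; real renderers do this with the stencil buffer in hardware. Z-pass counts volume faces in front of the visible surface; z-fail (the variant Doom 3 is known for) counts faces behind it, which is why it needs capped, closed volumes.

```python
from typing import Iterable, Tuple

Face = Tuple[float, bool]  # (depth along the view ray, is_front_facing)


def in_shadow_zpass(surface_depth: float, faces: Iterable[Face]) -> bool:
    """Z-pass: +1 for front faces and -1 for back faces that lie in front of
    the visible surface (pass the depth test). Nonzero means shadowed."""
    count = 0
    for depth, front in faces:
        if depth < surface_depth:
            count += 1 if front else -1
    return count != 0


def in_shadow_zfail(surface_depth: float, faces: Iterable[Face]) -> bool:
    """Z-fail: +1 for back faces and -1 for front faces that lie behind the
    visible surface (fail the depth test). Needs capped, closed volumes."""
    count = 0
    for depth, front in faces:
        if depth > surface_depth:
            count += -1 if front else 1
    return count != 0


# A volume spanning depths 3..15 contains a surface at depth 10 ...
print(in_shadow_zpass(10.0, [(3.0, True), (15.0, False)]))
# ... but a volume spanning depths 3..5 does not.
print(in_shadow_zpass(10.0, [(3.0, True), (5.0, False)]))
```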

sireric · Jun 15, 2007

Rys said:
There's so much extra we could have asked and question avenues we could have gone down, but hopefully we got some goodies in there to make it worth a read. Thanks to Eric for answering over 30 Qs :smile:

I'd love to know so much more based on what was said, here's a sample off the top of my head:

~750MHz being partly a yield choice, as well as a performance one, but why exactly when it seems 800MHz would have been just as comfortable from a heat and power perspective, and you seem to be yielding excellently by virtue of only having one SKU?

I did not pick 750 -- it was an iterative process, based on the "corner" process lots, to get the best yield. The difference between 750 and 800 is not enough to really make a difference. 800 was probably just a little too much for some corners. Hitting a good price was very important.
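Settling the clock from corner lots can be sketched as: for each process corner, find the highest clock that stays within the power budget, then ship at the lowest of those so every corner qualifies. All numbers below (the TDP, the leakage and dynamic-power coefficients, the 25 MHz step, and the linear power model itself) are made up for illustration and are not R600 data; a real flow also weighs timing, voltage and how many parts each corner represents.

```python
# Hypothetical numbers for illustration only; not R600 data.
TDP_WATTS = 200.0

# corner name -> (leakage watts, dynamic watts per MHz), all made up
CORNERS = {
    "typical": (40.0, 0.18),
    "fast_leaky": (75.0, 0.17),
    "slow": (30.0, 0.19),
}


def max_clock_mhz(leak_w: float, w_per_mhz: float,
                  tdp: float = TDP_WATTS, step: int = 25) -> int:
    """Highest clock on a step-MHz grid whose modeled power stays within TDP."""
    f = 0
    while leak_w + w_per_mhz * (f + step) <= tdp:
        f += step
    return f


def shipping_clock(corners=CORNERS) -> int:
    """The worst corner sets the clock that every part must meet."""
    return min(max_clock_mhz(*c) for c in corners.values())


print({name: max_clock_mhz(*c) for name, c in CORNERS.items()})
print(shipping_clock())
```

In this toy model the leaky corner alone drags the shipping clock down, which is the shape of the "800 was a little too much for some corners" trade-off.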

Was there ever a set of simulations run comparing R600 + R580 RBEs vs the final design, and if so what were the interesting datapoints that came out?

Yes, there were hundreds. Yes, a few things came out in the final design, but, in general, it was as expected. -- Resolve was moved into the shader, so that can't really get compared.
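With the MSAA resolve moved into the shader, the resolve becomes ordinary programmable math. A minimal sketch of what a resolve computes for one pixel: the standard box filter averages the pixel's samples, and doing it in code makes other weightings possible (R600's custom-filter AA modes are one example). The weighted variant and its weights here are hypothetical, not the filters ATI ships.

```python
from typing import Sequence, Tuple

Color = Tuple[float, ...]


def box_resolve(samples: Sequence[Color]) -> Color:
    """Standard MSAA resolve: equal-weight average of a pixel's samples."""
    n = len(samples)
    return tuple(sum(s[c] for s in samples) / n for c in range(len(samples[0])))


def weighted_resolve(samples: Sequence[Color], weights: Sequence[float]) -> Color:
    """Hypothetical programmable resolve: arbitrary per-sample weights,
    normalized so they sum to 1 (e.g. to include samples from neighbors)."""
    total = sum(weights)
    return tuple(
        sum(w * s[c] for s, w in zip(samples, weights)) / total
        for c in range(len(samples[0]))
    )


# A 4x MSAA edge pixel: two red samples and two blue samples.
edge = [(1.0, 0.0, 0.0), (0.0, 0.0, 1.0), (1.0, 0.0, 0.0), (0.0, 0.0, 1.0)]
print(box_resolve(edge))
print(weighted_resolve(edge, [3, 1, 0, 0]))
```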

DX10 is mentioned all over the place, but is there any consideration for OpenGL futures when designing a base architecture these days?

There are some OGL considerations -- certainly some of the workstation requirements are taken into consideration, though those don't tend to push the technical envelope. We also make sure that current generation apps are working well on new architectures. But when it comes to defining next gen features, it's been coming generally from the DX or even GPGPU side. There's more today from OpenGL-ES than from traditional OpenGL. Having said that, there's certainly energy afoot to make all new features (and more) available to OGL asap.

How exactly does the fast path work for passing back MSAA sample data to the shader core?

Secret.

Have you had time to poke the windowed sinc yet?!

Not my team directly, so I'd have to check. We know of your request mister!

If the tessellator is so cheap in terms of area, why not build it into a GPU before Xenos? Or was there a coming together of DirectX future discussions that made it inapplicable to R5xx, because the tessellator's future in DirectX hadn't been decided yet?

Dunno, really. I wasn't involved in the Xenos decisions, so I don't know/remember exactly. There certainly wasn't a great pull for it during the R5xx timeframe -- neither DX9 nor DX10 was planning on supporting it.

What proportion of R600 is memory cells versus programmable logic vs glue logic vs IO vs clocking vs power? Or easier to answer, which one dominates, and which dominated R580?

Not exactly sure what all your terms mean -- Standard logic I believe dominates the chip as a whole, but it's followed closely by memory; the two not being that far apart. Not sure on the other ones, but I would assume pretty low.

And as far as RV6xx goes, folks, we held off on too many derivative questions until we've had a closer look at the first boards. If Eric's keen, we'll poke him again in July, with less Furby!

Sure. I'll be off for part of July (Europe mostly), but I'll be around. I'm not taking as much time on the above answers, because I'm doing it fast and it's Friday afternoon...

Sound_Card · Jun 15, 2007

Razor1 said:
Shadow volumes (polygonal shadows) are volumetric shadows, the same stuff as Doom 3. Well, I should say similar; I don't know if FEAR caps its shadows or extrudes them to a point.

Cool.

First time I've heard about it. Learn something every day.

What other types of shadows are there?

Sound_Card · Jun 15, 2007

@Sireric,

I know you can't comment on future products, but I thought it would not hurt to ask.

Do you have a refresh in the works?

Another question...

Is there anything in R600 that you wish you could improve on?

Razor1 · Jun 15, 2007

Sound_Card said:
Cool.

First time I've heard about it. Learn something every day.

What other types of shadows are there?

There are really only two types of shadows for now: volumetric (shadow volumes) and shadow maps. Of course, many games use a combination of both depending on the lighting and objects.
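The shadow-map half of that answer boils down to a depth comparison. A minimal sketch, assuming the shadow map is a 2D list of nearest-occluder depths seen from the light and the fragment's light-space position is already computed; the bias value and the 3x3 percentage-closer filtering (PCF) kernel are common textbook choices, not anything specific to FEAR.

```python
from typing import List


def lit(fragment_depth: float, stored_depth: float, bias: float = 0.005) -> bool:
    """Lit if the fragment is no farther from the light than the closest
    occluder recorded in the shadow map (bias avoids self-shadowing acne)."""
    return fragment_depth <= stored_depth + bias


def pcf(shadow_map: List[List[float]], x: int, y: int,
        fragment_depth: float, radius: int = 1) -> float:
    """Percentage-closer filtering: fraction of nearby texels for which the
    fragment is lit, giving soft edges. Coordinates are clamped to the map."""
    h, w = len(shadow_map), len(shadow_map[0])
    hits = total = 0
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            sx = min(max(x + dx, 0), w - 1)
            sy = min(max(y + dy, 0), h - 1)
            hits += lit(fragment_depth, shadow_map[sy][sx])
            total += 1
    return hits / total


# An occluder at depth 0.2 covers the left column of a tiny 3x3 shadow map.
shadow_map = [[0.2, 1.0, 1.0], [0.2, 1.0, 1.0], [0.2, 1.0, 1.0]]
print(pcf(shadow_map, 1, 1, fragment_depth=0.5))
```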

Mintmaster · Jun 16, 2007

sireric said:
All I meant is that I don't understand completely the decisions done by the competition. Nor do I really care quite that much, though it's intriguing. It's also clear that our ratio is more in line with future applications than past ones. A lower ratio does work well with a lot of older apps and even quite a few current apps. But what we've been seeing is that applications are moving more and more towards larger ratios. In that sense we are more forward looking. But it's a tough edge; it's costly to be too forward looking.

I personally think that was the biggest mistake of R5xx. The ultra-threaded nature required so much more die-space compared to NV4x/G7x that it couldn't really compete in the other price categories. It's great for a developer that wants to mess with dynamic branching, but it went 95% wasted for the consumer because few if any games took advantage of it.
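The dynamic-branching hardware in question pays off through batch granularity: a SIMD batch whose pixels disagree on a branch has to run both sides. A minimal sketch with made-up branch costs and batch sizes, just to show why smaller batches waste less work on incoherent branches (and why that control granularity costs area).

```python
from typing import Sequence


def branch_cost(takes_branch: Sequence[bool], batch_size: int,
                cost_taken: int = 10, cost_not_taken: int = 2) -> int:
    """Total ALU cycles for pixels processed in fixed-size batches.
    A coherent batch pays only for the side its pixels take; a mixed batch
    pays for both sides for every pixel. All costs are hypothetical."""
    total = 0
    for start in range(0, len(takes_branch), batch_size):
        batch = takes_branch[start:start + batch_size]
        if all(batch):
            per_pixel = cost_taken
        elif not any(batch):
            per_pixel = cost_not_taken
        else:
            per_pixel = cost_taken + cost_not_taken
        total += per_pixel * len(batch)
    return total


# 64 pixels; only the first 16 take the expensive branch.
pixels = [True] * 16 + [False] * 48
print(branch_cost(pixels, batch_size=16))
print(branch_cost(pixels, batch_size=64))
```

Here the 16-pixel batches keep the branch coherent (256 cycles), while a single 64-pixel batch runs both paths for everyone (768 cycles). The benefit only appears when shaders actually branch, which is the crux of the complaint above.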

As for the texturing, people have different visions on where the future is headed. I personally think you need to go data based for realistic graphics. High resolution textures (which can drastically increase AF sample requirements) and PRT are, IMO, the two most important things needed for realistic graphics. Math can only take you so far.

My three favourite 3D graphics techniques are variance shadow mapping, SH lighting (esp. with neighbourhood transfer -- volume textures!), and HDR. None of them need high ALU:TEX ratios. That's why I had to go with NVidia for the first time.
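Variance shadow mapping shows nicely why such techniques lean on texture filtering rather than math. The shadow map stores the first two moments of depth, E[z] and E[z^2], which can be filtered (mipmapped, anisotropically filtered, blurred) like any texture; a receiver then needs only a few ALU ops to get an upper bound on its visibility from Chebyshev's one-sided inequality. A minimal sketch; the minimum-variance clamp is a common tweak, and its value here is an assumption.

```python
def vsm_visibility(mean: float, mean_sq: float, receiver_depth: float,
                   min_variance: float = 1e-4) -> float:
    """Variance shadow mapping: upper bound on the fraction of the filtered
    region that does not occlude a receiver at receiver_depth.
    mean = E[z] and mean_sq = E[z^2], both read from the filtered shadow map."""
    if receiver_depth <= mean:
        return 1.0  # receiver is in front of the average occluder: fully lit
    variance = max(mean_sq - mean * mean, min_variance)
    d = receiver_depth - mean
    return variance / (variance + d * d)  # Chebyshev: s^2 / (s^2 + d^2)


print(vsm_visibility(0.5, 0.26, 0.6))
```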

Farhan · Jun 16, 2007

3dilettante said:
I found the commentary on the virtualization of chip resources interesting.
The chip does do some kind of renaming/mapping of registers in order for it to execute different threads.

I'm sure a big area of future "optimizations" is just finding ways to make sure different contexts being mapped to the hardware don't bump heads on shared resources and are overlapped to handle variable latencies better.

As for the die shot:
AARRGGHH!!
They had a picture of the metal layer?
That's like putting a burka on a hot chick.
Is it wrong that I want to see the transistor layer naked?

That's it, I issue a challenge to ATI to provide a high-res transistor layer die shot for me to ogle, or the terrorists have won.

Yeah, I want to see some nekkid die shots too. Top-level clock trees aren't too interesting.

@Sireric
Are any parts of current GPUs fully custom designed (as in, all the way down to hand layout)? If not, what is the lowest level at which custom design work is done, and how much of it is done? Or is it all just behavioral HDL code run through a compiler/synthesizer, etc.? Do you see future GPUs using more custom design work for important parts such as ALUs?

Jawed · Jun 16, 2007

Some observations:

The R600 has a scalar design structure, which is optimized for what we see in current shaders and what we see in upcoming shaders. More and more work gets done in scalar paths and focusing on this gives you the best flexibility to address that. Basically, you can do vector perfectly and deal with any scalar paths too. As well, from a physical standpoint, focusing on a scalar design is significantly easier than having to deal with vector datapaths.

This implies, to me, that building a scalar register file was the key change. It presumably allows more finely grained allocation of registers. I presume the approach of R3xx...R5xx is somewhat wasteful in this regard: a vec3 register, e.g. r0.rgb, consumes 4x fp32 slots in the register file. In R600 this register would consume 3x fp32 slots.
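That register-file arithmetic can be made concrete. A minimal sketch of the presumption above (not something the interview confirms): under vec4-granular allocation every register costs four fp32 slots regardless of how many components it uses, under scalar-granular allocation it costs only its width, and a fixed register-file budget then fits more threads to hide latency. The register widths and the 256-slot budget are hypothetical.

```python
from typing import Sequence


def slots_vec4(widths: Sequence[int]) -> int:
    """Each register occupies a full vec4 (4 fp32 slots), e.g. r0.rgb costs 4."""
    return 4 * len(widths)


def slots_scalar(widths: Sequence[int]) -> int:
    """Each register occupies only as many fp32 slots as components it uses."""
    return sum(widths)


def threads_that_fit(budget_slots: int, slots_per_thread: int) -> int:
    """How many threads' registers fit in a fixed register-file budget."""
    return budget_slots // slots_per_thread


# Hypothetical shader: one vec3, one scalar, one vec2 and one vec4 register.
widths = [3, 1, 2, 4]
print(slots_vec4(widths), slots_scalar(widths))
print(threads_that_fit(256, slots_vec4(widths)),
      threads_that_fit(256, slots_scalar(widths)))
```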

The second effect then comes from all the VLIW/superscalar stuff that we've done to death...

The samplers were designed to be 64b samplers from nearly the beginning, and matching that to BW and keeping the 4:1 ratio on ALU:Tex was the design choice made. In the latest games, where ALU:TEX ratios hit 15~20, this really shines.

Presumably these are games that are not released...

Inherent in every one of our architectures is the question of scalability. It's a worthless architecture if it cannot scale from the integrated space all the way to the high end multi-GPU systems. Having said that, DX10 certainly is a change from DX9. Obvious things such as vertex / pixel shader scalability are now gone. We only have one shader core now. However, we've made it scalable in 2 dimensions now, both in terms of pixels processed at the same time within a SIMD array, as well as number of parallel SIMDs we can put into the array. That's just an example. We are also scalable on ROPs, textures and various other internal ways. Also, there's functionality that is modular, such as HiZ and UVD, which can be present or not.
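The two scaling dimensions described (lanes per SIMD and number of SIMDs), plus the texture-unit count, can be expressed as a configuration. A minimal sketch: the R600 figures (4 SIMDs of 16 lanes, each lane a 5-wide ALU unit, 16 texture units) are its public specs, and the 64:16 lane-to-texture ratio matches the 4:1 ALU:Tex ratio quoted above; the smaller derivative configuration is hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ShaderCoreConfig:
    simds: int              # number of parallel SIMD arrays
    lanes_per_simd: int     # pixels/threads processed together in one SIMD
    texture_units: int
    alus_per_lane: int = 5  # R600-style 5-wide VLIW unit per lane

    @property
    def stream_processors(self) -> int:
        return self.simds * self.lanes_per_simd * self.alus_per_lane

    @property
    def alu_tex_ratio(self) -> float:
        # VLIW lanes per texture unit
        return self.simds * self.lanes_per_simd / self.texture_units


R600 = ShaderCoreConfig(simds=4, lanes_per_simd=16, texture_units=16)
SMALL = ShaderCoreConfig(simds=2, lanes_per_simd=8, texture_units=4)  # hypothetical

print(R600.stream_processors, R600.alu_tex_ratio)
print(SMALL.stream_processors, SMALL.alu_tex_ratio)
```

Scaling either dimension changes throughput without changing the programming model, which is what lets one design span several price points.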

I think this is Geo's much-opined "crossbar" twixt shader pipes and RBEs coming true...

Jawed

Subtlesnake · Jun 16, 2007

Sound_Card said:
http://www.anandtech.com/printarticle.aspx?i=2679

I have another review saved somewhere... I will have to take a dig.

But this one shows the advantages of R580's shaders over R520.

Yeah, and the extra bandwidth on the X1950 XTX made very little difference at all.

http://www.firingsquad.com/hardware/ati_radeon_x1950_xtx_performance/page5.asp

Rangers · Jun 16, 2007

nicolasb said:
Things may have changed in the past couple of weeks, but the last benchmarks I saw suggested that R600 performance fell off a cliff as soon as you enabled any significant amount of anisotropic filtering. (By contrast 8800GTX performance drops off far less as you step up the AF level).

Yes.

It only has 16 texture filtering units.

I'd think turning on AF will damage the architecture far more than turning on AA. A lot of people seem to think there's something wrong with the AA of R600, when actually it's just that review sites turn on AF and AA together, and the performance hit comes from the AF.
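The claim that AF, not AA, is what hurts R600 comes down to how many filtered probes each pixel needs. A minimal sketch following the example formula in the EXT_texture_filter_anisotropic specification: the probe count N grows with the ratio of the pixel footprint's long axis to its short axis, capped at the selected AF level. Hardware is free to approximate this differently, so treat it as an illustration of the cost scaling, not R600's actual filtering logic.

```python
import math


def aniso_probes(dudx: float, dvdx: float, dudy: float, dvdy: float,
                 max_aniso: int = 16) -> int:
    """Number of (e.g. trilinear) probes along the footprint's major axis.
    Derivatives are in texels per screen pixel."""
    px = math.hypot(dudx, dvdx)  # footprint extent along screen x
    py = math.hypot(dudy, dvdy)  # footprint extent along screen y
    p_max, p_min = max(px, py), min(px, py)
    if p_min == 0.0:
        return max_aniso if p_max > 0.0 else 1  # degenerate footprint
    return min(math.ceil(p_max / p_min), max_aniso)


print(aniso_probes(1, 0, 0, 1))   # square footprint
print(aniso_probes(8, 0, 0, 1))   # 8:1 footprint
print(aniso_probes(32, 0, 0, 1))  # clamped at 16x AF
```

With a fixed number of filtering units, a pixel needing eight probes keeps them busy roughly eight times as long as an isotropic one, so AF cost scales with texture filtering throughput rather than with ROP or AA resolve throughput.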

sireric · Jun 16, 2007

Sound_Card said:
@Sireric,

I know you can't comment on future products, but I thought it would not hurt to ask.

Do you have a refresh in the works?

There are always new products in the works. That's how this marketplace works. And with design cycles as long as ours, in fact the next, next-next and next-next-next generations are all being worked on. And stuff beyond that is being thought about.

Another question...

Is there anything in R600 that you wish you could improve on?

Lots!! But that's always been true of every project...
