Sir Eric Demers on AMD R600

fellix · Jul 19, 2007

Just because it looks "natural" (or simplified) to us as humans doesn't mean the machine on the other side works the same way. A bright enough developer should be keen to employ every trick out there to get the most out of a given platform.
Well, not that this happens every time, unfortunately.

Geo · Jul 19, 2007

cadaveca said:
lol Geo..too much info in there to just start pruning willy-nilly, eh?

Well, I could just lock the thread and force you guys to go start a new one to talk about this other stuff. It's not that it isn't interesting. Just unfair to Eric if he's still dropping in and having to read thru to see if there is something legitimately related to his interview.

AlexV · Jul 19, 2007

cadaveca said:
lol Geo..too much info in there to just start pruning willy-nilly, eh?

I tell ya what tho...the conversation, albeit a bit technical, deals primarily with a complete balance of both G80 and R600. Both Jawed and Mintmaster have picked out a lot of behaviors on each GPU that can affect (quite largely, I might add) how a programmer may end up writing his code.

Different types of rendering have different needs, resource-wise. If we take the "major" graphics engines in use right now (Source, IDtech, Unreal3.0, Chrome, etc.), you find each has different needs from a GPU.

But the one thing I have not seen Jawed or Mintmaster mention is that R600 is basically 64 pipes, each pipe with 4 main ALUs and an additional 5th ALU with added functionality (integer multiplication and division, bit shifts, reciprocal, division, sqrt, rsqrt, log, exp, pow, sin, cos and type conversion. Source: HD2000 programming guide).

As the programming guide states,
.

Ooops. You see that there math up there? It gets used A LOT.

To take again from the programming guide, on R600,

Code:

float x = a + b + c;

Will run much slower than

Code:

float t = a + b; float x = t + c;

Now, it's natural for a programmer to choose first option, as it means less typing.

But the question that comes to my mind is,

how does G80 deal with the same code?

With the exclusion of cap bits in DX10, they can no longer substitute their own code in the driver when the code given by the API is not ideal. I think we now call this an "optimization"...

Anyway, given this, seemingly R600 likes simplified math. Can someone tell me about G80?

And this is where compiler magic should come in.

Mintmaster · Jul 19, 2007

cadaveca said:
To take again from the programming guide, on R600,

Code:

float x = a + b + c;

Will run much slower than

Code:

float t = a + b; float x = t + c;

Now, it's natural for a programmer to choose first option, as it means less typing.

You misread the guide. It's simply explaining why the first code fragment needs two instruction slots by illustrating its equivalence to the second. R600 will execute both at the same speed, as it needs to wait for one addition before proceeding to the other.
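
To make the equivalence concrete, here is a minimal sketch that measures the longest chain of dependent operations in an expression. It uses Python's own parser as a stand-in for a shader compiler front end, on the assumption that `+` groups left to right just as it does in C-style shader code. The helper name `add_chain_depth` is made up for this illustration and has nothing to do with the real R600 compiler.

```python
import ast

def add_chain_depth(expr: str) -> int:
    """Return the length of the longest chain of dependent binary ops."""
    def depth(node: ast.AST) -> int:
        if isinstance(node, ast.BinOp):
            # A binary op can only start once both of its operands are ready.
            return 1 + max(depth(node.left), depth(node.right))
        return 0  # plain variables and constants are ready immediately
    return depth(ast.parse(expr, mode="eval").body)

print(add_chain_depth("a + b + c"))          # 2: parsed as (a + b) + c
print(add_chain_depth("(a + b) + c"))        # 2: the explicit two-step form
print(add_chain_depth("a + b + c + d"))      # 3: three adds, each waiting on the last
print(add_chain_depth("(a + b) + (c + d)"))  # 2: the two inner adds are independent
```

The one-line and two-line forms give the same depth, which is the point above: the second add has to wait for the first either way. Only a regrouping like the last line shortens the chain, and a compiler generally cannot do that for floats on its own, because it can change the rounding.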

Geo said:
Well, I could just lock the thread and force you guys to go start a new one to talk about this other stuff. It's not that it isn't interesting. Just unfair to Eric if he's still dropping in and having to read thru to see if there is something legitimately related to his interview.

Yeah, that's why I suggested this a long time ago, but it may be too late now to get sireric back here.

If you were to prune, I'd start with this post from Razor1 since that spurred Jawed to make the assertion that I'm debating. A good title would be "G80/R600 architectural decisions", with maybe a post at the top to summarize what the thread is about and where it came from. Alternatively, you can tack it onto this thread, but I'm not a big fan of resurrecting old threads.

cadaveca · Jul 19, 2007

Mintmaster said:
You misread the guide. It's simply explaining why the first code fragment needs two instruction slots by illustrating its equivalence to the second. R600 will execute both at the same speed, as it needs to wait for one addition before proceeding to the other.

I read that as saying that, regardless, the instruction needs two slots, so it's more pragmatic to just use two separate instructions as you have 5 slots to fill anyway. It then goes on to say that the compiler reads left to right in traditional "BEDMAS" format, so because you ideally must fill 5 slots (one for each ALU within the pipe), prioritization is important.

That example is just an example...explaining:

It's important to not assume that because there are 5 independent scalar units you will always be able to crunch through the math at 5 scalar operations at a time. Depending on what the shader does you may at worst not be able to execute more than one scalar in parallel.

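The best and worst cases in that passage can be sketched with a toy bundle packer. This is only a rough model under stated assumptions: every op fits any of the 5 slots, results are usable in the very next bundle, and the names `SLOTS` and `bundle` are invented here. The real hardware has restrictions (the 5th unit's special functions, for one) that this ignores.

```python
from typing import Dict, List, Set

SLOTS = 5  # scalar units per pipe, as described in the quoted guide

def bundle(ops: Dict[str, List[str]]) -> List[List[str]]:
    """Greedily pack ops into bundles of at most SLOTS ops each.

    `ops` maps each result name to the result names it depends on.
    An op may issue only after all of its dependencies were issued in an
    earlier bundle. Names that are not keys of `ops` count as ready inputs.
    """
    done: Set[str] = set()
    pending = list(ops)
    bundles: List[List[str]] = []
    while pending:
        ready = [o for o in pending
                 if all(d in done or d not in ops for d in ops[o])]
        if not ready:
            raise ValueError("cyclic dependency")
        issue = ready[:SLOTS]  # fill as many slots as possible this cycle
        bundles.append(issue)
        done.update(issue)
        pending = [o for o in pending if o not in issue]
    return bundles

# Best case: five independent scalar ops share one bundle.
independent = {f"r{i}": [] for i in range(5)}
print(len(bundle(independent)))  # 1

# Worst case: each op needs the previous result, so one op per bundle.
chain = {"r0": [], "r1": ["r0"], "r2": ["r1"], "r3": ["r2"], "r4": ["r3"]}
print(len(bundle(chain)))  # 5
```

Same op count in both cases, but the dependent chain takes five bundles instead of one, with four of the five slots sitting empty each time.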

Geo · Jul 19, 2007

Good grief that's 8 pages ago!

I hereby declare this excellent thread to be closed, and invite the hardcore to start their own after-hours party in a different thread.
