NV30,35 & R300/R350 Pixel Shader Pipes Compared (New inf

MDolenc · Sep 19, 2003

sireric said:
Actually, because of constant propagation optimization, it should execute in 1 cycle on an R3x0 (eventually). Something like:
add_sat oC0, (c0+c1-c2), v1

We are working hard on improving our current PS compiler, so that it can map PS ops to our HW in an optimal way. The current stuff is pretty simple. The HW is naturally very fast and executes well. However, it will get better. That's also why one should be careful when trying to determine our internal architecture based on shader code.

I know that my shader is not exactly an example of good shader-writing practice. I (we?) am just curious whether that additional ALU unit can be used in general ps 2.0 shaders, or whether it is used just for the _x* and _d* modifiers in ps 1.x. My guess is that it's currently not active in a ps 2.0 environment, but might get used once the ps compiler gets smart enough, right?
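A minimal Python sketch of the constant propagation sireric describes in the quote above, emulating shader registers as 4-tuples. The three-instruction "naive" form is an assumption made for illustration (the original test shader is not shown in this part of the thread), and the function names are hypothetical. The point is only that `c0+c1-c2` involves constants alone, so it can be evaluated once outside the per-pixel work, leaving a single `add_sat` per pixel.

```python
def saturate(x):
    """Clamp a scalar to [0, 1], as the _sat modifier does per component."""
    return max(0.0, min(1.0, x))


def shader_naive(c0, c1, c2, v1):
    """Assumed unoptimized form: add, sub, add_sat, all executed per pixel."""
    r0 = tuple(a + b for a, b in zip(c0, c1))    # add r0, c0, c1
    r0 = tuple(a - b for a, b in zip(r0, c2))    # sub r0, r0, c2
    return tuple(saturate(a + b) for a, b in zip(r0, v1))  # add_sat oC0, r0, v1


def fold_constants(c0, c1, c2):
    """Evaluated once per constant update, not once per pixel."""
    return tuple(a + b - c for a, b, c in zip(c0, c1, c2))


def shader_folded(k, v1):
    """What remains per pixel: add_sat oC0, (c0+c1-c2), v1."""
    return tuple(saturate(a + b) for a, b in zip(k, v1))


c0 = (0.5, 0.25, 0.0, 1.0)
c1 = (0.25, 0.25, 0.5, 0.0)
c2 = (0.125, 0.0, 0.25, 0.5)
v1 = (0.5, 0.75, 0.25, 0.0)

k = fold_constants(c0, c1, c2)
print(shader_naive(c0, c1, c2, v1))  # (1.0, 1.0, 0.5, 0.5)
print(shader_folded(k, v1))          # (1.0, 1.0, 0.5, 0.5)
```

Both paths give the same colour; the folded version simply does less work per pixel, which is why instruction counts in the source shader say little about cycles on the hardware.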

sireric · Sep 19, 2003

The second ALUs will certainly get used. Not sure how much with the current compilers, but certainly for PS 1.x stuff. But not all functions get to use them -- add_sat would fall in that.

Luminescent · Sep 19, 2003

The small alu's have the hardware to execute any general mul/add/sub?

Heathen · Sep 19, 2003

Assuming the compiler could cope how much of a speed benefit would be possible if the mini ALUs were swopped out for fully fledged ALUs (and what sort of gain in transistor count would this involve)???

sireric · Sep 19, 2003

Full-fledged ALUs have more features/functions; consequently, upgrading the mini ALUs to full status would probably end up doubling performance of ALU-bound items in all cases, not just the current set.

Honestly, I don't remember the transistor count (and I won't go look it up anyway). If it had been trivial, I would have doubled (or more) things.

Luminescent · Sep 19, 2003

In reference to the small alu's:

sireric said:
not all functions get to use them -- add_sat would fall in that

What separates the add_sat from the other types of add?

sireric · Sep 19, 2003

It's an add that clamps its output to [0,1]. It's generally not used very often, except, perhaps, as a final operation before sending out data.
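To make the difference concrete, here is a small Python sketch of a plain add versus add_sat on a 4-component register. The helper names are hypothetical and this only emulates the arithmetic; it says nothing about how the hardware implements the clamp.

```python
def saturate(x):
    """Clamp a scalar to [0, 1], as the _sat modifier does per component."""
    return max(0.0, min(1.0, x))


def add(a, b):
    """Plain component-wise add: results may leave the [0, 1] range."""
    return tuple(x + y for x, y in zip(a, b))


def add_sat(a, b):
    """Component-wise add followed by a clamp of each result to [0, 1]."""
    return tuple(saturate(x + y) for x, y in zip(a, b))


a = (0.75, 0.5, -0.5, 1.0)
b = (0.5, 0.25, 0.25, 0.5)

print(add(a, b))      # (1.25, 0.75, -0.25, 1.5)
print(add_sat(a, b))  # (1.0, 0.75, 0.0, 1.0)
```

Only components that overflow or underflow the range change, which fits the remark that the clamp mostly matters as a last step before writing a colour out.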

Dio · Sep 19, 2003

Ostsol said:
digitalwanderer said:

What does it mean, what benefits/limitations/new things does it imply?

It means developers have more to pay attention to when optimizing their shaders and therefore can potentially squeeze out a few more fps than they previously would have been able to.

That's not quite right.

What it really means is that the developers write the code they want to, and the driver ensures it runs as fast as possible. All it wants is to be given an efficient algorithm. It'll give you the result you coded, running as fast as the hardware can do it.

So optimise the algorithm - leave the rest to the driver!

Heathen · Sep 19, 2003

Honestly, I don't remember the transistor count (and I won't go look it up anyway). If it had been trivial, I would have doubled (or more) things.

Maybe an idea for a future chip?

jb · Sep 19, 2003

sireric said:
That's also why one should be careful when trying to determine our internal architecture based on shader code.

Well, if you just told us, you could save us all a lot of time... but noooooo, you won't.

Thanks for the info!!!

Natoma · Sep 19, 2003

sireric said:
We are working hard on improving our current PS compiler, so that it can map PS ops to our HW in an optimal way. The current stuff is pretty simple. The HW is naturally very fast and executes well.

So in plain English, the current performance we're seeing in PS2.0 code (HL2 for example) from the R3x00 series is only the tip of a very big iceberg and it's only going to get faster once the ATI driver guys *cough* opengl guy, catalyst maker, et al *cough* figure out how to use the hardware that was given to them.

Legitimate performance improvements from drivers? Who'd have thunkit in this day and age.

Holy hell... That means that HL2 PS2.0 speed is basically brute force??

Dave Baumann · Sep 19, 2003

I think you might want to look at Dio for further shader compiler optimisations....

sireric · Sep 19, 2003

Well, I would not say "tip of the iceberg", and for short shaders it's very close to optimal, but you should expect some more performance when we release a more advanced compiler. Soon.

Dio · Sep 19, 2003

There's no such thing as the perfectly optimal compiler!

Natoma · Sep 19, 2003

sireric said:
Well, I would not say "tip of the iceberg", and for short shaders it's very close to optimal, but you should expect some more performance when we release a more advanced compiler. Soon.

Any ballpark estimate regarding performance improvements that you perchance might be able to provide?

Ostsol · Sep 20, 2003

Dio said:
Ostsol said:

digitalwanderer said:

What does it mean, what benefits/limitations/new things does it imply?

It means developers have more to pay attention to when optimizing their shaders and therefore can potentially squeeze out a few more fps than they previously would have been able to.

That's not quite right.

What it really means is that the developers write the code they want to, and the driver ensures it runs as fast as possible. All it wants is to be given an efficient algorithm. It'll give you the result you coded, running as fast as the hardware can do it.

So optimise the algorithm - leave the rest to the driver!

That's true, but whatever the developer can do to maximize the potential optimizations that the compiler can produce is always good. It's certainly not so severe as programming for SSE, of course.

sireric · Sep 20, 2003

Natoma said:
sireric said:

Well, I would not say "tip of the iceberg", and for short shaders it's very close to optimal, but you should expect some more performance when we release a more advanced compiler. Soon.

Any ballpark estimate regarding performance improvements that you perchance might be able to provide?

hummmmm.... nope.

Natoma · Sep 20, 2003

sireric said:
Natoma said:

sireric said:

Well, I would not say "tip of the iceberg", and for short shaders it's very close to optimal, but you should expect some more performance when we release a more advanced compiler. Soon.

Any ballpark estimate regarding performance improvements that you perchance might be able to provide?

hummmmm.... nope.

Oh wells. Had to give it a shot.

Dio · Sep 20, 2003

Ostsol said:
Dio said:

So optimise the algorithm - leave the rest to the driver!

That's true, but whatever the developer can do to maximize the potential optimizations that the compiler can produce is always good.

All that matters is 'do what's necessary, and don't do what isn't'. I really can't think what I could tell people about R3xx pixel shaders that would help them produce code that could be compiled better, except what's already been said in many a (public) ATI developer document.

It's certainly not so severe as programming for SSE, of course.

Ah, the stories I wish I could tell you....

pcchen · Sep 20, 2003

MDolenc said:
pcchen: I guess that in your case driver is just being smart and rearranges your shader into:

Code:

mov r1, c1
mad r0, v0, c0, r1
texld r1, t0, s0
mad r0, r0, c1, r0
mad r0, r0, v1, r1
mov oC0, r0

It's possible but the second test can't be explained by this rearrangement (there's an additional abs between mul and add).
