NV30,35 & R300/R350 Pixel Shader Pipes Compared (New inf

sireric said:
Actually, because of constant propagation optimization, it should execute in 1 cycle on an R3x0 (eventually). Something like:
add_sat oC0, (c0+c1-c2), v1
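
For anyone following along, here is a rough Python sketch of what constant propagation buys in a case like this. It is not ATI's compiler: the three-instruction "unoptimized" shader is an assumption made for illustration (the shader being discussed is not shown in this part of the thread), and all function names are hypothetical. The idea is that (c0+c1-c2) involves only constants, so it can be evaluated once ahead of time, leaving a single add_sat per pixel.

```python
# Illustrative sketch only: models ps-style 4-component registers as tuples.
# The shader below is an assumed example, not the one from the thread.

def add(a, b):
    """Component-wise add, like the ps 'add' instruction."""
    return tuple(x + y for x, y in zip(a, b))

def sub(a, b):
    """Component-wise subtract, like the ps 'sub' instruction."""
    return tuple(x - y for x, y in zip(a, b))

def add_sat(a, b):
    """Component-wise add with the result clamped to [0, 1]."""
    return tuple(min(1.0, max(0.0, x + y)) for x, y in zip(a, b))

def shader_unoptimized(c0, c1, c2, v1):
    """Three ALU ops per pixel: the constant math is redone every time."""
    r0 = add(c0, c1)
    r0 = sub(r0, c2)
    return add_sat(r0, v1)

def fold_constants(c0, c1, c2):
    """Done once at compile time: collapse (c0 + c1 - c2) into one constant."""
    return sub(add(c0, c1), c2)

def shader_folded(k, v1):
    """One ALU op per pixel: add_sat oC0, k, v1."""
    return add_sat(k, v1)
```

Calling shader_folded(fold_constants(c0, c1, c2), v1) gives the same colour as shader_unoptimized(c0, c1, c2, v1), but the per-pixel work drops from three ops to one.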

We are working hard on improving our current PS compiler, so that it can map PS ops to our HW in an optimal way. The current stuff is pretty simple. The HW is naturally very fast and executes well. However, it will get better. That's also why one should be careful when trying to determine our internal architecture based on shader code.
I know that my shader is not exactly an example of good shader-writing practice. I (we? ;)) am just curious whether that additional ALU can be used in general ps 2.0 shaders, or whether it is used just for the _x* and _d* modifiers in ps 1.x. My guess is that it's currently not active in a ps 2.0 environment, but might get used once the ps compiler gets smart enough, right?
 
The second ALUs will certainly get used. Not sure how much with the current compilers, but certainly for PS 1.x stuff. But not all functions get to use them -- add_sat would fall in that.
 
Assuming the compiler could cope, how much of a speed benefit would be possible if the mini ALUs were swapped out for fully fledged ALUs (and what sort of increase in transistor count would this involve)???
 
Full-fledged ALUs have more features/functions; consequently, upgrading the mini ALUs to full status would probably end up doubling the performance of ALU-bound items in all cases, not just the current set.

Honestly, I don't remember the transistor count (and I won't go look it up anyway). If it had been trivial, I would have doubled (or more) things :)
 
add_sat is a clamping add: the output is clamped to [0,1]. It's generally not used very often, except, perhaps, as a final operation before sending out data.
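
A small Python sketch of that clamping behaviour, in case it helps. This only models the arithmetic on 4-component values; the function names are made up for illustration and say nothing about how the hardware implements it.

```python
# Illustrative model of a saturating add on 4-component values.

def saturate(x):
    """Clamp every component to the range [0, 1]."""
    return tuple(min(1.0, max(0.0, c)) for c in x)

def add(a, b):
    """Plain component-wise add; results may leave [0, 1]."""
    return tuple(p + q for p, q in zip(a, b))

def add_sat(a, b):
    """Component-wise add followed by a clamp to [0, 1]."""
    return saturate(add(a, b))

a = (0.75, 0.25, -0.5, 1.0)
b = (0.5, 0.25, 0.25, 0.0)

print(add(a, b))      # (1.25, 0.5, -0.25, 1.0)
print(add_sat(a, b))  # (1.0, 0.5, 0.0, 1.0)
```

Components that overflow past 1 or drop below 0 get pinned to the nearest end of the range, which is why it makes sense as a last step before writing a colour out.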
 
Ostsol said:
digitalwanderer said:
What does it mean, what benefits/limitations/new things does it imply?
It means developers have more to pay attention to when optimizing their shaders and therefore can potentially squeeze out a few more fps than they previously would have been able to.
That's not quite right.

What it really means is that the developers write the code they want to, and the driver ensures it runs as fast as possible. All it wants is to be given an efficient algorithm. It'll give you the result you coded, running as fast as the hardware can do it.

So optimise the algorithm - leave the rest to the driver!
 
sireric said:
That's also why one should be careful when trying to determine our internal architecture based on shader code.

Well, if you just told us, you could save us all a lot of time....but noooooo, you won't :) :) :)

Thanks for the info!!!
 
sireric said:
We are working hard on improving our current PS compiler, so that it can map PS ops to our HW in an optimal way. The current stuff is pretty simple. The HW is naturally very fast and executes well.

So in plain English, the current performance we're seeing in PS2.0 code (HL2 for example) from the R3x00 series is only the tip of a very big iceberg, and it's only going to get faster once the ATI driver guys *cough* opengl guy, catalyst maker, et al *cough* :p figure out how to use the hardware that was given to them.

Legitimate performance improvements from drivers? Who'd have thunkit in this day and age. :)

Holy hell... That means that HL2 PS2.0 speed is basically brute force?? :oops:
 
Well, I would not say "tip of the iceberg", and for short shaders it's very close to optimal, but you should expect some more performance when we release a more advanced compiler. Soon.
 
sireric said:
Well, I would not say "tip of the iceberg", and for short shaders it's very close to optimal, but you should expect some more performance when we release a more advanced compiler. Soon.

Any ballpark estimate regarding performance improvements that you perchance might be able to provide? :)
 
Dio said:
Ostsol said:
digitalwanderer said:
What does it mean, what benefits/limitations/new things does it imply?
It means developers have more to pay attention to when optimizing their shaders and therefore can potentially squeeze out a few more fps than they previously would have been able to.
That's not quite right.

What it really means is that the developers write the code they want to, and the driver ensures it runs as fast as possible. All it wants is to be given an efficient algorithm. It'll give you the result you coded, running as fast as the hardware can do it.

So optimise the algorithm - leave the rest to the driver!
That's true, but whatever the developer can do to maximize the potential optimizations that the compiler can produce is always good. It's certainly not so severe as programming for SSE, of course. :)
 
Natoma said:
sireric said:
Well, I would not say "tip of the iceberg", and for short shaders it's very close to optimal, but you should expect some more performance when we release a more advanced compiler. Soon.

Any ballpark estimate regarding performance improvements that you perchance might be able to provide? :)


hummmmm.... nope.
 
sireric said:
Natoma said:
sireric said:
Well, I would not say "tip of the iceberg", and for short shaders it's very close to optimal, but you should expect some more performance when we release a more advanced compiler. Soon.

Any ballpark estimate regarding performance improvements that you perchance might be able to provide? :)


hummmmm.... nope.

Oh wells. Had to give it a shot. :D
 
Ostsol said:
Dio said:
So optimise the algorithm - leave the rest to the driver!
That's true, but whatever the developer can do to maximize the potential optimizations that the compiler can produce is always good.
All that matters is 'do what's necessary, and don't do what isn't'. I really can't think what I could tell people about R3xx pixel shaders that would help them produce code that could be compiled better, except what's already been said in many a (public) ATI developer document.

It's certainly not so severe as programming for SSE, of course.
Ah, the stories I wish I could tell you....
 
MDolenc said:
pcchen: I guess that in your case the driver is just being smart and rearranging your shader into:
Code:
mov r1, c1 
mad r0, v0, c0, r1 
texld r1, t0, s0
mad r0, r0, c1, r0 
mad r0, r0, v1, r1
mov oC0, r0

It's possible, but the second test can't be explained by this rearrangement (there's an additional abs between the mul and the add).
 