Radeon 9700 (96 or 160 instructions per pass?)

The Radeon 9700 is supposedly capable of 160 PS instructions per pass, but where does this number come from if it does 32 texture ops and 64 pixel calculations in a pass? Doesn't that add up to 96 instructions?

I remember someone saying something about ATI counting vector and scalar ops separately, as opposed to the DX9 spec counting them both as one, due to parallelism. Is ATI counting them separately, or are the missing 64 instructions coming from somewhere else?
 
Yes, you are right.

Except 160 - 96 = 64 and not 34 ...

So 96 = 32 + 64 while 160 = 32 + 2*64.
 
OK, somewhat OT, but someone can probably answer:

We know that the 9700 can do at most 3 instructions per cycle in the pixel shader. But when we are shading pixels, can all 8 pipes execute in parallel?

That is, can we execute 8 * 3 instructions per cycle?
 
eSa said:
OK, somewhat OT, but someone can probably answer:

We know that the 9700 can do at most 3 instructions per cycle in the pixel shader. But when we are shading pixels, can all 8 pipes execute in parallel?

That is, can we execute 8 * 3 instructions per cycle?
Of course. What would be the point of having several pipelines if they couldn't operate in parallel?
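As a rough illustration of what "8 * 3 per cycle" means in aggregate, here is a back-of-the-envelope sketch. The 325 MHz core clock is an assumption (the Radeon 9700 Pro's clock; nobody in the thread states it), and the figure is a theoretical peak that assumes every co-issue slot in every pipe is filled on every clock.

```python
# Peak pixel-shader issue rate across all pipelines (idealised upper bound).
PIPELINES = 8          # R300 pixel pipelines
SLOTS_PER_CLOCK = 3    # texture + vector + scalar co-issue per pipe
CORE_CLOCK_HZ = 325e6  # assumption: Radeon 9700 Pro core clock


def peak_ops_per_clock(pipes: int, slots: int) -> int:
    """Instructions issued per clock if every slot in every pipe is busy."""
    return pipes * slots


def peak_ops_per_second(pipes: int, slots: int, clock_hz: float) -> float:
    """Theoretical peak instruction rate; real shaders rarely fill every slot."""
    return peak_ops_per_clock(pipes, slots) * clock_hz


print(peak_ops_per_clock(PIPELINES, SLOTS_PER_CLOCK))  # 24
print(peak_ops_per_second(PIPELINES, SLOTS_PER_CLOCK, CORE_CLOCK_HZ))
```

In practice dependencies between instructions and texture latency keep real throughput below this ceiling.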
 
Can anybody confirm how ATI arrives at its count of 160 shader instructions per pass? It makes me wonder whether that is enough for demanding real-time applications, or maybe even for offline rendering.
 
Given the shader lengths current real-time applications use, 160 will likely be more than enough for the time being! Even if not, multipassing can be employed for longer shaders, and floating-point buffers will ensure precision is maintained. This will be useful in offline rendering scenarios as well, since performance is less of an issue there.
 
DX9 has removed the explicit dual issue of RGB + A and now has vector (RGB[A]) and scalar operations, which cannot be specified as dual issue (i.e. they are two instructions). So that's 32 texture + 64 vector + 64 scalar = 160 DX9 instructions.
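Both numbers in this thread come from the same per-pass budget, counted two ways. The sketch below is illustrative only (the function names are made up): one counter treats a co-issued vector + scalar pair as a single ALU slot, the other counts them as separate instructions, which is how the 160 figure above is arrived at.

```python
# Per-pass budget quoted in this thread.
TEX_OPS = 32     # texture instructions per pass
ALU_SLOTS = 64   # ALU slots per pass; each slot can hold a vector op AND a scalar op


def count_pairs_as_one(tex: int, alu_slots: int) -> int:
    """Treat each co-issued vector+scalar pair as one instruction."""
    return tex + alu_slots


def count_separately(tex: int, alu_slots: int) -> int:
    """Count vector and scalar ops as separate instructions (the 160 figure)."""
    return tex + 2 * alu_slots


print(count_pairs_as_one(TEX_OPS, ALU_SLOTS))  # 96
print(count_separately(TEX_OPS, ALU_SLOTS))    # 160
```

The 64-instruction gap is exactly the scalar ops that only show up when pairs are not folded together.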

Later
 
I do believe that we claim that we can issue a texture, a scalar and a vector operation all on the same cycle :)
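To see why a 160-instruction shader need not take 160 cycles, here is a simplified co-issue model. It is a hypothetical sketch, not ATI's actual scheduler: it assumes each cycle can issue at most one texture, one vector and one scalar instruction and it ignores data dependencies and latency, so it gives an idealised lower bound on cycle count.

```python
from collections import Counter
from typing import Iterable

UNITS = ("tex", "vec", "scalar")  # one issue slot per unit per cycle (assumed model)


def coissue_cycles(instructions: Iterable[str]) -> int:
    """Idealised cycles with 3-way co-issue and no dependencies.

    With no dependencies, one instruction of each type can be packed into
    every cycle, so the busiest unit sets the cycle count.
    """
    counts = Counter(instructions)
    unknown = set(counts) - set(UNITS)
    if unknown:
        raise ValueError(f"unknown instruction types: {sorted(unknown)}")
    return max(counts[u] for u in UNITS)


def serial_cycles(instructions: Iterable[str]) -> int:
    """Cycles if every instruction had to issue on its own."""
    return sum(1 for _ in instructions)


shader = ["tex"] * 32 + ["vec"] * 64 + ["scalar"] * 64
print(coissue_cycles(shader))  # 64
print(serial_cycles(shader))   # 160
```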

Later
 
Hmm, is the concurrent execution sireric mentioned possible because the R300's three pixel units work in parallel, with each one executing one of those ops?

Also, I'm wondering if the R300 could multithread, like the P10?
 
GetStuff said:
Question: The R300 can do 160 instructions per pass, but how many clock cycles would it cost?

Looks like 64 cycles per pipeline, which means that with 8 pipelines it takes 8 cycles per pixel on average...

That would peak at about 51.6 fps at 1024x768 (no overdraw).
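The arithmetic behind that estimate can be reproduced as below. The 325 MHz core clock is an assumption (it is the Radeon 9700 Pro clock and is the value that reproduces the quoted figure); the post does not state it. The model also assumes one full-screen pass, no overdraw, and that the pixel shader is the only bottleneck.

```python
PIXELS = 1024 * 768               # one full-screen pass, no overdraw
CYCLES_PER_PIXEL_PER_PIPE = 64    # full 160-instruction shader with perfect co-issue
PIPELINES = 8
CORE_CLOCK_HZ = 325e6             # assumption: Radeon 9700 Pro core clock


def avg_cycles_per_pixel(cycles_per_pipe: int, pipes: int) -> float:
    """Average cycles per pixel when all pipes shade different pixels in parallel."""
    return cycles_per_pipe / pipes


def peak_fps(clock_hz: float, cycles_per_pipe: int, pipes: int, pixels: int) -> float:
    """Frames per second if pixel shading were the only cost."""
    pixels_per_second = clock_hz / avg_cycles_per_pixel(cycles_per_pipe, pipes)
    return pixels_per_second / pixels


print(avg_cycles_per_pixel(CYCLES_PER_PIXEL_PER_PIPE, PIPELINES))
print(peak_fps(CORE_CLOCK_HZ, CYCLES_PER_PIXEL_PER_PIPE, PIPELINES, PIXELS))
```

This works out to roughly 51.66 fps, matching the ~51.6 quoted above; a lower clock scales the result down proportionally.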
 
Thanks sireric.
I'm interested in the pairing rules.
I think code paired according to the R300's rules won't suit the NV30.
HLSL optimization schemes differ among chips.
 
I don't know what future chips will be able to do, but I can say that yes, the R300 can co-issue a 3-component vector, a scalar and a texture instruction. Given that current shaders and most near-future shaders have a 1:1 ratio between pixel ALU instructions and texture fetches, the R300's co-issue seems like a good compromise. Being able to do 2 ALU ops but not co-issue a texture instruction would not have been as good.

HLSL optimizations, and even low-level shader optimizations, are very important for VPUs in the present and even more so in the future. Just like CPU optimizations, the compilers will need to do more and more to get every ounce out of the HW. The low-level implementation of HLSL code will most likely be different among different chips (even from the same IHV).
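The trade-off described above can be illustrated with a deliberately simplified cycle model. Both designs here are hypothetical abstractions, not descriptions of real hardware: "R300-style" issues one texture fetch alongside one ALU slot per cycle, while the alternative issues two ALU ops per cycle but gives each texture fetch a cycle of its own. Dependencies and latency are ignored.

```python
import math


def r300_style_cycles(tex: int, alu: int) -> int:
    """Texture fetch co-issued with one ALU slot per cycle (simplified model)."""
    return max(tex, alu)


def dual_alu_no_tex_coissue_cycles(tex: int, alu: int) -> int:
    """Hypothetical design: two ALU ops per cycle, texture fetches issue alone."""
    return tex + math.ceil(alu / 2)


# A 1:1 texture:ALU mix favours co-issuing the texture fetch...
print(r300_style_cycles(32, 32), dual_alu_no_tex_coissue_cycles(32, 32))  # 32 48
# ...while a very ALU-heavy shader would favour the dual-ALU design.
print(r300_style_cycles(8, 64), dual_alu_no_tex_coissue_cycles(8, 64))    # 64 40
```

Under this model the co-issue choice pays off exactly for the 1:1 ratio the post describes as typical.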

Later
 