Radeon 9700 (96 or 160 instructions per pass?)

Luminescent · Sep 28, 2002

The Radeon 9700 is supposedly capable of 160 PS instructions per pass, but where does this number come from if it does 32 texture ops and 64 pixel calculations in a pass? Doesn't that add up to 96 instructions?

I remember someone saying something about ATI counting vector and scalar ops separately, as opposed to the DX9 spec counting them both as one, due to parallelism. Is ATI counting them separately, or are the missing 64 instructions coming from somewhere else?

Hyp-X · Sep 28, 2002

Yes, you are right.

Except that 160 - 96 = 64, not 34...

So 96 = 32 + 64 while 160 = 32 + 2*64.
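Hyp-X's arithmetic, written out as a small Python sketch. The function and constant names are hypothetical; the only inputs are the 32 texture ops and 64 ALU slots from this thread, with each ALU slot counted either once (a co-issued vector + scalar pair as one instruction) or twice (vector and scalar as separate DX9 instructions).

```python
# Per-pass figures quoted in this thread for the R300 (names are hypothetical).
TEX_OPS = 32
ALU_SLOTS = 64  # each slot can hold a vector op plus a scalar op


def count_instructions(tex, alu_slots, count_coissue_separately):
    """Instructions per pass under two counting conventions.

    If count_coissue_separately is True, the vector and scalar halves of each
    ALU slot count as two instructions; otherwise a co-issued pair counts as one.
    """
    alu_instructions = alu_slots * (2 if count_coissue_separately else 1)
    return tex + alu_instructions


print(count_instructions(TEX_OPS, ALU_SLOTS, False))  # 96
print(count_instructions(TEX_OPS, ALU_SLOTS, True))   # 160
```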

eSa · Sep 28, 2002

OK, somewhat off-topic, but someone can probably answer:

We know that the 9700 can do at most 3 instructions per cycle in the pixel shader. But when we are shading pixels, can all 8 pipes execute in parallel?

That is, can we execute 8 * 3 instructions per cycle?

Luminescent · Sep 28, 2002

Sorry, I subtracted from 130. It is 160 - 96, which leaves 64, but where do those come from?

P.S. I edited my previous post.

Xmas · Sep 28, 2002

eSa said:
OK, somewhat off-topic, but someone can probably answer:

We know that the 9700 can do at most 3 instructions per cycle in the pixel shader. But when we are shading pixels, can all 8 pipes execute in parallel?

That is, can we execute 8 * 3 instructions per cycle?

Of course. What would be the point of having several pipelines if they couldn't operate in parallel?
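A back-of-the-envelope version of this answer, as a hedged sketch: 8 pipelines each issuing up to 3 ops per clock gives 24 ops per clock. The 325 MHz core clock is an assumption (the Radeon 9700 Pro's clock; it is not stated in this post), and the result is a theoretical peak that ignores dependencies and texture latency.

```python
# Hypothetical peak-throughput estimate; CLOCK_HZ is an assumption.
PIPELINES = 8
OPS_PER_PIPE_PER_CLOCK = 3   # one texture + one vector + one scalar op, co-issued
CLOCK_HZ = 325e6             # assumed Radeon 9700 Pro core clock

ops_per_clock = PIPELINES * OPS_PER_PIPE_PER_CLOCK
peak_ops_per_second = ops_per_clock * CLOCK_HZ

print(ops_per_clock)                                   # 24
print(f"{peak_ops_per_second / 1e9:.1f} Gops/s peak")  # 7.8 Gops/s peak
```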

Luminescent · Sep 29, 2002

Can anybody confirm how ATI arrives at a count of 160 shader instructions per pass? It makes me wonder whether that is enough for demanding real-time applications, or maybe even for the offline rendering arena.

Dave Baumann · Sep 29, 2002

Given the shader lengths current real-time applications use, 160 will likely be more than enough for the time being! Even if not, multipassing can be employed for longer shaders, and floating-point buffers will ensure precision is maintained between passes. This will be useful in offline rendering as well, since performance is not as much of an issue there.
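To make the multipass idea concrete, here is a hypothetical Python sketch that greedily splits an in-order stream of op kinds into passes, each within the 32 texture / 64 vector / 64 scalar budget discussed in this thread. It is only an illustration under stated assumptions: a real splitter would also have to write live intermediate values to a floating-point buffer between passes and respect register and dependent-read limits, none of which is modeled here.

```python
from dataclasses import dataclass, field

# Assumed per-pass limits, from the 32 texture + 64 vector + 64 scalar breakdown.
LIMITS = {"tex": 32, "vec": 64, "scalar": 64}


@dataclass
class Pass:
    ops: list = field(default_factory=list)
    counts: dict = field(default_factory=lambda: {kind: 0 for kind in LIMITS})


def split_into_passes(ops):
    """Greedily split an in-order list of op kinds into passes within LIMITS.

    Hypothetical sketch: ignores intermediate values that would need to be
    stored in a floating-point buffer between passes, and any register or
    dependent-read limits.
    """
    passes = [Pass()]
    for kind in ops:
        if kind not in LIMITS:
            raise ValueError(f"unknown op kind: {kind}")
        current = passes[-1]
        if current.counts[kind] == LIMITS[kind]:
            # This unit's budget is used up: start a new pass.
            current = Pass()
            passes.append(current)
        current.ops.append(kind)
        current.counts[kind] += 1
    return passes


# A hypothetical 200-instruction shader: 1 fetch per 2 vector + 2 scalar ops.
shader = ["tex", "vec", "scalar", "vec", "scalar"] * 40
passes = split_into_passes(shader)
print(len(passes))  # 2
print([p.counts for p in passes])
```

Here the first pass fills exactly 160 instruction slots (32 + 64 + 64) and the remaining 40 instructions spill into a second pass.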

sireric · Sep 29, 2002

DX9 has removed the explicit dual issue of RGB + A and now has vector (RGB[A]) and scalar operations, which cannot be specified as dual issue (i.e. it's two instructions). So, that's 32 texture + 64 vector + 64 scalar = 160 DX9 instructions.

Later

moichi · Sep 30, 2002

Can R300 automatically co-issue 3-vector operation and following scalar operation?

sireric · Sep 30, 2002

I do believe that we claim that we can issue a texture, a scalar and a vector operation all on the same cycle

Later

Luminescent · Sep 30, 2002

Hmm, is the concurrent execution sireric mentioned because the R300's 3 pixel units work in parallel and each one could execute one of those ops?

Also, I'm wondering if the R300 could multithread, like the P10?

GetStuff · Sep 30, 2002

Question: The R300 can do 160 instructions per pass, but how many clock cycles would it cost?

Hyp-X · Sep 30, 2002

GetStuff said:
Question: The R300 can do 160 instructions per pass, but how many clock cycles would it cost?

Looks like 64 cycles per pipeline, which means it takes 8 cycles per pixel on average across the 8 pipelines...

That would peak at 51.6 fps at 1024x768 (no overdraw).
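Hyp-X's estimate can be reproduced with a few lines of Python, as a sketch under two assumptions not stated in the post: a 325 MHz core clock, and perfect co-issue, so that the busiest unit (64 vector or 64 scalar ops versus 32 texture ops) sets the cycle count. It comes out at about 51.66 fps, in line with the figure above.

```python
CLOCK_HZ = 325e6   # assumption: Radeon 9700 Pro core clock (not stated in the post)
PIPELINES = 8
WIDTH, HEIGHT = 1024, 768


def cycles_per_pixel_per_pipe(tex, vec, scalar):
    # Assuming perfect co-issue, each pipeline retires one op of each kind per
    # cycle, so the unit with the most work sets the cycle count.
    return max(tex, vec, scalar)


cycles = cycles_per_pixel_per_pipe(32, 64, 64)  # 64
amortized = cycles / PIPELINES                  # 8.0 cycles per pixel
fps = CLOCK_HZ / (amortized * WIDTH * HEIGHT)   # no overdraw
print(cycles, amortized, f"{fps:.2f}")
```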

moichi · Oct 1, 2002

Thanks sireric.
I'm interested in the pairing rules.
I suspect the NV30 won't follow the R300's pairing rules, so
HLSL optimization schemes will differ among chips.

sireric · Oct 1, 2002

I don't know what future chips will be able to do, but I can say that yes, the R300 can co-issue a 3-component vector, a scalar and a texture instruction. Given that current shaders and most near-future shaders have a 1:1 instruction count between pixel ALU and texture fetches, the R300's co-issue seems like a good compromise; being able to do 2 ALU ops but not co-issue a texture instruction would not have been as good.
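The pairing question can be illustrated with a hypothetical in-order packer in Python: each cycle holds at most one texture, one 3-component vector and one scalar op, as described above. The extra rule that an op cannot share a cycle with the op producing its input, and the greedy in-order policy, are assumptions for illustration, not ATI's documented scheduling. The same six ops take 2 or 3 cycles depending on their order, which is why per-chip compilers matter.

```python
from typing import NamedTuple


class Op(NamedTuple):
    kind: str         # "tex", "vec" (3-component vector) or "scalar"
    dest: str         # register written
    srcs: tuple = ()  # registers read


def pack_cycles(ops):
    """Greedily group an in-order op stream into issue cycles.

    Assumed rules: at most one op of each kind per cycle, and an op may not
    share a cycle with an op whose result it reads.
    """
    cycles = []
    slots = {}       # kind -> op issued in the current cycle
    written = set()  # registers written in the current cycle
    for op in ops:
        conflict = op.kind in slots or any(s in written for s in op.srcs)
        if conflict or not cycles:
            slots, written = {}, set()
            cycles.append(slots)
        slots[op.kind] = op
        written.add(op.dest)
    return cycles


# The same six hypothetical ops in two valid orders.
paired = [
    Op("tex", "r0", ("t0",)),
    Op("vec", "r1", ("c0",)),
    Op("scalar", "r2", ("c1",)),
    Op("vec", "r3", ("r0", "r1")),
    Op("scalar", "r4", ("r2",)),
    Op("tex", "r5", ("t1",)),
]
unpaired = [paired[i] for i in (0, 1, 3, 2, 4, 5)]

print(len(pack_cycles(paired)), len(pack_cycles(unpaired)))  # 2 3
```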

HLSL optimizations, and even low-level shader optimizations, are very important for VPUs now and will be even more so in the future. Just as with CPUs, the compilers will need to do more and more to get every ounce out of the HW. The low-level implementation of HLSL code will most likely differ among chips (even from the same IHV).

Later
