Compiler/Instruction Reordering on X1800 series.

ChrisRay

R.I.P. 1983-
Veteran
I noticed that the X1800XL doesn't always seem to have optimal ALU performance (look at the X1800XL compared to the X850XT, for instance). People have suggested that it might be drivers. Is the shader scheduler so different from the R300's that this would make a significant difference? And in general, how can we code more optimally for the R520 than for the R300 for basic shader performance in SM 2.0 apps? I would have thought it'd be similar enough to the R300 baseline that this would be a non-issue.

Chris
 
Last edited by a moderator:
One of the reviews (and I don't remember which --maybe Wavey's, but I'm thinking it was Xbit) suggested that the reason they didn't mess with the ALUs this time was for the specific purpose of not messing up their compiler opts.

Which, if true, would tend to lean in the other direction from what you are suggesting.
 
From the third page of B3D's review:
Although the colours have changed in the Pixel Shader core's diagram since the past few architecture releases, the organisation and arrangement haven't - this is, in fact, because the actual ALU organisation remains unchanged. Although everything in the pipeline has been re-engineered to hit new target clocks that the 90nm process can enable and the capabilities have been extended for Pixel Shader 3.0 operation, the same ALU structure has been kept partially because ATI already have a highly optimised shader instruction compiler, which would need to be re-written for any different ALU organisation.

But I, too, think (hope?) ATI has some room for driver improvement. It's hard to believe the X1600XT with its 12 shader units and 590MHz core clock doesn't run roughshod over a 12 pipe, 325MHz 6800 in any pixel shader benchmark, and yet here it is merely tying in 3DM05's PS test. It's underwhelming in ShaderMark, too.
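To put rough numbers on that expectation, here is a back-of-the-envelope sketch using only the unit counts and clocks quoted above. It assumes every shader unit retires exactly one instruction per clock on both chips, which is a simplification and not a measured figure; the function name and the `ops_per_unit_per_clock` parameter are made up for illustration.

```python
# Peak-rate estimate only. ASSUMPTION: one shader instruction per unit
# per clock on both chips; real per-clock throughput differs by architecture.
def peak_rate_mops(units, clock_mhz, ops_per_unit_per_clock=1.0):
    """Theoretical peak in millions of shader instructions per second."""
    return units * clock_mhz * ops_per_unit_per_clock

x1600xt = peak_rate_mops(12, 590)  # 12 shader units at 590 MHz
gf6800 = peak_rate_mops(12, 325)   # 12 pipes at 325 MHz
ratio = x1600xt / gf6800

print(f"X1600XT: {x1600xt:.0f} Mops/s, 6800: {gf6800:.0f} Mops/s, "
      f"ratio: {ratio:.2f}x")
```

On paper that is roughly a 1.8x gap, so a tie in a pixel shader test implies that either the per-clock assumption is wrong or something other than raw ALU rate is the limit.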
 
Pete said:
But I, too, think (hope?) ATI has some room for driver improvement. It's hard to believe the X1600XT with its 12 shader units and 590MHz core clock doesn't run roughshod over a 12 pipe, 325MHz 6800 in any pixel shader benchmark, and yet here it is merely tying in 3DM05's PS test. It's underwhelming in ShaderMark, too.
Ah, but it's also only got four texture units. So maybe some of the shaders in those tests require more texture operations than the X1600XT can comfortably make use of. The X1600XT also only has a 128-bit memory bus, so maybe that's a limitation in some of the benchmarks.
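A minimal sketch of that argument: treat the ALUs and texture units as two independent resources and take whichever one needs more cycles as the bottleneck. The 12 ALU / 4 texture unit split is from the posts above; the two shader instruction mixes are hypothetical, and the model ignores memory bandwidth, filtering cost, and latency.

```python
# Toy throughput bound. ASSUMPTIONS: ALU and texture work overlap perfectly,
# each unit handles one instruction or fetch per clock, bandwidth is ignored.
def cycles_per_pixel(alu_instr, tex_fetches, alu_units, tex_units):
    """Return (amortized cycles per pixel, name of the limiting resource)."""
    alu_cycles = alu_instr / alu_units
    tex_cycles = tex_fetches / tex_units
    limiter = "texture" if tex_cycles > alu_cycles else "alu"
    return max(alu_cycles, tex_cycles), limiter

ALU_UNITS, TEX_UNITS = 12, 4  # X1600XT figures quoted in the thread

# Hypothetical shader mixes, for illustration only.
tex_heavy = cycles_per_pixel(12, 8, ALU_UNITS, TEX_UNITS)
alu_heavy = cycles_per_pixel(24, 4, ALU_UNITS, TEX_UNITS)

print("texture-heavy shader:", tex_heavy)
print("ALU-heavy shader:", alu_heavy)
```

Under these assumptions, any shader with fewer than three ALU instructions per texture fetch leaves some of the 12 ALUs idle on this part.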
 
Chalnoth said:
Ah, but it's also only got four texture units. So maybe some of the shaders in those tests require more texture operations than the X1600XT can comfortably make use of. The X1600XT also only has a 128-bit memory bus, so maybe that's a limitation in some of the benchmarks.

The four texture units will be more of a problem in real games, because shader benchmarks normally don't use expensive filtering. Anyway, the NV4X ALUs can do more work per clock than the R(V)5XX ALUs. This can help to compensate for the clock difference.

The load balancing between ALUs and TMUs is a nice thing, but if all threads are waiting for texture samples, the dispatcher cannot use the ALUs even if they are free.
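The stall case can be illustrated with a standard latency-hiding estimate. This is a toy model, not a description of how the R5xx dispatcher actually works: it assumes each thread alternates a burst of ALU work with one texture fetch, and all the numbers (4 ALU cycles per burst, 100 cycles of texture latency, the thread counts) are hypothetical.

```python
# Toy latency-hiding model. ASSUMPTIONS: every thread repeats
# "alu_cycles of math, then wait tex_latency cycles for a texture sample",
# and the dispatcher can switch to any ready thread at no cost.
def alu_utilization(threads_in_flight, alu_cycles, tex_latency):
    """Fraction of time the ALU has a ready thread to run (0.0 to 1.0)."""
    period = alu_cycles + tex_latency  # one thread's full cycle
    return min(1.0, threads_in_flight * alu_cycles / period)

# Hypothetical numbers, for illustration only.
for n in (1, 8, 26):
    print(f"{n:2d} threads: {alu_utilization(n, 4, 100):.0%} ALU utilization")
```

With too few threads, or too little math between fetches, every thread ends up waiting on the texture units at once and the free ALUs have nothing to run.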
 
Pete said:
From the third page of B3D's review:

Oh, okay, so it was Wavey. :p

Certainly the extra time with the driver team had to help so far, but, yeah, I'm still expecting this gen's performance to get better as they have more time to tweak the scheduler and memory bus and get feedback from a much wider audience. What's reasonable? I dunno. There is, I think, a significant chance of a significant upside (30%+), but I wouldn't call it a probability. I'd describe it as "I'm not counting on it, but it wouldn't surprise me."

I seem to recall that the last time NV did a major memory controller upgrade, they had a legit mid-life boost that was fairly eye-popping. An ATI boost in that area might have less to do with the "ring" nature of the bus than with opting for the greater number of controllers and better granularity.
 