Since some details about NV40's shader units have already been revealed, I think it's time for us to discuss them.
What I know about NV40:
SU1 (shader unit 1) can fetch one texture at full speed or do one 4-component ALU op per clock, and it has free FP16 normalization.
SU2 cannot fetch textures, but can do one 4-component ALU op per clock. No free FP16 normalization.
Both SUs can co-issue in a 3/1 or 2/2 manner, and the operations of the two SUs seem to be independent.
There may also be a mini ALU in each SU; I don't know what they're for. Maybe they handle register/instruction modifiers, like what we saw in R3XX.
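
To make the 3/1 and 2/2 co-issue idea a bit more concrete, here is a rough toy model in Python. It is only a sketch of the rule as described above, not actual NV40 behavior: the names `VECTOR_WIDTH`, `ALLOWED_SPLITS` and `can_co_issue` are made up for illustration, and it assumes that an op narrower than its slot (say, a 2-component op in the 3-wide half of a 3/1 split) can still be issued there.

```python
# Toy model of the co-issue rule from the post; NOT a description of real hardware.
VECTOR_WIDTH = 4                  # components one SU handles per clock (from the post)
ALLOWED_SPLITS = [(3, 1), (2, 2)]  # co-issue patterns mentioned in the post


def can_co_issue(width_a, width_b):
    """Return True if two ALU ops of the given component widths could
    share one SU in a single clock under the 3/1 or 2/2 splits.

    Assumption: an op may occupy a slot wider than itself.
    """
    if not (1 <= width_a <= VECTOR_WIDTH and 1 <= width_b <= VECTOR_WIDTH):
        return False
    for wide, narrow in ALLOWED_SPLITS:
        # Try the pair in both orders against the two slots of this split.
        if (width_a <= wide and width_b <= narrow) or (
            width_b <= wide and width_a <= narrow
        ):
            return True
    return False


if __name__ == "__main__":
    for pair in [(3, 1), (2, 2), (1, 3), (3, 2), (4, 1)]:
        print(pair, can_co_issue(*pair))
```

Under this model a vec3 op plus a scalar op, or two vec2 ops, fit into one SU per clock, while a vec3 plus a vec2 would need two clocks (or the second SU).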