Why are we discussing IPC in terms that sound so much like marketing-speak?
Tridam: which parts of your NV40 description are based on testing, and which are based on Kirk's comments? What testing and what comments? What modifiers are supported for the issued units...all the PS 1.4 modifiers? Wouldn't it be important to clarify this information?
The term "instructions" seems to me to be abused here, especially in the context of comparing different architectures. Normalization should count as the equivalent of 3 instructions (1 "complex", 2 "simple") for comparison. "Modifiers" should be mentioned as distinct from more general processing ability. Where is the consideration of what can actually be completed effectively in the same clock cycle...the current discussion seems to assume all ops take the same number of clock cycles, but is this the case? What impact does a texture op have, and for how many clock cycles will it affect IPC? Is it just 1 clock cycle of IPC impact? What about value handling characteristics...as one example: did the NV3x really have a penalty for using constants, and, if so, did the NV40 get rid of it?
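To make the "3 instructions" accounting concrete, here is a minimal Python sketch of a vector normalization written out as the three shader-style steps it is usually decomposed into (DP3, RSQ, MUL). The function name and the "simple"/"complex" labels are my own illustration, following the terminology used in this thread; nothing here describes how either chip actually schedules the work.

```python
import math

def normalize_as_shader_ops(v):
    """Normalize a 3-vector step by step, logging each shader-style op.

    The op names and the 'simple'/'complex' labels are illustrative
    assumptions, not a description of any particular hardware.
    """
    ops = []

    # DP3: squared length of the vector (a "simple" op).
    d = v[0] * v[0] + v[1] * v[1] + v[2] * v[2]
    ops.append(("DP3", "simple"))

    # RSQ: reciprocal square root of the squared length (a "complex" op).
    r = 1.0 / math.sqrt(d)
    ops.append(("RSQ", "complex"))

    # MUL: scale every component by the reciprocal length (a "simple" op).
    out = tuple(c * r for c in v)
    ops.append(("MUL", "simple"))

    return out, ops

result, ops = normalize_as_shader_ops((3.0, 0.0, 4.0))
print(result)
print(ops)
```

Counted this way, a "free" normalize is worth one "complex" and two "simple" instructions on an architecture that has to issue them separately, which is the equivalence I'd want used in any cross-architecture comparison.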
Going by the boost in IPC that seems to occur here, the normalization looks to have an execution impact roughly equivalent to occupying the "complex" ALU. It seems quite valid to describe this as getting significant "free" partial precision operations, but that doesn't correspond to the impression of "free" being put forth. Did Kirk say anything that specifies exactly what "free" means for this? Is there some other test result that gives a different indication than what I took away from that link, or another interpretation or a correction of what I came up with from it?
What set of modifiers can the NV40 perform in conjunction with general ops?
ATI has claimed, in my understanding, to be able to do independent ADD and MUL in addition to the full ALUs on the R3xx. Was this discounted to only be described as a modifier? Or is the presumption simply that the discussed "NV40 modifiers" are representing the same functionality?
Has anything changed with regard to this functionality on the R420, or what ATI claims (or is there some error in my understanding of what they claimed)?
...
For my current understanding of the above:
NV40:
Very good at maintaining 1 IPC throughput via "complementary" ALUs, which somewhat counters its failure to "protect" complex op IPC throughput from being used for texture ops.
Can boost IPC by complete two part coissue flexibility and a set of modifiers, for each "complementary" ALU.
Can boost IPC, even with modifiers, when a "complex" and "simple" op are paired.
Can boost IPC, in partial precision, by performing the equivalent of 3 ops in one clock cycle (tying up, to my understanding so far, the "complex" ALU).
R420:
Very good at maintaining 1 IPC throughput by having a "complete" ALU and protecting its IPC throughput from being used for texture ops.
Can boost IPC by common case (but not complete) two part coissue flexibility and a set of modifiers (AFAIK, PS 1.4 modifiers).
Can boost IPC, as an alternative to modifiers, for general MUL/ADD operations (subset of "simple ops").
This assumes no changes from what was discussed about the R3xx in this respect...is this true?
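As a rough illustration of why I want modifiers kept distinct from general ops when quoting IPC, here is a toy Python accounting sketch. The trace is entirely invented (it is not measured from NV40, R3xx, or R420), and the `ipc` helper and the "general"/"modifier" tags are hypothetical names of my own.

```python
def ipc(cycles, count_modifiers=False):
    """Average ops per clock over a trace.

    cycles: one list per clock of (name, kind) pairs, where kind is
    'general' or 'modifier'. An empty list is a clock with no ALU op
    issued (for example, one taken up by a texture op).
    """
    issued = sum(
        1
        for clock in cycles
        for _, kind in clock
        if kind == "general" or count_modifiers
    )
    return issued / len(cycles)

# Invented 4-clock trace, for illustration only.
trace = [
    [("MAD", "general"), ("_sat", "modifier")],
    [("RSQ", "general"), ("MUL", "general")],
    [],  # clock with no ALU op issued
    [("ADD", "general"), ("_x2", "modifier")],
]

print(ipc(trace))                        # general ops only
print(ipc(trace, count_modifiers=True))  # modifiers counted as instructions
```

The same trace gives a noticeably higher figure once modifiers are counted as instructions, so any IPC number being compared across architectures should say which convention it uses.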
...
Even if my understanding is accurate, this isn't the complete picture. Other questions that could be usefully answered include things like: where does SIN/COS fit in now?
I think this thread would be a good place to go into detail about any specific info we have with regards to the above.