NV40 Technology explained.
http://www.3dcenter.org/artikel/nv40_pipeline/index2_e.php
(To clarify first, the terminology "pipeline" in this post is not a hardware quad-pipeline.)
Dual- and co-issue combined, an NV40 pipe can execute up to four instructions – while having a single all-purpose arithmetic unit only.
I don't understand. I think the number should be eight.
With co-issue, each single shader unit can execute four instructions per clock. For example, shader unit 1 should be able to execute the following four instructions in one cycle:
Code:
rsq r0.r, v0.r
mul r1.r, r0.r, v1.r
rsq r0.b, v1.b
mul r1.b, r0.b, v2.b
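The r and b channels can be co-issued. Each mul depends on the preceding rsq, but the pair could be dual-issued. That is, shader unit 1 can execute these four instructions per clock.
Since there are two shader units, and supposing NV40 can also dual-issue between the two shader units, with the destination of shader unit 1 serving as the source of shader unit 2 (if not, why this 'complement' design?), there can be another four instructions, so the maximum number of instructions per clock in one pipe should be eight.
Because NV40 has 16 pipes in total, overall chip performance would be 16T + 128M at maximum.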
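To spell out the arithmetic behind my numbers, here is a quick Python sketch. Every constant in it is my own assumption from the reasoning in this post (four instructions per shader unit, two shader units per pipe, one texture op per pipe per clock), not a confirmed NV40 figure, and the function names are made up for illustration.

```python
# Back-of-the-envelope count following the reasoning in this post.
# All figures are assumptions from the post, not confirmed NV40 specs.
INSTR_PER_SHADER_UNIT = 4   # assumed: co-issue and dual-issue combined
SHADER_UNITS_PER_PIPE = 2   # the two shader units discussed in the post
PIPES = 16                  # pipe count stated in the post
TEX_PER_PIPE = 1            # assumed: one texture op per pipe per clock


def per_pipe_math_ops():
    """Math instructions per clock in one pipe under these assumptions."""
    return INSTR_PER_SHADER_UNIT * SHADER_UNITS_PER_PIPE


def chip_totals():
    """Return (texture ops, math ops) per clock for the whole chip."""
    return PIPES * TEX_PER_PIPE, PIPES * per_pipe_math_ops()


if __name__ == "__main__":
    tex, math_ops = chip_totals()
    print(f"{tex}T + {math_ops}M")
```

If the article's figure of four instructions per pipe is the right one, changing SHADER_UNITS_PER_PIPE to 1 (or INSTR_PER_SHADER_UNIT to 2) gives 64M instead, so the whole disagreement comes down to that one factor of two.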
What did I miss?