To take this further, consider dependent reads: you potentially have to absorb the latency of going through the texture cache, out to memory (including arbitration time), back through the texture cache and through the filtering HW. This adds up to 100s of cycles of latency, which will cause a stall if it isn't absorbed. So how big are your temp register files? Very big.
This makes it reasonable to trade off number of temps against latency buffering. Of course, why NV's chips suffer so badly when using such a small number of regs seems a bit strange, as this would imply that they barely have enough buffering to hide their ALU latency, let alone the latency of dependent reads (although there are other schemes that can be used to help with them).
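To put rough numbers on that trade-off, here is a back-of-envelope sketch. It assumes a simple model where the temp register file is split evenly between threads (pixels) in flight and each thread in flight covers a fixed number of cycles of latency. The function names, the 1024-entry file and the 300-cycle fetch latency are made up for illustration and are not figures for any real chip.

```python
# Rough model: a fixed-size temp register file is shared by all threads
# (pixels) in flight, and each thread in flight covers some cycles of latency.
# All numbers here are hypothetical, not real hardware figures.

def threads_in_flight(regfile_entries, temps_per_thread):
    """How many threads fit if each one reserves temps_per_thread registers."""
    return regfile_entries // temps_per_thread


def stall_cycles(fetch_latency, regfile_entries, temps_per_thread,
                 cycles_per_thread=1):
    """Cycles of fetch latency left exposed after cycling through the
    threads in flight, each contributing cycles_per_thread of work."""
    covered = threads_in_flight(regfile_entries, temps_per_thread) * cycles_per_thread
    return max(0, fetch_latency - covered)


if __name__ == "__main__":
    # Hypothetical 1024-entry register file and 300-cycle dependent read.
    for temps in (2, 4, 8):
        print(temps,
              threads_in_flight(1024, temps),
              stall_cycles(300, 1024, temps))
```

In this toy model, every extra temp a shader uses cuts the number of threads available to cover the fetch, so the same register file goes from hiding the whole dependent read to leaving part of it exposed as a stall.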
On number of ports: if you're supporting parallel vector and scalar ops, then per clock you need up to 6 read and 2 write ports, and 8 ports is still costly.
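The port count is just a sum over what issues in one clock. A minimal sketch, assuming each op is a three-source, one-destination instruction (a multiply-add style op, which is my assumption about where the 6 and 2 come from):

```python
# Each op is described as (source_operands, destination_operands).
# The three-source shape is an assumption for illustration.
VEC_OP = (3, 1)     # e.g. a vector multiply-add
SCALAR_OP = (3, 1)  # e.g. a scalar multiply-add


def ports_per_clock(ops):
    """Worst-case register file ports if all ops issue in the same clock."""
    reads = sum(sources for sources, _ in ops)
    writes = sum(dests for _, dests in ops)
    return reads, writes


if __name__ == "__main__":
    reads, writes = ports_per_clock([VEC_OP, SCALAR_OP])
    print(reads, writes, reads + writes)
```

Co-issuing the vector and scalar op doubles the ports over a single op, which is where the cost comes from.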
John.