Mintmaster
09-May-2007 22:04
Quote:
Originally Posted by Jawed
(Post 983464)
And what's annoying me is that you think ATI has never written a co-issuing compiler before. R300 is a 4-issue ALU. R600 is 5-issue. R300's four instructions are all different (either in component count or capability or both). 4 of R600's ALUs are identical. The sky isn't falling in.
|
Yet AGAIN you're missing the point. Nobody ever said ATI can't write a co-issuing compiler. DemoCoder simply asserted that being good at it matters more now.
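A co-issuing compiler for a VLIW ALU is, at its core, a packer: each clock it fills the slots of one bundle with instructions whose inputs are already available. Below is a minimal greedy sketch under simplifying assumptions that are mine, not ATI's actual compiler: four identical general slots plus one special slot, transcendental ops allowed only in the special slot, ordinary ops allowed to overflow into it, and a result usable from the next bundle onward. `Op` and `pack_bundles` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Op:
    name: str
    deps: tuple = ()     # names of ops whose results this op reads
    trans: bool = False  # True if the op may only use the special slot


def pack_bundles(ops, general_slots=4):
    """Greedy list scheduling of a dependency DAG into VLIW bundles.

    Hypothetical model: `general_slots` identical slots plus one special slot.
    An op is ready once every dep was issued in an earlier bundle.
    """
    done = set()
    remaining = list(ops)
    bundles = []
    while remaining:
        bundle, left = [], []
        general_used, special_used = 0, False
        for op in remaining:
            ready = all(d in done for d in op.deps)
            if ready and not op.trans and general_used < general_slots:
                general_used += 1
                bundle.append(op)
            elif ready and not special_used:
                # Special slot: required for transcendentals, overflow for others.
                special_used = True
                bundle.append(op)
            else:
                left.append(op)  # not ready yet, or no slot left this clock
        if not bundle:
            raise ValueError("unsatisfiable dependencies")
        done.update(op.name for op in bundle)
        bundles.append(bundle)
        remaining = left
    return bundles


# Example: a 4-component dot product (multiplies, then adds) and a reciprocal sqrt.
ops = [Op("mx"), Op("my"), Op("mz"), Op("mw"),
       Op("a", ("mx", "my")), Op("b", ("mz", "mw")),
       Op("s", ("a", "b")), Op("rsq", ("s",), trans=True)]
for i, bundle in enumerate(pack_bundles(ops)):
    print(i, [op.name for op in bundle])
```

The dependent tail of the dot product fills only one slot per bundle; keeping the other slots busy with independent work is exactly what the compiler has to be good at.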
Quote:
1-clock instructions are extremely costly...
|
I still don't agree with this unless you're talking about latency. Changing ops every clock isn't hard as long as you don't need the result immediately; nearly every pipelined processor in the world, however simple, does this. You don't save much die area by stretching each instruction over more clocks.
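To illustrate why issuing a different op every clock is cheap when the result isn't needed right away, here is a toy model (an assumption for illustration, not any specific GPU): each batch is a chain of fully dependent instructions, a result is ready `latency` clocks after issue, and one instruction issues per clock from the next ready batch in round-robin order. `simulate` is a hypothetical helper.

```python
def simulate(n_batches, latency, instrs_per_batch):
    """Return (clocks until the last issue completes its slot, idle clocks).

    Toy in-order ALU: every instruction depends on the previous one in its
    batch; at most one instruction issues per clock, chosen round-robin.
    """
    ready_at = [0] * n_batches            # clock at which each batch may issue again
    left = [instrs_per_batch] * n_batches
    clock = idle = 0
    nxt = 0                               # round-robin pointer
    while any(left):
        for k in range(n_batches):
            b = (nxt + k) % n_batches
            if left[b] and ready_at[b] <= clock:
                left[b] -= 1
                ready_at[b] = clock + latency   # dependent op must wait for this
                nxt = (b + 1) % n_batches
                break
        else:
            idle += 1                     # no batch had its operand ready
        clock += 1
    return clock, idle


print(simulate(1, 8, 10))   # (73, 63): one batch, the ALU mostly waits
print(simulate(8, 8, 10))   # (80, 0): enough batches to cover the latency
```

With as many batches as clocks of latency the ALU never idles, and the switching itself is only a pointer increment.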
Quote:
I'm glad the penny's dropped. This is just one way that your sequencer complexity increases.
If, instead of combining a 64-pixel ALU with a single-clock instruction pipeline, your sequential scalar GPU has 16-pixel ALUs and four-clock instructions, you've still increased sequencer complexity compared with R600, because you've multiplied the number of batches in flight by 4 to retain the same batch size (this is the "four Xenoses glued together" scenario).
So now consider how much of G80 is batch-sequencing logic, given that it has 16 batches in flight, and compare that against the 4 batches in R600.
|
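The batch-count arithmetic in the quote above can be written out directly: batch size is ALU width times clocks per instruction, and matching one 64-pixel-per-clock array with 16-pixel arrays takes four arrays, each cycling its own batches. The value of 4 batches per array is an assumption chosen only to mirror the R600 figure quoted; `batch_config` is a hypothetical name.

```python
def batch_config(alu_width, clocks_per_instr, pixels_per_clock, batches_per_array):
    """Return (batch size in pixels, total batches cycling at the ALUs).

    All inputs are illustrative; batches_per_array is assumed to be the same
    for every ALU array regardless of its width.
    """
    batch_size = alu_width * clocks_per_instr   # pixels covered by one instruction
    arrays = pixels_per_clock // alu_width      # arrays needed for equal throughput
    return batch_size, arrays * batches_per_array


wide = batch_config(64, 1, 64, 4)     # one 64-pixel array, single-clock instructions
narrow = batch_config(16, 4, 64, 4)   # four 16-pixel arrays, four-clock instructions
print(wide, narrow)                   # (64, 4) (64, 16)
```

Same batch size, four times as many batches at the ALUs.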
I think one reason we're having difficulty communicating is that we mean different things by "batches in flight". Before G80 arrived, that figure was 512 for R5xx and 6 (albeit enormous ones) for G70. Now you're counting only the batches in the immediate vicinity of the ALU arrays and ignoring all the other batches in the pipeline.
Simply cycling between batches for predictable ALU instructions isn't hard and doesn't add measurably to sequencer complexity. The tough task is managing the many threads in flight that are waiting for texture fetch results. I don't consider what you're describing to be sequencer complexity; I think the complexity arises from the larger data pool that each stage of the ALUs needs to select from.
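The distinction between cheap ALU cycling and the harder job of tracking texture waits can be sketched as a toy sequencer. Everything here is a hypothetical simplification: each batch's program is a string where `A` is an ALU op and `T` is a texture fetch, ALU ops issue back to back with no dependency stalls, a fetch parks the batch for a fixed `tex_latency` clocks, and one instruction issues per clock. The ALU side is a plain round-robin queue; the state that grows is the set of parked batches.

```python
from collections import deque


def run(programs, tex_latency):
    """Toy sequencer. Returns (clocks taken, peak batches parked on texture).

    programs: one non-empty string per batch, 'A' = ALU op, 'T' = texture fetch
    (a hypothetical encoding used only for this sketch).
    """
    pc = [0] * len(programs)              # next instruction index per batch
    ready = deque(range(len(programs)))   # batches the ALU may pick, round-robin
    waiting = {}                          # batch -> clock its texture data returns
    clock = peak = 0
    while ready or waiting:
        # Wake batches whose texture results have arrived.
        for b, due in list(waiting.items()):
            if due <= clock:
                del waiting[b]
                if pc[b] < len(programs[b]):
                    ready.append(b)
        # Issue at most one instruction this clock.
        if ready:
            b = ready.popleft()
            op = programs[b][pc[b]]
            pc[b] += 1
            if op == "T":
                waiting[b] = clock + tex_latency   # park until data returns
            elif pc[b] < len(programs[b]):
                ready.append(b)                    # back of the round-robin queue
        peak = max(peak, len(waiting))
        clock += 1
    return clock, peak


print(run(["TA"] * 4, 10))   # (14, 4)
print(run(["AAA"] * 2, 10))  # (6, 0)
```

The round-robin part is a queue pop and push; the bookkeeping that scales with latency and fetch rate lives in `waiting`.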
G80 may have more batches in flight in this sense, but the only reason for it to have more total batches in flight is its higher texture throughput.
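The link between texture throughput and total batches in flight is essentially Little's law (that framing is mine, not the original post's): work in flight equals throughput times latency. A sketch with made-up inputs; none of the numbers are real R600 or G80 figures, and `batches_needed` is a hypothetical helper.

```python
def batches_needed(tex_pixels_per_clock, tex_latency_clocks, batch_size):
    """Little's law: pixels waiting on texture = throughput * latency.

    Returns how many batches of `batch_size` pixels must be in flight to keep
    the texture units busy, rounded up. Inputs are illustrative only.
    """
    pixels_in_flight = tex_pixels_per_clock * tex_latency_clocks
    return (pixels_in_flight + batch_size - 1) // batch_size  # ceiling division


print(batches_needed(16, 200, 64))   # 50
print(batches_needed(32, 200, 64))   # 100
```

At fixed latency and batch size, doubling texture throughput doubles the number of batches that must be in flight.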