AMD: R7xx Speculation

Status
Not open for further replies.
It's precisely because of ATi's relatively inefficient architecture that they must cram so many SPs into their chips to extract even a reasonable amount of performance from them (in real-world apps, compared to the competition).

Maybe, but let's assume that GT200 maintains the same relative edge in efficiency over RV770. The fact is that not only did theoretical throughput increase more on RV770 than on GT200, but RV770's die is also tiny in comparison. So the overall perf/mm^2 and perf/watt advantage should shoot up in AMD's favor.
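To make the perf/mm^2 argument concrete, here is a minimal sketch, assuming delivered performance is simply efficiency (delivered/peak) times theoretical peak. The function name and all inputs are hypothetical; no real RV770 or GT200 figures are used. Dividing by an area ratio gives relative perf/mm^2, and dividing by a power ratio gives relative perf/watt.

```python
def relative_perf_per_cost(efficiency_ratio: float, peak_ratio: float, cost_ratio: float) -> float:
    """Hypothetical helper: chip A's perf-per-cost divided by chip B's.

    efficiency_ratio: A's (delivered / peak) divided by B's (delivered / peak)
    peak_ratio:       A's theoretical peak divided by B's
    cost_ratio:       A's die area (or power) divided by B's
    """
    if cost_ratio <= 0:
        raise ValueError("cost_ratio must be positive")
    # Assumed model: delivered performance = efficiency * theoretical peak.
    delivered_ratio = efficiency_ratio * peak_ratio
    return delivered_ratio / cost_ratio


# A chip with half the per-peak efficiency, equal peak and a quarter of the area
# would still deliver twice the perf/mm^2 under this model.
print(relative_perf_per_cost(0.5, 1.0, 0.25))  # 2.0
```

Under this model chip A comes out ahead whenever efficiency_ratio * peak_ratio > cost_ratio, so a lower perf/peak does not by itself decide perf/mm^2 or perf/watt.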
 
It's precisely because of ATi's relatively inefficient architecture that they must cram so many SPs into their chips to extract even a reasonable amount of performance from them (in real-world apps, compared to the competition).

The efficiencies that matter are performance/transistor (or rather performance/production cost) and performance/watt.

Performance/theoretical peak is pretty much irrelevant (for anything other than an interesting technical discussion).

(Absolute performance also matters, of course.)
 
I'm curious to see what kind of crazy AA levels the card can pull off. AA probably appears somewhat free because they use it to pad the scheduling. Heck, if a significant number of ALU ops were independent AA instructions, they could be padding that efficiency quite nicely.
Are you saying that Z test (for samples) is an ALU program?

As far as I can tell AA resolve is an independent pass that cannot be scheduled alongside other work on R6xx. I originally thought it could be, but was told it can't.

So I'm curious how AA can run alongside other work.

Jawed
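The "padding" idea in the quoted post can be illustrated with a toy model of 5-wide VLIW issue: independent ops are packed into bundles of five slots, so a serial dependency chain leaves most slots empty, while independent ops (here hypothetical AA work named aa0, aa1, ...) can fill them. This is a simplified greedy sketch, not ATI's actual shader compiler, and it says nothing about whether AA work can really be co-issued with other rendering, which is exactly what is being questioned above.

```python
from typing import Dict, List, Sequence

WIDTH = 5  # assumed 5 ALU slots per bundle, as in R6xx-style 5-wide units


def pack_vliw(ops: Dict[str, Sequence[str]], width: int = WIDTH) -> List[List[str]]:
    """Greedily pack ops into bundles.

    An op may issue only once all of its dependencies issued in an earlier bundle.
    Ops are considered in dict insertion order.
    """
    done = set()
    remaining = list(ops)
    bundles: List[List[str]] = []
    while remaining:
        ready = [op for op in remaining if all(dep in done for dep in ops[op])]
        if not ready:
            raise ValueError("cyclic or missing dependency")
        bundle = ready[:width]
        bundles.append(bundle)
        done.update(bundle)
        remaining = [op for op in remaining if op not in done]
    return bundles


def utilization(bundles: List[List[str]], width: int = WIDTH) -> float:
    """Fraction of issue slots actually filled."""
    return sum(len(b) for b in bundles) / (len(bundles) * width)


# A serial chain a -> b -> c -> d fills one slot per bundle.
chain = {"a": [], "b": ["a"], "c": ["b"], "d": ["c"]}
print(len(pack_vliw(chain)), utilization(pack_vliw(chain)))  # 4 0.2

# Adding 16 independent (hypothetical) AA ops fills the empty slots.
padded = dict(chain, **{f"aa{i}": [] for i in range(16)})
print(len(pack_vliw(padded)), utilization(pack_vliw(padded)))  # 4 1.0
```

The key rule is that dependent ops can never share a bundle, which is why utilization on serial code stays low however wide the unit is, and why independent work is what fills the gaps.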
 
800 SPs, hmm... anyone want to bet on a [much] wider MIMD design here?

And that number doesn't make for tidy math with 32 TMUs, or else... :rolleyes:
 
Are you saying that Z test (for samples) is an ALU program?

As far as I can tell AA resolve is an independent pass that cannot be scheduled alongside other work on R6xx. I originally thought it could be, but was told it can't.

So I'm curious how AA can run alongside other work.

Jawed

Treat the AA like a regular texture fetch, where it gets cached close to an ALU, then start plowing away at the math. It wouldn't be too different from some deferred rendering setups with a really large screen resolution. Besides, with programmable AA, wouldn't it make sense to start taking jobs away from the ROPs and throwing them in with the ALUs, since they're starting to perform roughly the same tasks? If they just look at the instructions and replace any texld instructions with a series of ops that perform AA, that should do it.

And while it's extremely doubtful, they may have scrapped the ROPs altogether and replaced them with ALUs and TMUs. It might explain where they're getting the extra space from.
 
I dunno why, but I am getting that X1800XT to X1900XT feeling here. I could be wrong, of course. Interesting nonetheless.
 
Then it would be slow. If we accept for a fact that RV770 should be 50% more powerful than RV670, 24 TMUs should be a minimum, 32 would be more ideal.
Note that 24 TMUs seems impossible with 800 SPs (assuming these are still 5-wide units), since 160 doesn't divide by 24 nicely...
20 TMUs are possible (8 clusters with shader arrays of length 20). But more likely would be 5 clusters (with 32 units per shader array), which would require 32 TMUs, or 10 clusters (with 16 units per shader array), which would only require 16 TMUs (though 32 should be doable with that too).
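The divisibility argument can be checked mechanically. This sketch assumes, as the post implicitly does, that 800 SPs means 160 five-wide units split evenly across clusters, and that the TMU count equals the shader-array length (units per cluster). Both are assumptions of the speculation, not confirmed RV770 details, and the function names are hypothetical. The "24 TMUs minimum" figure in the quote corresponds to RV670's 16 TMUs scaled by 1.5.

```python
from typing import List, Tuple


def cluster_configs(total_sps: int = 800, vliw_width: int = 5) -> List[Tuple[int, int]]:
    """All (clusters, units_per_array) splits of the 5-wide units into equal clusters."""
    if total_sps % vliw_width:
        raise ValueError("SP count is not a multiple of the VLIW width")
    units = total_sps // vliw_width  # 800 / 5 = 160 units
    return [(c, units // c) for c in range(1, units + 1) if units % c == 0]


def configs_for_tmus(tmus: int, total_sps: int = 800, vliw_width: int = 5) -> List[Tuple[int, int]]:
    """Configs whose array length matches the TMU count (the post's assumed rule)."""
    return [(c, n) for c, n in cluster_configs(total_sps, vliw_width) if n == tmus]


print(16 * 1.5)  # RV670's 16 TMUs scaled by 50%: prints 24.0
for tmus in (16, 20, 24, 32):
    print(tmus, configs_for_tmus(tmus))
# 16 [(10, 16)]
# 20 [(8, 20)]
# 24 []
# 32 [(5, 32)]
```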
 
Heh, 800SP's and 16TMUs? The internet will explode! :LOL:

Well, I wasn't specifically suggesting that. Just that ATI may have significantly overhauled the shader portion of its architecture again and gone for a really high ALU:pixel/tex ratio once more.

Chris
 
Maybe, but let's assume that GT200 maintains the same relative edge in efficiency over RV770. The fact is that not only did theoretical throughput increase more on RV770 than on GT200, but RV770's die is also tiny in comparison. So the overall perf/mm^2 and perf/watt advantage should shoot up in AMD's favor.

The efficiencies that matter are performance/transistor (or rather performance/production cost) and performance/watt.

Performance/theoretical peak is pretty much irrelevant (for anything other than an interesting technical discussion).

(Absolute performance also matters, of course.)

I don't discount any of this. I was speaking purely from the perspective of functional-unit utilization, referring to ATi's traditionally lower SP utilization rates (particularly in the R6xx vs. G8x/G9x generation).
 
I dunno why, but I am getting that X1800XT to X1900XT feeling here. I could be wrong, of course. Interesting nonetheless.

What does this mean?

R520->R580 yielded fantastic results, albeit not in every title.

Edit: read your later post. I don't know about any drastic overhaul... RV670->RV770 was too quick a transition for any major architectural changes.
 
Treat the AA like a regular texture fetch, where it gets cached close to an ALU, then start plowing away at the math. It wouldn't be too different from some deferred rendering setups with a really large screen resolution. Besides, with programmable AA, wouldn't it make sense to start taking jobs away from the ROPs and throwing them in with the ALUs, since they're starting to perform roughly the same tasks? If they just look at the instructions and replace any texld instructions with a series of ops that perform AA, that should do it.
This is what I thought when R600 was launched, but it doesn't work this way for AA resolve. AA resolve only runs when the render target is complete, and while it's running no other rendering takes place.

Of course it may have changed. Or it may be that sample Z-testing runs as an ALU program concurrently with the rendering. That's why I was curious if you had some confirmation...

And while it's extremely doubtful, they may have scrapped the ROPs altogether and replaced them with ALUs and TMUs. It might explain where they're getting the extra space from.
AA resolve in R6xx doesn't use the TUs, unless the developer writes their own post-process shader that fetches samples (colour + Z) and does a resolve.
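For reference, the kind of custom resolve described above boils down to averaging each pixel's samples. This is a minimal sketch of a plain box-filter resolve on the CPU, written as the per-pixel loop a post-process shader would run. The data layout and function name are hypothetical, it uses colour samples only, and it does not model R6xx's own resolve path or any Z-aware weighting a developer might add.

```python
from typing import List, Sequence, Tuple

Colour = Tuple[float, float, float, float]  # RGBA


def box_resolve(samples: Sequence[Sequence[Colour]]) -> List[Colour]:
    """samples[p] holds pixel p's MSAA colour samples; returns one averaged colour per pixel."""
    resolved: List[Colour] = []
    for pixel_samples in samples:
        n = len(pixel_samples)
        if n == 0:
            raise ValueError("pixel has no samples")
        # Equal-weight (box) filter: average each RGBA channel across the samples.
        r, g, b, a = (sum(s[ch] for s in pixel_samples) / n for ch in range(4))
        resolved.append((r, g, b, a))
    return resolved


# One edge pixel with 2 samples: half red, half blue.
print(box_resolve([[(1.0, 0.0, 0.0, 1.0), (0.0, 0.0, 1.0, 1.0)]]))
# [(0.5, 0.0, 0.5, 1.0)]
```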

But, yeah, it would be interesting to see if more AA work gets implemented on the ALUs...

Jawed
 