AMD: R9xx Speculation

Excellent news Dave, thanks for the clarification on support. Great to hear that we can expect support for XP and DX10/11 games.
 
R700 has LDS too. I guess that MLAA is only using LDS for shared reads (owner-writes model) - on that basis R700 performance would be comparably accelerated.
 

Hmm. I remembered RV770 having LDS problems... Guess my memory was off.

Thanks for the info.
 
Thanks Dave, much appreciated. Could you tell us what AMD plans to do about MLAA and its text issues? There are various workaround ideas floating around, but maybe it's too early to speak about it?
 
Hmm. I remembered RV770 having LDS problems... Guess my memory was off.
R700's LDS problem is lack of support for generic write, i.e. any work item writing to any location in the work group's allocation of memory (only the "owner" of each distinct range of memory can write to it). It's my guess that this algorithm doesn't require generic write.

There's a subtler problem to do with the shape and size of work groups in R700 (1D, 64 work items) but I can't remember the details. OpenCL and DirectCompute 4.0/4.1 handle this differently I think, and the underlying architecture is slightly more generic but still restricted. Stuff I ignore these days, frankly.

There is another factor in play: LDS reads are at the same bandwidth as TEX reads in R700. This is ~half the relative speed seen in Evergreen's LDS. This might mean the algorithm doesn't appreciably benefit from LDS on R700 and it would be best simply to optimise for L1 cache coherency instead.
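To make the owner-writes model concrete, here is a minimal Python sketch. It assumes a 1D work group of 64 work items covering an 8x8 pixel tile, with one LDS word owned by each work item. `OwnerWritesLDS`, `tile_coords` and `horizontal_luma_deltas` are hypothetical names for illustration, not an AMD API, and whether MLAA actually uses LDS this way is only the guess made above. The point is that a neighbour-gather pattern works with owner writes plus shared reads.

```python
# Minimal model of an "owner-writes" LDS (hypothetical, for illustration only).
WORK_GROUP_SIZE = 64   # 1D work group of 64 work items
TILE_WIDTH = 8         # assumed: the 64 items cover an 8x8 pixel tile
SLOTS_PER_ITEM = 1     # assumed: one LDS word owned by each work item


class OwnerWritesLDS:
    def __init__(self, items=WORK_GROUP_SIZE, slots=SLOTS_PER_ITEM):
        self.slots = slots
        self.mem = [0.0] * (items * slots)

    def write(self, item, index, value):
        # Owner-writes rule: a work item may only write inside its own range.
        lo = item * self.slots
        if not lo <= index < lo + self.slots:
            raise PermissionError(f"work item {item} cannot write LDS[{index}]")
        self.mem[index] = value

    def read(self, index):
        # Shared reads: any work item may read any location.
        return self.mem[index]


def tile_coords(item):
    # Map a 1D work-item id onto (x, y) within the 8x8 tile.
    return item % TILE_WIDTH, item // TILE_WIDTH


def horizontal_luma_deltas(tile_luma):
    lds = OwnerWritesLDS()
    # Phase 1: every work item stores its own pixel in its own slot.
    for item in range(WORK_GROUP_SIZE):
        x, y = tile_coords(item)
        lds.write(item, item * SLOTS_PER_ITEM, tile_luma[y][x])
    # On real hardware a work-group barrier would sit here.
    # Phase 2: every work item reads its right-hand neighbour's slot.
    deltas = []
    for item in range(WORK_GROUP_SIZE):
        x, _ = tile_coords(item)
        if x + 1 < TILE_WIDTH:
            here = lds.read(item * SLOTS_PER_ITEM)
            right = lds.read((item + 1) * SLOTS_PER_ITEM)
            deltas.append(abs(right - here))
        else:
            deltas.append(0.0)  # no neighbour inside this tile
    return deltas
```

A call such as `lds.write(0, 5, 1.0)` raises `PermissionError`: that is the generic-write case (one work item writing into another's range) that R700 lacks.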
 

Lack of generic write? Damn. I suppose it wasn't thought to be a problem back when RV7xx was being developed...

Hmm. Didn't know the speed of the LDS reads. Thanks.

And thank you for explaining all this to me. I have a poor memory, and finding this type of info isn't always easy here (it tends to all be dumped into one thread, which makes it hard to find).
 
I share your disappointment, IST. As was mentioned earlier, the HD6800s are just binned chips. Being cynical, one could call the HD6870 nothing more than a cut-down 5850, overvolted and overclocked to 900 MHz. I'd expect a 5850 overclocked to 900 MHz to give better results, as the two are architecturally similar. I'm very curious what "real" architectural improvements Cayman will have to justify the HD69xx naming. Strangely, I may even just get a cheap 5850 to replace my hot-running 4870... one year late, I am...
 

I would say the HD 6xxx series is much more efficient than the HD 5xxx series. In many cases it nearly matches or beats the cards it's replacing, despite much lower specs.

This means one of two things.

1. They optimized the hell out of the drivers, and there's not much performance left in the chip.

2. They made each unit/enough of the units in the chip more efficient than the HD 5xxx series, and there's a lot of performance left to wring out in the drivers.

I would place my bets on 2. It's certainly within AMD's talent pool/power to pull the efficiency gains off.

If 2 is true, there's a lot of performance left in the chips, and despite being a "weaker" chip on paper, it'll wind up surpassing the cards it replaces. This isn't unprecedented: the cards based on the G92/G92b chip did that, and the cards based on Juniper wound up being faster than the HD 48xx series too.
 
I would say the HD 6xxx series is much more efficient than the HD 5xxx series. In many cases it nearly matches or beats the cards it's replacing, despite much lower specs.
I don't think they are really much more efficient. Earlier results showed Cypress scaled a lot better with clock than with additional shader units, and that's exactly what the HD68xx are: fewer shader units but a higher clock. The tessellation tweaks are definitely helping too, but that's about it IMHO. They're a bit more efficient per area too, because of the cut-down MC PHY, lack of DP etc., but in the grand scheme of things it's nothing drastic (not that this is necessarily a bad thing).
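A back-of-envelope check of the clock-versus-shader argument, as a Python sketch. It assumes the commonly published reference specs (HD 5850: 1440 stream processors at 725 MHz; HD 6870: 1120 at 900 MHz; 32 ROPs each), a peak ALU rate of one multiply-add (2 flops) per stream processor per clock, and one pixel per ROP per clock. It ignores memory bandwidth, TMUs and real-world utilisation, so treat it as illustrative only.

```python
# Reference specs as commonly published (assumed here): stream processors,
# core clock in MHz and ROP count.
CARDS = {
    "HD 5850": {"sps": 1440, "mhz": 725, "rops": 32},
    "HD 6870": {"sps": 1120, "mhz": 900, "rops": 32},
}


def peak_gflops(sps, mhz):
    # Assumes one multiply-add (2 flops) per stream processor per clock.
    return sps * 2 * mhz / 1000


def pixel_fill_gpix(rops, mhz):
    # Assumes one pixel per ROP per clock; scales purely with clock.
    return rops * mhz / 1000


if __name__ == "__main__":
    for name, c in CARDS.items():
        print(f"{name}: {peak_gflops(c['sps'], c['mhz']):.0f} GFLOPS, "
              f"{pixel_fill_gpix(c['rops'], c['mhz']):.1f} Gpixel/s")
```

This gives 2088 vs 2016 GFLOPS (about 3% less ALU throughput for the HD 6870) but 23.2 vs 28.8 Gpixel/s, since everything that runs per clock gains 900/725, roughly 24%. That is consistent with the idea that Barts trades shader units for clock-bound throughput.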
 
Agreed... you are comparing a 725 MHz part to a 900 MHz one, and even then it wins by less than 8 fps on average (with lowered AF defaults!). The disappointing thing is the tessellation improvements, which seem like very small hardware tweaks. In the grand scheme, the move to the HD6870 seems smaller than the move to the HD4890; with that one, at least, AMD raised the clock without removing SPs, and even added some more transistors to help overclocking!
 
With Star Wars: The Force Unleashed II released today, will AMD need to release a profile update for DLAA (Directionally Localized AA) for both new and older cards? It's not clear whether it will be used on the PC, but it is used on the console. It makes me wonder where DLAA and MLAA are coming from: Sony, MS, independents or others?
 
I don't think they are really much more efficient. Earlier results showed Cypress scaled a lot better with clock than with additional shader units, and that's exactly what the HD68xx are: fewer shader units but a higher clock. The tessellation tweaks are definitely helping too, but that's about it IMHO. They're a bit more efficient per area too, because of the cut-down MC PHY, lack of DP etc., but in the grand scheme of things it's nothing drastic (not that this is necessarily a bad thing).

I forgot about the higher clocks. >_<

Disregard what I said. I suck.
 
1. They optimized the hell out of the drivers, and there's not much performance left in the chip.
Optimizations will apply to both Barts and Cypress, so at a generic level both benefit equally, tessellation differences aside.

Agreed... you are comparing a 725 MHz part to a 900 MHz one, and even then it wins by less than 8 fps on average (with lowered AF defaults!). The disappointing thing is the tessellation improvements, which seem like very small hardware tweaks. In the grand scheme, the move to the HD6870 seems smaller than the move to the HD4890; with that one, at least, AMD raised the clock without removing SPs, and even added some more transistors to help overclocking!

Why are you even comparing RV790 here? That was designed as a performance improvement, to sit higher in the stack than the rest of the products. The design goals for Barts are completely different (i.e. to sit at a very different position in the product stack from Cypress).
 
It doesn't specifically distinguish between any type of edge (geometry, texture, alpha or not). It is a post-process routine that looks for high-contrast edge patterns and samples from surrounding pixels. With high-frequency noise that differs from pixel to pixel, I think the algorithm may not recognise it as an edge and so may not filter it.
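The detection step described above can be sketched in a few lines of Python. This is a simplified illustration, not AMD's implementation: it assumes Rec. 601 luma weights and a single fixed contrast threshold (both choices are assumptions), and `detect_edges` is a hypothetical name. Because it only sees final colours, it cannot tell geometry edges from texture or alpha edges.

```python
def luma(rgb):
    r, g, b = rgb
    # Rec. 601 luma weights: an assumed, commonly used choice.
    return 0.299 * r + 0.587 * g + 0.114 * b


def detect_edges(image, threshold=0.1):
    """Mark pixels whose luma differs from the right or bottom neighbour
    by more than `threshold`. Works on final colours only."""
    h, w = len(image), len(image[0])
    lum = [[luma(px) for px in row] for row in image]
    edges = [[False] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            if x + 1 < w and abs(lum[y][x] - lum[y][x + 1]) > threshold:
                edges[y][x] = True
            if y + 1 < h and abs(lum[y][x] - lum[y + 1][x]) > threshold:
                edges[y][x] = True
    return edges
```

On a pixel-level checkerboard almost every pixel gets flagged, so detection alone does not separate noise from real edges. In the original MLAA formulation the next stage searches the flagged pixels for longer L-, Z- and U-shaped patterns to compute blend weights, and isolated single-pixel flips form no such long pattern, which fits the guess above that noise may go unfiltered.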
 
So will it be possible for devs to exclude certain parts of the image in the future? The funky looking health squares in SC2 and other UI elements come to mind.
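A driver-level filter only sees the finished frame, so it has no way of knowing which pixels are HUD. One way a game could avoid this, assuming it runs its own post-process AA, is to filter the scene before compositing the UI, or to keep the unfiltered pixel wherever a per-pixel UI mask is set. A minimal Python sketch of the mask variant follows; `apply_post_aa` and `ui_mask` are hypothetical names, not an existing AMD or DirectX API.

```python
def apply_post_aa(frame, filtered, ui_mask):
    """Keep the original pixel where ui_mask is True (hypothetical
    developer-supplied mask); take the anti-aliased pixel elsewhere."""
    return [
        [orig if masked else aa for orig, aa, masked in zip(o_row, f_row, m_row)]
        for o_row, f_row, m_row in zip(frame, filtered, ui_mask)
    ]
```

For example, with `frame = [[1, 2], [3, 4]]`, `filtered = [[10, 20], [30, 40]]` and a mask that is True on the diagonal, the result keeps 1 and 4 and takes 20 and 30 from the filtered image.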
 