AMD: R9xx Speculation

Malo · Oct 26, 2010

Excellent news Dave, thanks for the clarification on support. Great to hear that we can expect support for XP and DX10/11 games.

Jawed · Oct 26, 2010

R700 has LDS too. I guess that MLAA is only using LDS for shared reads (owner-writes model) - on that basis R700 performance would be comparably accelerated.

I.S.T. · Oct 26, 2010

Jawed said:
R700 has LDS too. I guess that MLAA is only using LDS for shared reads (owner-writes model) - on that basis R700 performance would be comparably accelerated.

Hmm. I remembered RV770 having LDS problems... Guess my memory was off.

Thanks for the info.

Harison · Oct 26, 2010

Thanks Dave, much appreciated. Could you tell us what AMD plans to do about MLAA and the text issues? There are various workaround ideas floating around, but maybe it's too early to speak about it?

Jawed · Oct 26, 2010

I.S.T. said:
Hmm. I remembered RV770 having LDS problems... Guess my memory was off.

R700's LDS problem is lack of support for generic write, i.e. any work item writing to any location in the work group's allocation of memory (only the "owner" of each distinct range of memory can write to it). It's my guess that this algorithm doesn't require generic write.
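To make the owner-writes restriction concrete, here is a minimal Python sketch. It is a toy model, not R700 hardware behaviour or any real API: the class `OwnerWriteLDS` and the function `neighbour_sums` are hypothetical names invented for illustration. It shows a shared-read pattern (each work item writes only its own slot, then reads its neighbours' slots) that works without generic write, and a scatter write that the model rejects.

```python
class OwnerWriteLDS:
    """Toy model of a work group's LDS where only the owner of a range may write to it."""

    def __init__(self, num_items, slots_per_item):
        self.slots_per_item = slots_per_item
        self.mem = [0] * (num_items * slots_per_item)

    def owner_of(self, addr):
        # Each work item owns one contiguous range of slots.
        return addr // self.slots_per_item

    def write(self, item, addr, value):
        # Owner-writes model: reject writes outside the item's own range.
        if self.owner_of(addr) != item:
            raise PermissionError(f"work item {item} does not own address {addr}")
        self.mem[addr] = value

    def read(self, item, addr):
        # Reads are shared: any work item may read any address.
        return self.mem[addr]


def neighbour_sums(values):
    """Each work item publishes one value, then sums it with its left and right neighbours."""
    n = len(values)
    lds = OwnerWriteLDS(num_items=n, slots_per_item=1)

    # Phase 1: every item writes only its own slot (always allowed).
    for item, v in enumerate(values):
        lds.write(item, item, v)

    # A barrier would sit here on real hardware.

    # Phase 2: every item reads its own slot and its clamped neighbours.
    out = []
    for item in range(n):
        left = lds.read(item, max(item - 1, 0))
        right = lds.read(item, min(item + 1, n - 1))
        out.append(left + lds.read(item, item) + right)
    return out
```

For example, `neighbour_sums([1, 2, 3, 4])` returns `[4, 6, 9, 11]`, while `OwnerWriteLDS(4, 1).write(0, 3, 7)` raises `PermissionError` because work item 0 does not own slot 3. An algorithm that only needs the first pattern would not be blocked by the lack of generic write.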

There's a subtler problem to do with the shape and size of work groups in R700 (1D, 64 work items) but I can't remember the details. OpenCL and Direct Compute 4.0/4.1 handle this differently I think and the underlying architecture is slightly more generic but still restricted. Stuff I ignore these days, frankly.

There is another factor in play: LDS reads are at the same bandwidth as TEX reads in R700. This is ~half the relative speed seen in Evergreen's LDS. This might mean the algorithm doesn't appreciably benefit from LDS on R700 and it would be best simply to optimise for L1 cache coherency instead.

I.S.T. · Oct 26, 2010

Jawed said:
R700's LDS problem is lack of support for generic write, i.e. any work item writing to any location in the work group's allocation of memory (only the "owner" of each distinct range of memory can write to it). It's my guess that this algorithm doesn't require generic write.

There's a subtler problem to do with the shape and size of work groups in R700 (1D, 64 work items) but I can't remember the details. OpenCL and Direct Compute 4.0/4.1 handle this differently I think and the underlying architecture is slightly more generic but still restricted. Stuff I ignore these days, frankly.

There is another factor in play: LDS reads are at the same bandwidth as TEX reads in R700. This is ~half the relative speed seen in Evergreen's LDS. This might mean the algorithm doesn't appreciably benefit from LDS on R700 and it would be best simply to optimise for L1 cache coherency instead.

Lack of generic write? Damn. I suppose it wasn't thought to be a problem back when RV7xx was being developed...

Hmm. Didn't know the speed of the LDS reads. Thanks.

And thank you for explaining all this to me. I have a poor memory, and finding this type of info isn't always the easiest here (it tends to all be dumped into one thread, which makes stuff hard to find).

gongo · Oct 26, 2010

I share your disappointment, IST... as was mentioned earlier, HD6800 are just binned chips. Being cynical, one can call the HD6870 nothing more than a hacked-down 5850, overvolted and overclocked to 900MHz. I expect a 5850 overclocked to 900MHz would provide better results, as the two are architecturally similar. I'm very curious what "real" architectural improvement Cayman will have to justify the HD69xx renaming. Strangely, I may even just get a 5850 for cheap to replace the heaty 4870... one year late, I am...

I.S.T. · Oct 26, 2010

gongo said:
I share your disappointment, IST... as was mentioned earlier, HD6800 are just binned chips. Being cynical, one can call the HD6870 nothing more than a hacked-down 5850, overvolted and overclocked to 900MHz. I expect a 5850 overclocked to 900MHz would provide better results, as the two are architecturally similar. I'm very curious what "real" architectural improvement Cayman will have to justify the HD69xx renaming. Strangely, I may even just get a 5850 for cheap to replace the heaty 4870... one year late, I am...

I would say the HD 6xxx series is much more efficient than the HD 5xxx series. In many cases, the new cards nearly match or beat the cards they're replacing despite much lower specs.

This means one of two things.

1. They optimized the hell out of the drivers, and there's not much performance left in the chip.

2. They made each unit/enough of the units in the chip more efficient than the HD 5xxx series, and there's a lot of performance left to wring out in the drivers.

I would place my bets on 2. It's certainly within AMD's talent pool/power to pull the efficiency gains off.

If 2 is true, there's a lot of performance left in the chips, and basically despite being a "weaker" chip on paper, it'll wind up surpassing the cards it replaces. This isn't unprecedented. The cards based off of the G92/G92b chip did that, and the cards based off of Juniper wound up being faster than the HD 48xx series too.

mczak · Oct 26, 2010

I.S.T. said:
I would say the HD 6xxx series is much more efficient than the HD 5xxx series. In many cases, the new cards nearly match or beat the cards they're replacing despite much lower specs.

I don't think they are really much more efficient. Earlier results showed Cypress scaled a lot better with clocks than with additional shader units. And that's exactly what the HD68xx are: fewer shader units but a higher clock. The tweaks for tessellation are definitely helping too, but that's about it imho. A bit more efficient per area too because of the cut-down MC PHY, lack of DP etc., but in the grand scheme of things nothing drastic (not that this is necessarily a bad thing).
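As a rough illustration of the "fewer shader units but higher clock" trade, the arithmetic can be sketched in a few lines of Python. The stream processor counts below (1440 for the HD 5850, 1120 for the HD 6870) are taken from the public spec sheets and are not stated in this thread; the 725 MHz and 900 MHz clocks are the ones mentioned here. The helper `peak_gflops` is a hypothetical name, and it assumes one multiply-add (2 FLOPs) per stream processor per clock, which gives theoretical peak only and says nothing about delivered game performance.

```python
def peak_gflops(stream_processors, clock_mhz):
    """Theoretical peak single-precision GFLOPS, assuming 2 FLOPs per SP per clock."""
    return stream_processors * 2 * clock_mhz / 1000.0


# (stream processors, core clock in MHz); SP counts are an assumption from public specs.
CARDS = {
    "HD 5850": (1440, 725),
    "HD 6870": (1120, 900),
}

# Relative change going from HD 5850 to HD 6870.
sp_ratio = CARDS["HD 6870"][0] / CARDS["HD 5850"][0]        # about 0.78
clock_ratio = CARDS["HD 6870"][1] / CARDS["HD 5850"][1]     # about 1.24
peak_ratio = peak_gflops(*CARDS["HD 6870"]) / peak_gflops(*CARDS["HD 5850"])  # about 0.97
```

Under these assumptions the HD 5850 peaks at 2088 GFLOPS and the HD 6870 at 2016 GFLOPS, so the two are within a few percent on paper. If the chip scales better with clock than with unit count, a similar paper figure reached through a higher clock would be expected to perform a little better in practice, without needing large per-unit efficiency gains.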

gongo · Oct 26, 2010

Agreed... you are comparing 725MHz to 900MHz, and even then it beats it by less than 8fps on average (with lowered AF defaults!). The disappointing thing is that the tessellation improvements seem more like very small hardware tweaks... very small. In the grand scheme, the move to HD6870 seems smaller than the move to HD4890; at least with that one AMD moved up the clock without removing SPs, and even added some more transistors to help overclocking!

ECH · Oct 26, 2010

With Star Wars: The Force Unleashed II released today, will AMD need to release a profile update for DLAA (Directionally Localized AA) for both new and older cards? It's not clear if it will be used on the PC, but it is used on the consoles. Makes me wonder where DLAA and MLAA are coming from: Sony, MS, independents or others?

I.S.T. · Oct 26, 2010

mczak said:
I don't think they are really much more efficient. Earlier results showed Cypress scaled a lot better with clocks than with additional shader units. And that's exactly what the HD68xx are: fewer shader units but a higher clock. The tweaks for tessellation are definitely helping too, but that's about it imho. A bit more efficient per area too because of the cut-down MC PHY, lack of DP etc., but in the grand scheme of things nothing drastic (not that this is necessarily a bad thing).

I forgot about the higher clocks. >_<

Disregard what I said. I suck.

Dave Baumann · Oct 26, 2010

I.S.T. said:
1. They optimized the hell out of the drivers, and there's not much performance left in the chip.

Optimizations will apply to both Barts and Cypress, so at a general level both benefit equally, other than tessellation differences.

gongo said:
Agreed... you are comparing 725MHz to 900MHz, and even then it beats it by less than 8fps on average (with lowered AF defaults!). The disappointing thing is that the tessellation improvements seem more like very small hardware tweaks... very small. In the grand scheme, the move to HD6870 seems smaller than the move to HD4890; at least with that one AMD moved up the clock without removing SPs, and even added some more transistors to help overclocking!

Why are you even comparing RV790 here? That was designed as a performance improvement to sit at a higher position in the stack than the rest of the products. Design goals for Barts are completely different (i.e. to sit at a very different position in the product stack to Cypress).

MistaPi · Oct 26, 2010

I get that MLAA also works on alpha textures, but does it also work on shader/spatial aliasing and texture aliasing?

neliz · Oct 26, 2010

MistaPi said:
I get that MLAA also works on alpha textures, but does it also work on shader/spatial aliasing and texture aliasing?

Have you checked the MLAA answer generator?

http://forum.beyond3d.com/showpost.php?p=1486838&postcount=4165

Dave Baumann · Oct 26, 2010

It doesn't distinguish between any type of edge (geometry, texture, alpha or not), specifically. It is a post-process routine that looks for high-contrast edge patterns and samples from surrounding pixels. I think that with high-frequency noise that differs from pixel to pixel, the algorithm may not be able to pick up that it is an edge and may not filter it.
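A very reduced Python sketch of that idea follows. It is not AMD's MLAA implementation: real MLAA also classifies the shape of each edge pattern before choosing blend weights, and that step is left out here. The function name `edge_blend`, the grayscale image format and the `threshold` value are all assumptions made for illustration. What the sketch does show is that a post-process filter only sees final pixel values, so it has no way to tell geometry, texture, alpha or UI edges apart.

```python
def edge_blend(image, threshold=0.5):
    """Blend pixels that sit on a high-contrast edge with their contrasting neighbours.

    image: list of rows of grayscale floats in [0, 1]. Returns a new image.
    """
    h = len(image)
    w = len(image[0])
    out = [row[:] for row in image]  # copy so reads always see the original frame

    for y in range(h):
        for x in range(w):
            c = image[y][x]
            # Collect the 4-connected neighbours that exist inside the frame.
            neighbours = []
            for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                ny, nx = y + dy, x + dx
                if 0 <= ny < h and 0 <= nx < w:
                    neighbours.append(image[ny][nx])

            # An "edge" here is simply a large difference in pixel value.
            contrasting = [n for n in neighbours if abs(n - c) > threshold]
            if contrasting:
                avg = sum(contrasting) / len(contrasting)
                out[y][x] = 0.5 * c + 0.5 * avg
    return out
```

For example, `edge_blend([[0.0, 0.0, 1.0, 1.0]])` returns `[[0.0, 0.5, 0.5, 1.0]]`: the two pixels on either side of the hard step are softened and the rest are left alone. The same thing would happen whether that step came from a polygon edge, a texture detail or a piece of HUD text.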

shiznit · Oct 26, 2010

So will it be possible for devs to exclude certain parts of the image in the future? The funky looking health squares in SC2 and other UI elements come to mind.

neliz · Oct 26, 2010

shiznit said:
So will it be possible for devs to exclude certain parts of the image in the future? The funky looking health squares in SC2 and other UI elements come to mind.

How shall we put it?

http://forum.beyond3d.com/showpost.php?p=1486838&postcount=4165

I wonder why people have a hard time understanding that currently, MLAA is not up to devs. It's a post-process filter for crying out loud, POST!

no-X · Oct 26, 2010

But with the help of developers, text (for example) could be drawn after MLAA processing... :smile:

shiznit · Oct 26, 2010

neliz said:
How shall we put it?

http://forum.beyond3d.com/showpost.php?p=1486838&postcount=4165

I wonder why people have a hard time understanding that currently, MLAA is not up to devs. It's a post-process filter for crying out loud, POST!

So why is the cursor not affected?
