AMD: R9xx Speculation

no-X · Oct 26, 2010

because frame-buffer doesn't contain it?

Bouncing Zabaglione Bros. · Oct 26, 2010

So does anyone know when the Cayman press day and NDA expiration happen?

Dave Baumann · Oct 26, 2010

MLAA applied in the driver is a post process filter, but there is no reason why a dev cannot include it as part of their rendering. All we are doing is taking the final frame buffer data and intercepting it before it goes to the display. A dev can do all their rendering, have an MLAA pass, copy that into the frame buffer, then put the HUD/overlays on top of that.
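To make the ordering point concrete, here is a minimal sketch of the two pass orders. It is a toy under stated assumptions: `toy_post_filter` is a plain horizontal blur standing in for a real MLAA pass, and every name in it is hypothetical rather than any driver or engine API.

```python
# Toy illustration only: the filter is a simple horizontal blur standing in
# for a real MLAA pass, and all function names here are hypothetical.

def toy_post_filter(image):
    """Blend each pixel with its horizontal neighbours (stand-in for MLAA)."""
    height, width = len(image), len(image[0])
    out = [row[:] for row in image]
    for y in range(height):
        for x in range(width):
            left = image[y][max(x - 1, 0)]
            right = image[y][min(x + 1, width - 1)]
            out[y][x] = (left + 2 * image[y][x] + right) / 4
    return out


def draw_hud(image, hud):
    """Overwrite image pixels wherever the HUD layer holds a value."""
    return [
        [h if h is not None else p for p, h in zip(img_row, hud_row)]
        for img_row, hud_row in zip(image, hud)
    ]


# A tiny greyscale "scene" with one hard vertical edge, plus one HUD pixel.
scene = [[0.0, 0.0, 1.0, 1.0],
         [0.0, 0.0, 1.0, 1.0]]
hud = [[None, 1.0, None, None],
       [None, None, None, None]]

# Driver-style: the filter only sees the finished frame, HUD included.
driver_style = toy_post_filter(draw_hud(scene, hud))

# In-engine: filter the rendered scene first, then draw the HUD on top.
engine_style = draw_hud(toy_post_filter(scene), hud)
```

With the in-engine order the HUD pixel keeps its exact value, while the driver-style order filters it along with everything else.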

caveman-jim · Oct 26, 2010

Dave Baumann said:
MLAA applied in the driver is a post process filter, but there is no reason why a dev cannot include it as part of their rendering. All we are doing is taking the final frame buffer data and intercepting it before it goes to the display. A dev can do all their rendering, have an MLAA pass, copy that into the frame buffer, then put the HUD/overlays on top of that.

Would that require the driver to expose the capability and control its application within the engine, rather than having it enabled via a CCC checkbox?

Alexko · Oct 26, 2010

caveman-jim said:
Would that require the driver to expose the capability and control its application within the engine, rather than having it enabled via a CCC checkbox?

I think Dave meant that developers can implement their own version of MLAA, and not rely on the driver at all.

caveman-jim · Oct 26, 2010

That's why I quoted him, to get clarification. ;-)

Gipsel · Oct 26, 2010

Alexko said:
I think Dave meant that developers can implement their own version of MLAA, and not rely on the driver at all.

But an application-controllable MLAA pass, as suggested by caveman-jim, would be quite convenient, as developers wouldn't have to implement their own MLAA filter.

Entropy · Oct 26, 2010

Jawed said:
Makes me wonder if MLAA is done in linear space or gamma space.

As far as I understand, it should take place in gamma space. Or rather, it should take gamma into account one way or the other as it evaluates contrast. If I remember correctly, doing related work in gamma space rather than linear was an ATI feature way back in the R300 days. I doubt they would make trivial mistakes today given both their history and general competence. Can you see any hints from the screenshots that it works in linear?
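For a sense of why the choice of space is visible in screenshots, the sketch below blends a black and a white pixel 50/50 in both spaces. It assumes the standard sRGB transfer function as the "gamma" in question; the thread does not say what the driver filter actually uses, and the helper names are illustrative only.

```python
# Assumption: "gamma space" means sRGB-encoded values in [0, 1].
# Nothing here is taken from AMD's filter; it only shows how the two
# blending conventions differ on the same pair of pixels.

def srgb_to_linear(c):
    """Decode an sRGB-encoded value to linear light."""
    return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4


def linear_to_srgb(c):
    """Encode a linear-light value back to sRGB."""
    return c * 12.92 if c <= 0.0031308 else 1.055 * c ** (1 / 2.4) - 0.055


def blend_gamma(a, b, t=0.5):
    """Blend the encoded values directly."""
    return (1 - t) * a + t * b


def blend_linear(a, b, t=0.5):
    """Decode, blend in linear light, then re-encode."""
    mixed = (1 - t) * srgb_to_linear(a) + t * srgb_to_linear(b)
    return linear_to_srgb(mixed)


black, white = 0.0, 1.0
gamma_result = blend_gamma(black, white)    # 0.5 in encoded units
linear_result = blend_linear(black, white)  # roughly 0.735 in encoded units
```

The same edge pixel therefore comes out noticeably brighter when the blend is done in linear light, which is the kind of difference one would look for when comparing screenshots.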

Mianca · Oct 27, 2010

MLAA is exciting and all - but is there any news from the Cayman front?

leoneazzurro · Oct 27, 2010

Mianca said:
MLAA is exciting and all - but is there any news from the Cayman front?

There are some rumored performance numbers from here:

http://vga.zol.com.cn/195/1953715.html

But I don't know how real they could be.

Kaotik · Oct 27, 2010

leoneazzurro said:
There are some rumored performance numbers from here:

http://vga.zol.com.cn/195/1953715.html

But I don't know how real they could be.

Aren't those the ones that have been circulating since yesterday as 6900 numbers, and a month or two ago as 6800 numbers?

leoneazzurro · Oct 27, 2010

Kaotik said:
Aren't those the ones that have been circulating since yesterday as 6900 numbers, and a month or two ago as 6800 numbers?

I don't know if they were attributed to 6800; I don't remember an "X12000" score for Barts even in the most over-optimistic rumor anyway. But yes, they started yesterday as 6970 numbers.

trinibwoy · Oct 27, 2010

Gipsel said:
But an application controllable MLAA pass as suggested by caveman-jim would be quite convenient as the developers wouldn't have to implement their own MLAA filter.

Yep, they could do what nVidia did with CSAA and link it to one of DirectX's AA quality levels - http://developer.nvidia.com/object/coverage-sampled-aa.html. Of course devs would have to explicitly check for it just as they do for CSAA now.

ZerazaX · Oct 27, 2010

leoneazzurro said:
I don't know if they were attributed to 6800; I don't remember an "X12000" score for Barts even in the most over-optimistic rumor anyway. But yes, they started yesterday as 6970 numbers.

He means that a month ago Cayman was thought to be the 6800 series, so that was the name put on the screenshots, but they have since been edited to say 6900.

trinibwoy · Oct 27, 2010

mczak said:
I don't think they are really much more efficient. Earlier results showed Cypress scaled a lot better with clocks than with additional shader units. And that's exactly what the HD68xx are - fewer shader units but higher clock. The tweaks for tessellation are definitely helping too but that's about it imho. A bit more efficient per area too because of the cut-down MC PHY, lack of DP etc. but in the grand scheme of things nothing drastic (not that this is necessarily a bad thing).

We were a little bit surprised to see Demers claiming rather large gains in performance per chip area for Barts versus Cypress, on the order of 25%, given that the two chips share the same underlying architecture and are made on the same fabrication process, but that's precisely what happened during the press event for this product. Strangely, the comparison being made was between the Radeon HD 6870—a fully enabled Barts chip running at peak clock speeds—and the Radeon HD 5850—a partially disabled Cypress variant with lower clocks. I also run faster than Usain Bolt if you cut off one of his legs below the knee, but that's not something I like to advertise.

http://www.techreport.com/articles.x/19844

leoneazzurro · Oct 27, 2010

The Techreport quote was funny, but the same is true even comparing 6870 and 5870: going by die size Barts has a 30% advantage with, let's say, 5-10% lower performance on average. Perf/W is also improved. So maybe AMD was comparing the 6870 to the 5850 only because it's not so good to say that the 5870 is indeed faster. It could also be true that Cypress was indeed a not so well balanced chip, given actual game workloads. It would be interesting to see what is impairing Cypress scaling...

mczak · Oct 27, 2010

leoneazzurro said:
The Techreport quote was funny, but the same is true even comparing 6870 and 5870: going by die size Barts has a 30% advantage with, let's say, 5-10% lower performance on average. Perf/W is also improved. So maybe AMD was comparing the 6870 to the 5850 only because it's not so good to say that the 5870 is indeed faster. It could also be true that Cypress was indeed a not so well balanced chip, given actual game workloads. It would be interesting to see what is impairing Cypress scaling...

Barts is only 24% smaller than Cypress, with 5-10% lower performance. At the same clock it would be more like 10-15% lower performance. So really, factor in the left-out stuff (DP, MC built for lower clock, second CF connector) and that explains pretty much everything (of course, leaving out all that stuff as well as the higher clock leads to performance gains per area). Perf/W is a bit better but not that much either (ok, it depends on which cards you compare - HD6850 vs. HD5830 is a lot better, HD6870 vs. HD5870 too, but HD6870 vs. HD5850 - not so much). The tweaks here and there are certainly nice (most notably leading to better tessellation performance), but there's nothing magic about those gains per area at all.

leoneazzurro · Oct 27, 2010

mczak said:
Barts is only 24% smaller than Cypress, with 5-10% lower performance. At the same clock it would be more like 10-15% lower performance. So really, factor in the left-out stuff (DP, MC built for lower clock, second CF connector) and that explains pretty much everything (of course, leaving out all that stuff as well as the higher clock leads to performance gains per area). Perf/W is a bit better but not that much either (ok, it depends on which cards you compare - HD6850 vs. HD5830 is a lot better, HD6870 vs. HD5870 too, but HD6870 vs. HD5850 - not so much).

Cypress is 334 mm², Barts is 255 mm². 334/255 = 1.31, so Cypress is 31% bigger if we take Barts as the baseline. Otherwise you have to divide 90+% of the performance by 76% of the area, which gives about a 20% improvement in performance per area.
And I'm not saying that these are not the reasons, only that AMD's claims might not be so far from reality (anyway, I think there are also some other tweaks to the architecture).
Performance/W should be compared between cards at the same clock levels, because the voltage tweaks required to run cards at higher clocks have a big influence on power consumption, so really the 6850 should be compared to the 5850 and the 6870 to the 5870. The 5830 is a borderline case and I would not consider it.
And if we look at Anand's chart

http://www.anandtech.com/show/3987/...enewing-competition-in-the-midrange-market/20

the 6870 is delivering more performance for the same power as the 5850 (as AMD said).
Being on the same 40 nm process as the 58xx series, I'd say that these are good results (or that the first iteration of the 40 nm process had its share of problems, if you like).
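The percentages traded in the posts above depend on which chip is taken as the baseline, so a short worked calculation may help. The die areas are the ones quoted in the thread; the 90% and 95% relative-performance figures are the posters' rough estimates, not measurements, and the variable names are illustrative.

```python
# Die areas as quoted in the thread (mm^2).
cypress_area_mm2 = 334.0
barts_area_mm2 = 255.0

# Same two dies, two different baselines.
cypress_larger = cypress_area_mm2 / barts_area_mm2 - 1  # Barts as baseline
barts_smaller = 1 - barts_area_mm2 / cypress_area_mm2   # Cypress as baseline


def perf_per_area_gain(relative_perf):
    """Perf/area gain of Barts over Cypress, given Barts' relative performance."""
    relative_area = barts_area_mm2 / cypress_area_mm2
    return relative_perf / relative_area - 1


print(f"Cypress is {cypress_larger:.1%} larger than Barts")
print(f"Barts is {barts_smaller:.1%} smaller than Cypress")

# Rough estimates from the thread: Barts at 90% to 95% of Cypress performance.
for rel in (0.90, 0.95):
    print(f"At {rel:.0%} performance: {perf_per_area_gain(rel):+.1%} perf/area")
```

Under these assumptions the perf/area gain lands between roughly 18% and 24%, which brackets both the "about 20%" figure above and the 25% claim attributed to AMD.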

GZ007 · Oct 27, 2010

We were a little bit surprised to see Demers claiming rather large gains in performance per chip area for Barts versus Cypress, on the order of 25%, given that the two chips share the same underlying architecture and are made on the same fabrication process, but that's precisely what happened during the press event for this product. Strangely, the comparison being made was between the Radeon HD 6870—a fully enabled Barts chip running at peak clock speeds—and the Radeon HD 5850—a partially disabled Cypress variant with lower clocks. I also run faster than Usain Bolt if you cut off one of his legs below the knee, but that's not something I like to advertise.

Maybe the author can't compare more than two things at once.

(Hint: die area, power consumption and performance.)

mczak · Oct 27, 2010

leoneazzurro said:
Cypress is 334 mm², Barts is 255 mm². 334/255 = 1.31, so Cypress is 31% bigger if we take Barts as the baseline. Otherwise you have to divide 90+% of the performance by 76% of the area, which gives about a 20% improvement in performance per area.

Well, you started with the inconsistent use of percentage numbers.

For size you took Barts as baseline (Cypress 30% larger) but for performance you took Cypress as baseline (Barts cards 5-10% slower)... Admittedly it makes more difference for size than performance...

And I'm not saying that these are not the reasons, only that AMD's claims might not be so far from reality (anyway, I think there are also some other tweaks to the architecture).

Really, as far as performance goes I can't see many other tweaks. I think it also depends on what you consider a "big" increase. As far as I can tell it's the result of removing unneeded features and SIMDs (which don't scale too well), plus the higher clock, not any magic changes (except for tessellation). Maybe it's also a bit more densely packed (due to process improvements?) since die size was reduced a bit more than transistor count.

Performance/W should be compared between cards at the same clock levels, because the voltage tweaks required to run cards at higher clocks have a big influence on power consumption, so really the 6850 should be compared to the 5850 and the 6870 to the 5870. The 5830 is a borderline case and I would not consider it.
And if we look at Anand's chart

http://www.anandtech.com/show/3987/...enewing-competition-in-the-midrange-market/20

the 6870 is delivering more performance for the same power as the 5850 (as AMD said).
Being on the same 40 nm process as the 58xx series, I'd say that these are good results (or that the first iteration of the 40 nm process had its share of problems, if you like).

These aren't bad results (I don't like those at-the-wall measurements; if you want to convince me, try other numbers), but again the efficiency just isn't that much higher. For the HD6850 vs. HD5850 you get something like 20% lower power draw for 10% less performance (though it seems to depend on the card, as some look to have much higher voltage than others). That definitely IS a best-in-class result, it's just not THAT much better than the old one.
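As a quick check on the HD6850 vs. HD5850 comparison, the round figures from the post (20% lower power draw, 10% less performance) can be turned into a perf/W ratio. Both inputs are the poster's approximations, not measured values.

```python
# Round figures from the post above, relative to the HD5850 (= 1.0).
relative_performance = 0.90  # about 10% less performance
relative_power = 0.80        # about 20% lower power draw

perf_per_watt_ratio = relative_performance / relative_power
print(f"HD6850 perf/W relative to HD5850: {perf_per_watt_ratio:.3f}")
```

That works out to roughly a 12.5% perf/W improvement: real, but modest.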
