AMD: R9xx Speculation

Karoshi · Oct 14, 2010

jaredpace said:

Would anyone with half a clue more than I have care to comment on power distribution w.r.t. Cypress? TIA

Triskaine · Oct 14, 2010

Gipsel said:
But still, 160 or 192 VLIW units (pro and XT => 800/960 SPs with xyzwt or only 640/768 with xyzt) appears to be quite on the low side to reach close to Cypress performance. They really need to have widened some bottlenecks to get that performance.

Charlie's original rumour that [STRIKE]Southern[/STRIKE]Northern Islands would have a Northern Islands frontend mated to an Evergreen shader core seems to be the true one after all.

PSU-failure · Oct 14, 2010

Jawed said:
As far as I can tell the throughput for XYZT would be the same as XYZWT in all three of these tests.

But I think it rules out XYZW with emulated transcendentals.

What about a possible split + distributed T lane over the remaining units?

Karoshi said:
Would anyone with half a clue more than I have care to comment on power distribution w.r.t. Cypress? TIA

It uses the typical "slot power only" arrangement, the small tab in the slot being power.

It's strange for a board with 2 additional power headers, but it's probably not stupid. It could lead to better grounding of some areas/components of the board for example.

Jawed · Oct 14, 2010

mczak said:
If Barts is really 2 RPE and Cayman 3 RPE (whatever those actually are...), pretty much no matter how I look at it Cayman would be barely 50% larger than Barts.

RPE is a convincing sounding name, but the only time we've seen it is on the slide where Barts is shown clearly with Caicos and Cayman blurred.

That slide has 320(x4) for Barts. The reported GPUBench results (for Barts Pro) indicate Barts XT is likely to be 192(x5)=960 (or 192(x4)=768?). Obviously that's just another semi-random posting. The person who posted it goes by Kepler on ChipHell. I've seen him post there before, but I can't find what he posted. I have a vague memory of it being credible, though.

Now, if Barts is indeed 230mm² that would put Cayman at nearly exactly the same die size as Cypress...

I estimate Barts is 13.7x17.3=237mm². 50% larger than that would make Cayman 356mm².
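For anyone who wants to check the arithmetic, here is a minimal sketch of the estimate above. The 13.7 x 17.3 mm dimensions and the 50% scaling factor are taken straight from the post; they are guesses from photos and rumours, not measured or confirmed values.

```python
# Die-area arithmetic from the post above.
# Dimensions are the poster's estimate, not an official figure.
barts_width_mm = 13.7
barts_height_mm = 17.3

barts_area_mm2 = barts_width_mm * barts_height_mm

# "Barely 50% larger than Barts" is the rumoured Cayman scaling.
cayman_scale = 1.5
cayman_area_mm2 = barts_area_mm2 * cayman_scale

print(f"Barts  ~{barts_area_mm2:.0f} mm^2")
print(f"Cayman ~{cayman_area_mm2:.0f} mm^2")
```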

Alexko · Oct 14, 2010

Triskaine said:
Charlie's original rumour that [STRIKE]Southern[/STRIKE]Northern Islands would have a Northern Islands frontend mated to an Evergreen shader core seems to be the true one after all.

It looks that way. I guess Charlie got all the facts, just all shuffled around…

That's probably good news, though, since it means we're in for a treat with Southern Islands. 28nm and 4-way shaders at the same time sounds like a pretty nice cumulative performance/mm² improvement!

Bouncing Zabaglione Bros. · Oct 14, 2010

Mintmaster said:
In what way are they abandoning the sweet spot strategy?

Where did I say that? I specifically said I didn't think they would abandon the sweet-spot and follow Nvidia's lead of a massive, power hungry chip that generates heat and noise and is difficult and expensive to make.

Mintmaster said:
NVidia introduced GF100 first and only got GF104 out 3 months ago. That's what the old strategy looks like. AMD is introducing the sweet spot chip first, and judging by its die size and performance, they're doing exactly what they did with the RV7xx except this time NVidia won't be squeaking out a marginal victory at the $400+ price point.

I was referring specifically to the "Cayman is going to be AMD's biggest chip EVAR! and will chuck out as much heat as GF100" from Fudo. I don't believe that AMD will go back down that road of giant monolithic chips that are difficult to make, have poor yields and need to have huge prices.

The top end with Antilles is going to be a dual chip that has the advantages of smaller/easier/cheaper manufacture, so even at the high end I don't think that AMD will abandon the sweet spot strategy.

Jawed · Oct 14, 2010

Gipsel said:
But still, 160 or 192 VLIW units (pro and XT => 800/960 SPs with xyzwt or only 640/768 with xyzt) appears to be quite on the low side to reach close to Cypress performance. They really need to have widened some bottlenecks to get that performance.

I'm dubious that math is a meaningful bottleneck. Additionally, NVidia's approach to bottlenecks shows that in some games (e.g. Far Cry 2) math is seemingly irrelevant.

The patent documents relating to texturing architecture that I keep mentioning appear to be much more likely to provide a benefit.

Additionally the suckage that is the setup-rasteriser architecture of Cypress might be hampering math/texturing in games too.

AnarchX · Oct 14, 2010

Texture architecture really seems to be an area where there was, and still is, potential:

AnarchX said:
What do you think about an increase in practical texture fill-rate on NI-chips?

The situation now:

http://techreport.com/articles.x/19242/6

So if AMD is able to reach the same per texel real-time performance as on Juniper, Barts and Cayman may reach the following performance numbers:

Barts XT (64 TMUs @ 900MHz?): ~28.9 GTex/s -> 5870+11%

Cayman XT (96 TMUs @ 900MHz?): ~43.4 GTex/s -> 5870+67%

If scaling is linear from Juniper (40 TMUs @ 850MHz) to a speculated Barts XT (48 TMUs @ 900MHz), 21 GTex/s could be reached, a bit above HD 5850.
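As a rough sketch of how the 64 TMU and 96 TMU figures hang together: theoretical peak is simply TMUs times clock, and the quoted ~28.9 GTex/s for a 64 TMU part implies a real-world-to-peak ratio of about one half. The ratio below is derived from the numbers in the post, not from the TechReport measurements directly, and the TMU counts and clocks are the speculated ones.

```python
def peak_gtex(tmus: int, clock_mhz: float) -> float:
    """Theoretical bilinear texel rate in GTex/s: one texel per TMU per clock."""
    return tmus * clock_mhz / 1000.0

# Speculated configurations from the post (not confirmed specs).
barts_peak = peak_gtex(64, 900)
cayman_peak = peak_gtex(96, 900)

# Real-world-to-peak ratio implied by the quoted ~28.9 GTex/s for Barts XT.
implied_ratio = 28.9 / barts_peak

# Applying the same ratio to the speculated Cayman XT.
cayman_estimate = cayman_peak * implied_ratio

print(f"Barts XT peak:   {barts_peak:.1f} GTex/s")
print(f"Cayman XT peak:  {cayman_peak:.1f} GTex/s")
print(f"Implied ratio:   {implied_ratio:.3f}")
print(f"Cayman estimate: {cayman_estimate:.2f} GTex/s")
```

Since the same ratio is applied to both chips, the Cayman estimate is just the Barts figure scaled by 96/64.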

Jawed · Oct 14, 2010

PSU-failure said:
What about a possible split + distributed T lane over the remaining units?

The reported tests don't seem to provide any insight on XYZT versus XYZWT, so I can't see how they could provide any insight on other combinations of lanes to achieve transcendentals. I'm not even going there (there was a discussion of these possible alternative setups back in April, I think it was).

Gipsel · Oct 14, 2010

AnarchX said:
Texture architecture really seems to be an area where there was, and still is, potential:

But if Barts is really only a 12 SIMD chip with the traditional Evergreen layout, it does not have 64 TMUs: Barts XT has just 48 and Barts Pro only 40, the same number as Juniper XT. I guess that is only going to work if the TMUs themselves are somewhat changed, as the patents mentioned by Jawed suggest. It would be nice to see a higher L1 cache bandwidth, as it would make it possible to sustain a higher trilinear and aniso filtering speed (and quality, btw.) relative to bilinear filtering, as observed with GF100/104.
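The unit counts being thrown around in this thread all follow from the Evergreen SIMD layout, which a small sketch makes explicit. It assumes 16 VLIW units and 4 TMUs per SIMD, as on Cypress (20 SIMDs, 1600 SPs, 80 TMUs); the 12 and 10 SIMD counts for Barts XT and Pro are rumours, and the 4-lane case is the hypothetical xyzt variant.

```python
# Evergreen-style SIMD layout (as on Cypress: 20 SIMDs -> 1600 SPs, 80 TMUs).
VLIW_UNITS_PER_SIMD = 16
TMUS_PER_SIMD = 4

def chip_config(simds: int, lanes_per_vliw: int) -> dict:
    """Derive unit counts for a chip with the given SIMD count and VLIW width."""
    vliw_units = simds * VLIW_UNITS_PER_SIMD
    return {
        "vliw_units": vliw_units,
        "sps": vliw_units * lanes_per_vliw,
        "tmus": simds * TMUS_PER_SIMD,
    }

# Rumoured SIMD counts; 5 lanes = xyzwt, 4 lanes = hypothetical xyzt.
for name, simds in (("Barts XT", 12), ("Barts Pro", 10)):
    for lanes in (5, 4):
        cfg = chip_config(simds, lanes)
        print(f"{name}, {lanes}-wide: {cfg['vliw_units']} VLIW units, "
              f"{cfg['sps']} SPs, {cfg['tmus']} TMUs")
```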

AnarchX · Oct 14, 2010

Yes, 48 TMUs seem very likely.
If Northern Islands TMUs reach the same real-time-to-peak ratio as GF100, Barts XT (48 TMUs @ 900MHz) could reach 26.1 GTex/s and even beat HD 5870 in this test.

Jawed · Oct 14, 2010

One of the patent documents talks about the way fixed-precision hardware is used to perform filtering, with the operands having their exponents aligned by brute-force, throwing away precision. I don't know much about filtering math, but I wonder if this approach on existing hardware is capable of causing the filtering problems we see, even when dealing with 8-bit texels.

In effect I'm wondering if it's possible that after a few iterations for higher-degrees of anisotropic filtering, the precision loss could be quite severe.

LordEC911 · Oct 14, 2010

jaredpace said:

Well we already knew that Barts was on a 5850 PCB, diff components. I heard and posted that a long time ago, shortly after the "pin to pin compatibility" rumor.

Edit- Guess it wasn't that long ago, posted it here at the beginning of Sept, though I thought I had heard it back in August.

SimBy said:
Well by the looks of it, earlier measurements of 230mm^2 seem spot on.

I thought so too by just eyeing it.

Unknown Soldier said:
I thought Bart had already been measured?

Oh, didn't see that. imageshack pics don't show up at work unfortunately. Well at least I tagged it for later, thanks.
The only one I had seen was a few days ago but that was just the back of the PCB, not necessarily accurate.

Kaotik · Oct 14, 2010

LordEC911 said:
Well we already knew that Barts was on a 5850 PCB, diff components. I heard and posted that a long time ago, shortly after the "pin to pin compatibility" rumor.

It's not a 5850 PCB, it's just the same size. Other than size, the PCB layouts are completely different.

wishiknew · Oct 14, 2010

I like the better placement of the power connectors.

AnarchX · Oct 14, 2010

Could it be possible that only Cayman has 4D VLIW ALUs?

I am thinking about SIMDs with 128 SPs (32 4D ALUs), which allow the wavefront of 64 like in today's GPUs and in a 5D ALU Barts.
These SIMDs are combined with enhanced TMUs (maybe one quad-TMU or two like in Jawed's patents) which reach the performance of GF100 TMUs and may offer full-speed FP16.
So 15 of these SIMDs would give us 1920 SPs and 60 TMUs.

This would put Cayman @ 900MHz 28% over the GTX 480 in 16xAF texturing and offer HPC crunching power of 3.4 TFLOPs.
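The unit counts and the TFLOPs figure in this speculation can be reproduced with a quick sketch. It assumes one quad-TMU per SIMD (the "one" case in the post) and one multiply-add, i.e. 2 FLOPs, per SP per clock; the SIMD count, SIMD width and 900MHz clock are all the post's guesses.

```python
# Speculated Cayman layout from the post (none of this is confirmed).
simds = 15
sps_per_simd = 128        # 32 four-wide VLIW ALUs
tmus_per_simd = 4         # assumption: one quad-TMU per SIMD
clock_mhz = 900
flops_per_sp_per_clock = 2  # assumption: one multiply-add per clock

total_sps = simds * sps_per_simd
total_tmus = simds * tmus_per_simd
peak_tflops = total_sps * flops_per_sp_per_clock * clock_mhz / 1e6

print(f"{total_sps} SPs, {total_tmus} TMUs, {peak_tflops:.3f} TFLOPs peak")
```

The result lands slightly above the 3.4 TFLOPs quoted, which looks like the same number truncated.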

hoom · Oct 14, 2010

So AMD sees Nvidia do Epic Fail with GF100, and decides to abandon their hugely successful sweet-spot strategy after one generation and follow the same route as Nvidia with a giant 300 watt single chip that bleeds heat like a nuclear furnace?

What I got out of this article by Anand is that there is a good chance that this generation may include something really big, even though the sweet-spot strategy has been successful.

trinibwoy · Oct 14, 2010

There's absolutely no reason for AMD to shy away from big chips given their current perf/W. People talk about "sweet-spot" as if it's some ideal. They did what they did with 3870 out of necessity. With RV770 they didn't have the power headroom. Now they do.

Man from Atlantis · Oct 15, 2010

Barts has more than 960 shaders, and that's certain.

keritto · Oct 15, 2010

onethreehill said:
HD 6850 is based on AMD's new Barts GPU, built on the 40 nm process. The source mentions that the SKU will have 800 stream cores enabled. From earlier reports we're led to believe that these stream cores are individually more complex than AMD's traditional 5D (4 simple + 1 complex) approach to unified shaders.

Would Barts be a 20-SIMD pack, or just a 16-SIMD pack in the good old R600 legacy manner after all? Enabling only 800 of 960 shaders would have a much larger impact, and this chip should be tiny (<230mm²), so why disable "half of the chip" when you can make a 960/880 or similar ratio without too much fuss?! RV770 had a similar size, give or take a few mm², and most of them were born as fully functional dies.
