AMD: R9xx Speculation

I think there's little motivation for AMD to massively invest in the L1/L2 structure because of their (comparatively) massive register files. For conventional workloads this approach seems to work just fine, as long as you have enough math to hide memory subsystem accesses. Fermi, by contrast, needs its elaborate memory subsystem partly to compensate for its smaller register file.
Bulking up on register files is a poor way of reducing average texturing latency, especially when a few KB of L1 (~10 KB) would do the job, whereas the register file needs a large increase (~50 KB) to make any noticeable impact on latency.

Also, their register file isn't that big considering the ALUs they have.
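The cache-vs-registers claim above can be put into rough numbers. This is a minimal sketch with purely illustrative latency and hit-rate assumptions (none of these figures are measured hardware values):

```python
# Illustrative comparison: a small texture L1 vs. a bigger register file
# as ways of dealing with texturing latency. All numbers are assumptions
# for the sake of the argument, not measured hardware figures.

L1_HIT_LATENCY = 20   # cycles for an L1 hit, assumed
MEM_LATENCY    = 400  # cycles for an off-chip fetch, assumed

def avg_latency(hit_rate):
    """Average texture fetch latency for a given L1 hit rate."""
    return hit_rate * L1_HIT_LATENCY + (1 - hit_rate) * MEM_LATENCY

print(avg_latency(0.0))         # no cache at all: 400.0 cycles
print(round(avg_latency(0.8)))  # with 80% hits: ~96 cycles

# A bigger register file changes per-fetch latency not at all; it only
# lets more wavefronts be in flight to *hide* the unchanged 400 cycles.
```

The point being that a small cache attacks the latency itself, while extra registers only buy more latency *tolerance*.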
 
I think there's little motivation for AMD to massively invest in the L1/L2 structure because of their (comparatively) massive register files. For conventional workloads this approach seems to work just fine, as long as you have enough math to hide memory subsystem accesses. Fermi, by contrast, needs its elaborate memory subsystem partly to compensate for its smaller register file.
There's a degree of truth in all that: VLIW chomps through addresses in register files at a relatively high rate, after all. (Though NVidia's register file architecture is barely behind once you take the banking into account.) And VLIW tends to want more temp registers, though the pipeline registers and static temp registers (there's up to 8 of them per work item in Evergreen) in ATI both negate a substantial portion of that.

But I think an enhanced L1/L2 architecture is inevitable because:
  1. tessellation - always writing data off die, uncached (as seems likely in Cayman), seems lunatic, particularly as tessellation is partly intended as a way of saving bandwidth.
  2. compute - global buffer read/write is a fact of life now and arbitrary producer-consumer depends upon it. Compute is part of graphics. No excuses.
  3. register spill - can't be avoided with the hairiest kernels. Though I think the current compiler throws its hands up in horror and resorts to spill far too easily.
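Point 1 above can be sketched with some rough arithmetic. Everything here is an illustrative assumption (the vertex rate, attribute size, and the write-then-read-back pattern), not a Cayman specification:

```python
# Rough arithmetic for streaming tessellation output off die, uncached:
# each post-tessellation vertex costs a write and a read-back. All the
# numbers below are assumptions for illustration, not Cayman specifics.

VERTS_PER_SEC  = 1.0e9  # assumed post-tessellation vertex rate
BYTES_PER_VERT = 32     # assumed size of an output vertex's attributes

offdie_traffic = VERTS_PER_SEC * BYTES_PER_VERT * 2  # write + read back
print(offdie_traffic / 1e9, "GB/s")  # traffic an on-die cache could
                                     # largely absorb instead
```

Even with these modest assumptions the result is a sizeable fraction of a typical GDDR5 card's total bandwidth, which is the "lunatic" part of doing it uncached.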
WRT the 70%: do you mean HD 6970 vs. HD 5830, or fully-blown (and fully-clocked) versions respectively?

HD5830? What's that? :p
 
Jawed, if your prediction is correct, then Cayman will outperform GTX 580 by 15-20%.

Another leaked slide: http://forums.overclockers.co.uk/showpost.php?p=17962524&postcount=627

You mean another fake?

[attached images: 69700.jpg, fe47b9e4-0baa-456c-acee-39a80487bc19.jpg]
 
I can't understand how these people think. How could a GPU that's only ~50% bigger than the super-efficient Barts be ~100% faster?
 
I can't understand how these people think. How could a GPU that's only ~50% bigger than the super-efficient Barts be ~100% faster?
How could RV770 be only 33% bigger than RV670 but almost 100% faster?

The re-architecting between RV670 => RV770 and Cypress => Cayman will be on a roughly comparable scale, or even more pronounced.

I'm not saying it will be that way, but flatly excluding a more significant performance increase would be stupid.
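The RV670 => RV770 counterexample can be framed as perf-per-area arithmetic. The die-size and speedup figures below are just the rough numbers quoted in this thread, used for illustration:

```python
# Die area vs. performance scaling, using the thread's rough numbers.
# These are illustrative figures, not exact die measurements.

def perf_per_area_gain(area_growth, perf_growth):
    """How much perf/mm^2 improves for given area and perf increases."""
    return (1 + perf_growth) / (1 + area_growth)

# RV670 -> RV770: ~33% bigger, ~100% faster => perf/mm^2 up ~1.5x
print(round(perf_per_area_gain(0.33, 1.00), 2))

# So Barts -> Cayman at +50% area would need perf/mm^2 up ~1.33x
# to deliver +100% performance -- large, but it has precedent.
print(round(perf_per_area_gain(0.50, 1.00), 2))
```

Which is exactly the argument: a big per-area efficiency jump is rare, but RV770 shows it isn't impossible.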
 
Gipsel: They made MSAA resolve work (that was the greatest bottleneck of R6xx performance), removed full-speed FP16 filtering (which went unused by the games of that time) and replaced the ring bus with a more area-efficient solution. (In fact, if you clock R600 and RV770 at the same speed and switch off MSAA, the performance difference in many games is almost proportional to the amount of additional transistors in RV770.)

Barts isn't a broken GPU that could be boosted in performance by a simple fix. Barts also doesn't have anything that goes unused; even the DP support was removed. So any added feature will decrease its efficiency, not the contrary. If you add DP support, it will increase die size but won't increase gaming performance. If you add a second geometry engine, it will increase die size but won't affect performance in >90% of games, etc. ATI/AMD can hardly make a GPU that would be more efficient and better feature-equipped than Barts...
 
Bulking up on register files is a poor way of reducing average texturing latency, especially when a few KB of L1 (~10 KB) would do the job, whereas the register file needs a large increase (~50 KB) to make any noticeable impact on latency.

Also, their register file isn't that big considering the ALUs they have.
It's design choices, as always. Some people lean one way, other people another. It seems to have worked pretty well, at least up until now.

WRT their register files: I think you don't have to look at the number of ALUs but at the average lifetime of a given thread. IIRC, AMD's architecture has shorter latencies than Nvidia's, doesn't it?
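The latency/register-lifetime link can be sketched directly: the shorter the latency a thread must sit out, the fewer threads (and registers) must be in flight to keep the ALUs busy. All figures below are assumptions for illustration only:

```python
# Sketch of the latency-hiding trade-off: how many threads (and hence
# how many registers) it takes to cover a memory latency with ALU work.
# All figures are assumed purely for illustration.

def threads_to_hide(latency_cycles, alu_cycles_per_fetch):
    """Threads needed so the ALUs stay busy while one thread waits."""
    # While one thread waits latency_cycles, the others must together
    # supply that many cycles of math, alu_cycles_per_fetch each.
    return -(-latency_cycles // alu_cycles_per_fetch)  # ceiling division

LATENCY         = 400  # cycles of memory latency, assumed
MATH_PER_FETCH  = 10   # ALU cycles each thread runs per texture fetch
REGS_PER_THREAD = 16   # assumed register footprint per thread

threads = threads_to_hide(LATENCY, MATH_PER_FETCH)
print(threads, "threads,", threads * REGS_PER_THREAD, "regs in flight")
```

Halve the average latency (via a cache, say) and the required in-flight register budget halves too, which is why latency and register-file size trade off directly.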

There's a degree of truth in all that: VLIW chomps through addresses in register files at a relatively high rate, after all. (Though NVidia's register file architecture is barely behind once you take the banking into account.) And VLIW tends to want more temp registers, though the pipeline registers and static temp registers (there's up to 8 of them per work item in Evergreen) in ATI both negate a substantial portion of that.

But I think an enhanced L1/L2 architecture is inevitable because:
  1. tessellation - always writing data off die, uncached (as seems likely in Cayman), seems lunatic, particularly as tessellation is partly intended as a way of saving bandwidth.
  2. compute - global buffer read/write is a fact of life now and arbitrary producer-consumer depends upon it. Compute is part of graphics. No excuses.
  3. register spill - can't be avoided with the hairiest kernels. Though I think the current compiler throws its hands up in horror and resorts to spill far too easily.

HD5830? What's that? :p
Hehe...

But there already is a cache structure in Evergreen and earlier AMD parts, which seems to handle the most terrible performance pitfalls adequately - except for high tessellation levels. And that supposedly is changing with Cayman.
 