WRT the 70%: Do you mean HD 6970 vs. HD 5830? Or the full-blown (and full-clocked) versions of each chip?
I highly doubt anyone would take the 5830 as representative of Cypress, least of all Jawed. The 5850 perhaps, but most likely the 5870.
Hehe, sorry bud. Don't have time nowadays. Maybe I'll give it a shot during the Christmas break. Plenty of speculation on that topic if you care to rummage. I wonder how Mintmaster's tessellation experiments are coming along.
Bulking up on register files is a poor way of reducing avg. texturing latency, especially when only a few K of extra L1 (~10K) would do, whereas you need a ~50K register file increase to make any noticeable impact on latency. I think there's little motivation for AMD to massively invest in the L1/L2 structure because of their (comparatively) massive register files. For conventional workloads this approach seems to work just fine, as long as you have enough math to hide memory subsystem accesses. Fermi, on the contrary, needs its elaborate memory subsystem partly to compensate for its smaller register file.
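The register-file-versus-cache trade-off can be made concrete with a simplified latency-hiding model. This is a sketch under stated assumptions, not how either vendor actually sizes its hardware: a cache lowers the average fetch latency, and by a Little's-law style argument fewer resident wavefronts (hence fewer registers) are then needed to cover it. The 64-wide wavefronts, 16-byte GPRs, and all cycle counts below are hypothetical, Evergreen-flavoured values chosen only for illustration.

```python
import math

# Hypothetical Evergreen-like parameters (assumptions for illustration only).
WAVEFRONT_SIZE = 64   # work items per wavefront
BYTES_PER_GPR = 16    # one 128-bit (4 x 32-bit) register per work item


def avg_latency(hit_rate: float, hit_cycles: float, miss_cycles: float) -> float:
    """Average fetch latency when a cache sits in front of DRAM."""
    return hit_rate * hit_cycles + (1.0 - hit_rate) * miss_cycles


def wavefronts_needed(latency_cycles: float, alu_cycles_per_fetch: float) -> int:
    """Simplified latency hiding: while one wavefront waits on a fetch, the
    others must supply enough ALU work to cover the latency, so we need
    ceil(latency / work) other wavefronts plus the one that is waiting."""
    return math.ceil(latency_cycles / alu_cycles_per_fetch) + 1


def register_bytes(wavefronts: int, gprs_per_item: int) -> int:
    """Register file bytes needed to keep `wavefronts` resident at once."""
    return wavefronts * WAVEFRONT_SIZE * gprs_per_item * BYTES_PER_GPR


if __name__ == "__main__":
    # Hypothetical kernel: 10 GPRs per work item, 40 ALU cycles per fetch,
    # 100-cycle cache hit, 500-cycle miss.
    for hit_rate in (0.0, 0.5, 0.8):
        lat = avg_latency(hit_rate, hit_cycles=100, miss_cycles=500)
        waves = wavefronts_needed(lat, alu_cycles_per_fetch=40)
        kib = register_bytes(waves, gprs_per_item=10) / 1024
        print(f"hit rate {hit_rate:.0%}: {lat:.0f} cycles, "
              f"{waves} wavefronts, {kib:.0f} KiB of registers")
```

With these made-up numbers, a 50% hit rate cuts the register file needed from 140 KiB to 90 KiB, and 80% brings it to 60 KiB. Only the shape of the trade-off carries over to real chips, not the specific figures.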
Jawed, if your prediction is correct, then Cayman will outperform the GTX 580 by 15-20%.
Another leaked slide: http://forums.overclockers.co.uk/showpost.php?p=17962524&postcount=627
You mean another fake?
How could RV770 be only 33% bigger than RV670 but almost 100% faster? I can't understand how these people think. How could a GPU that is only 50% bigger than the super-efficient Barts be 100% faster?
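Taking the rough figures in these posts at face value (P is performance, A is die area), the RV770 counterexample implies a larger jump in performance per unit area than the doubted Cayman-vs-Barts scenario would need:

```latex
\frac{P_{\mathrm{RV770}}/P_{\mathrm{RV670}}}{A_{\mathrm{RV770}}/A_{\mathrm{RV670}}} \approx \frac{2.00}{1.33} \approx 1.50
\qquad
\frac{P_{\mathrm{Cayman}}/P_{\mathrm{Barts}}}{A_{\mathrm{Cayman}}/A_{\mathrm{Barts}}} \approx \frac{2.00}{1.50} \approx 1.33
```

So the scenario being questioned asks for a smaller perf/area gain (about 1.33x) than RV770 delivered over RV670 (about 1.5x), although the skeptic's counterpoint is that Barts is already described as a very efficient starting point.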
MSI Radeon HD 6950 & HD 6970 listed in France
http://www.tcmagazine.com/tcm/news/hardware/32290/msi-radeon-hd-6950-hd-6970-listed-france
It's design choices, as always. Some people tend one way, others another. It seems to have worked pretty well, at least up until now. Bulking up on register files is a poor way of reducing avg. texturing latency, especially when only a few K of extra L1 (~10K) would do, whereas you need a ~50K register file increase to make any noticeable impact on latency.
Also, their reg file isn't that big considering the ALUs they have.
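A rough per-lane comparison illustrates this point, assuming the commonly cited full-chip figures (treat them as assumptions): Cypress with 20 SIMDs of 256 KB registers each and 1600 ALU lanes, and GF100 with 16 SMs of 128 KB each and 512 CUDA cores. Cayman is left out.

```latex
\text{Cypress: } \frac{20 \times 256\ \mathrm{KB}}{1600\ \text{lanes}} = \frac{5120\ \mathrm{KB}}{1600} = 3.2\ \mathrm{KB/lane}
\qquad
\text{GF100: } \frac{16 \times 128\ \mathrm{KB}}{512\ \text{lanes}} = \frac{2048\ \mathrm{KB}}{512} = 4\ \mathrm{KB/lane}
```

Counted per VLIW5 unit instead (320 in Cypress), the Cypress figure becomes 16 KB, so the comparison depends on what one counts as an ALU.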
Hehe... There's a degree of truth in all that: VLIW chomps through addresses in register files at a relatively high rate, after all. (Though NVidia's register file architecture is barely behind once you take the banking into account.) And VLIW tends to want more temp registers, though the pipeline registers and static temp registers (there are up to 8 of them per work item in Evergreen) in ATI's design both negate a substantial portion of that.
But I think an enhanced L1/L2 architecture is inevitable because:
- tessellation - always writing data off-die, uncached (as seems likely in Cayman), seems lunatic, particularly as tessellation is partly intended as a way of saving bandwidth.
- compute - global buffer read/write is a fact of life now and arbitrary producer-consumer depends upon it. Compute is part of graphics. No excuses.
- register spill - can't be avoided with the hairiest kernels, though I think the current compiler throws its hands up in horror and resorts to spilling far too easily.
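The register-spill point can be sketched as a toy allocator. This is a hypothetical model, not AMD's compiler: the per-SIMD budget of 16,384 128-bit GPRs, the 128-GPR per-work-item limit, and the 32-wavefront cap are assumptions loosely modelled on Evergreen. It shows that spilling is forced either by a hard per-item limit or by a choice to keep enough wavefronts resident for latency hiding. Either way the spilled values go to memory, which is why spill traffic strengthens the case for a real L1/L2.

```python
# Hypothetical per-SIMD register budget, loosely modelled on Evergreen.
# All constants are assumptions for illustration only.
REGS_PER_SIMD = 16_384    # 128-bit GPRs available per SIMD
WAVEFRONT_SIZE = 64       # work items per wavefront
MAX_GPRS_PER_ITEM = 128   # assumed hard per-work-item limit
MAX_WAVEFRONTS = 32       # assumed scheduler limit per SIMD


def allocate(live_gprs: int, min_wavefronts: int = 1) -> tuple[int, int, int]:
    """Return (gprs_allocated, spilled_gprs, resident_wavefronts).

    Spilling happens when a kernel needs more registers than the per-item
    limit, or when keeping every live value in registers would leave fewer
    than `min_wavefronts` wavefronts resident to hide latency."""
    # Largest per-item allocation that still admits `min_wavefronts`.
    occupancy_cap = REGS_PER_SIMD // (WAVEFRONT_SIZE * min_wavefronts)
    budget = min(MAX_GPRS_PER_ITEM, occupancy_cap)
    gprs = min(live_gprs, budget)
    spilled = live_gprs - gprs
    # Guard against division by zero for a kernel with no live registers.
    waves = min(MAX_WAVEFRONTS, REGS_PER_SIMD // (WAVEFRONT_SIZE * max(gprs, 1)))
    return gprs, spilled, waves


if __name__ == "__main__":
    print(allocate(40))                    # fits: no spill, few wavefronts
    print(allocate(160))                   # exceeds per-item limit: forced spill
    print(allocate(40, min_wavefronts=8))  # spill chosen to raise occupancy
```

The third call is the case that matters for the complaint above: a compiler that targets higher occupancy may spill "voluntarily", and how eagerly it does so is a policy choice.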
HD5830? What's that?