AMD: Southern Islands (7*** series) Speculation/ Rumour Thread

Those aren't GeForce boards and are far more expensive.

I looked around and it's only $2100 :p
was expecting a higher price, in fact the quadro 6000 is more expensive.

maybe they lowered the price, at least I'm sure the very first tesla cards were much more expensive.

the ECC feature is a must if you're going to do serious HPC; there may be quite a risk of bits flipping on your gas-guzzling gaming card.

off-line 3D rendering : then yes a 7970 6GB looks awesome. but AMD lacks an equivalent to CUDA.
 
Far too much silicon is required for "just marketing".
I'm sure HD7970 wouldn't have ended up much slower if Tahiti had been designed with a 256bit interface and "just" 28 CUs: HD7950 scales to an average of 95% of HD7970 performance if clocked to 925/2750, and cutting the memory speed of HD7970 by 33% results in about 10% less performance (a little more or less depending on settings and games).

That's about 15% less performance all-in-all. Nothing that couldn't have been adjusted for by a little more aggressive memory and core clocks.
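
As a rough check on that "about 15%" figure, the two scaling factors quoted above can simply be multiplied. This is a back-of-the-envelope sketch only: it assumes the CU cut and the bandwidth cut act independently and combine multiplicatively, which real games will not follow exactly, and the function name is made up for illustration.

```python
def combined_relative_performance(cu_scaling, bandwidth_scaling):
    """Combine two relative-performance factors, assuming they are independent."""
    return cu_scaling * bandwidth_scaling


# Figures taken from the post: an HD7950 at 925/2750 reaches ~95% of an HD7970,
# and cutting HD7970 memory speed by 33% costs ~10%.
relative = combined_relative_performance(0.95, 0.90)
loss_percent = (1.0 - relative) * 100.0

print(f"estimated: {relative:.3f}x of HD7970, i.e. {loss_percent:.1f}% less")
```

Under those assumptions the estimate lands a little under 15%, in line with the figure given above.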

2048 shaders and 3GB of memory really sound much nicer than "just" 1792 shaders and 2GB of memory, though :D
 
I'm sure HD7970 wouldn't have ended up much slower if Tahiti had been designed with a 256bit interface and "just" 28 CUs

In Crysis FPS or in general? Because there's not enough public domain data to warrant the latter inference. The argument can be slightly adjusted and splurged on many many GPU generations now (how was the 4850 versus the 4870, for example? How about GTX {4,5}70 vs {4,5}80 or, even more interestingly, {4,5}60 versus {4,5}80?). IMHO lots of this primarily game or toy "compute" benchmark ignores the fact that:
  • current games tend to be reasonably poor comprehensive leveragers of modern GPUs;
  • hope by both IHVs is that these things end up used for more than Crysis FPS, in scenarios in which they scale (and then you feel that you completely need DP to do Gauss-Jordan elimination (complete win!) and go buy an expensive Firestream / Tesla / something something; note this doesn't yet apply to AMD because they still want to grab some market share in that space in any way they can, so no crippling of Tahiti).
The investment involved with validation of an SKU like the 7970 is hardly justified by just the ability to show round numbers in your slideware. Let's try and give architects and other people involved slightly more credit, eh?
 
Let's try and give architects and other people involved slightly more credit, eh?
I didn't intend to take away any credit from the architects and other tech people at AMD. I actually think they did a pretty amazing job with the entire HD7000 line. Marketing and psychology are pretty important, though.

That's not to say that there's more marketing than mathematics involved in laying out a chip's specs - but I'd be surprised to hear that all that TeraScale, GHZ Edition, and frame buffer talk didn't influence the design goals of a chip at all. Not to speak of making a card actually fit into a specific market segment ...
 
I didn't intend to take away any credit from the architects and other tech people at AMD. I actually think they did a pretty amazing job with the entire HD7000 line. Marketing and psychology are pretty important, though.

That's not to say that there's more marketing than mathematics involved in laying out a chip's specs - but I'd be surprised to hear that all that TeraScale, GHZ Edition, and frame buffer talk didn't influence the design goals of a chip at all. Not to speak of making a card actually fit into a specific market segment ...

Top to bottom GHZ lineup (for GCN). They were so close too.
I wonder if we will ever know why not?
 
I don't know, but ROPs are the odd one out:

HD 7870: 2,560 GFLOPS; 32,000 MPix/s; 80,000 MTex/s; 153,600 MB/s
HD 7970: 3,789 GFLOPS; 29,600 MPix/s; 118,400 MTex/s; 264,000 MB/s
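
For anyone wondering where these paper numbers come from, they are just unit counts multiplied by clocks. A minimal sketch follows; the unit counts, clocks and bus widths are the public reference specs as I remember them, not something stated in this thread, and the `GpuSpec` class is purely illustrative.

```python
from dataclasses import dataclass


@dataclass
class GpuSpec:
    name: str
    shaders: int
    tmus: int
    rops: int
    core_mhz: int
    bus_bits: int
    mem_mtps: int  # effective memory transfer rate in MT/s (assumed)

    def gflops(self):
        # 2 FLOPs per shader per clock (one multiply-add)
        return self.shaders * 2 * self.core_mhz / 1000

    def mpix_per_s(self):
        # one pixel per ROP per clock
        return self.rops * self.core_mhz

    def mtex_per_s(self):
        # one texel per TMU per clock
        return self.tmus * self.core_mhz

    def mb_per_s(self):
        # bus width in bytes times effective transfer rate
        return self.bus_bits // 8 * self.mem_mtps


# Assumed reference specs (not taken from the thread).
hd7870 = GpuSpec("HD 7870", 1280, 80, 32, 1000, 256, 4800)
hd7970 = GpuSpec("HD 7970", 2048, 128, 32, 925, 384, 5500)

for gpu in (hd7870, hd7970):
    print(gpu.name, round(gpu.gflops()), gpu.mpix_per_s(),
          gpu.mtex_per_s(), gpu.mb_per_s())
```

Both cards have 32 ROPs in this sketch, so the lower pixel rate of the HD 7970 comes purely from its lower core clock.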

What's the deal with this? I've been seeing posts for about 5 years now asking, "Why doesn't ATi increase the ROP count?" The jump from 16 to 32 was the giant leap for mankind. If ROP count increases this MPixel/second figure, why don't the new AMD/ATi GPUs have 64 or more of these things? Doesn't the competition have this many? Is there some issue with them taking up too much space, and not being within the size budgets...
 
RV770 kept the ROP count from RV670 and R600, but doubled the throughput/blending rate for every surface format (except the basic INT8). It's not just the number of units. The more pressing issue for ATi in the past wasn't so much the colour pixel fill-rate, but the limited depth/stencil throughput in the back-end, that they finally addressed in RV770 -- of course, not to the extent of what NV invested in, since G80.
From all the synthetic benchmarks I've seen so far, GCN and the previous two generations match the back-end pixel rate (read, write and blend) pretty well to the overall capability of the architecture, i.e. raster scan-out capacity and frame-buffer bandwidth. My only personal complaint is that GCN clearly had the room to double the depth/stencil rate, given the generous BW.
 
Fellix, what's your take on these pixel fillrates? The 7970 has a similar rate to a GTX460, while a GTX580 has 37.8. Is the 7970 actually pushing fewer megapixels than the 7870? I would think they would have some test silicon or simulations with 64 ROPs to see if there is any practical benefit to it. It just seems out of place, like Mianca posted, when you step back and look at it. If this isn't the biggest bottleneck, then what is?
 
Fellix, what's your take on these pixel fillrates? The 7970 has a similar rate to a GTX460, while a GTX580 has 37.8.
Due to the pixel export limitation of the SMs, the GTX460 is nowhere even close to a HD7970. (The limit is 14 pixels/clock, which a GTX560Ti raises to 16 pixels/clock; "clock" means the base clock here, not the hot clock, and it is actually only that much for RGBA8 and FP32, with all other pixel formats at half that value.) Comparing the older HD5800/6900 series to a GTX480/GTX580 with blending, the NV GPUs just profit from the higher memory bandwidth, not the higher ROP count (except in some AA cases), due to the same export limitation. NVidia was obviously just too lazy (better: it judged it an unnecessary cost) to decouple the ROP count from the width of the memory controller; the higher number of ROPs buys them basically no performance.

img0030363vtzkq.gif
img0030364a5a9l.gif

img00345663lb0j.gif
img00345674ixwr.gif
 
Note the colour write rates for GTX 580. The counts stop short of the scan-out capacity of 32 fragments per clock and fall far below what the 48 ROPs are capable of.
 
Note the colour write rates for GTX 580. The counts stop short of the scan-out capacity of 32 fragments per clock and fall far below what the 48 ROPs are capable of.
That's the export limitation of the SMs I mentioned. ROP count is no bottleneck with Fermi. It makes no sense to complain that Tahiti should have more ROPs while it decisively beats a GTX580 in all fillrate benchmarks (and the GTX460 jaredpace mentioned is simply no contest).
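
To put rough numbers on the export limit: peak colour write rate is capped by whichever is narrower, the ROPs or the per-clock export width. The sketch below uses the 14 and 32 pixels/clock limits and the 48 ROPs mentioned in this thread. The core clocks (675 MHz for GTX 460, 772 MHz for GTX 580, 925 MHz for HD 7970), the 32 ROPs on the GTX 460 and HD 7970, and the idea that Tahiti is not export limited are my assumptions, and it only covers the full-rate formats.

```python
def effective_mpix_per_s(rops, export_px_per_clk, core_mhz):
    """Peak colour write rate, capped by the narrower of ROPs and export width."""
    return min(rops, export_px_per_clk) * core_mhz


# (ROPs, export limit in pixels/clock, base clock in MHz); clocks are assumed.
cards = {
    "GTX 460": effective_mpix_per_s(32, 14, 675),
    "GTX 580": effective_mpix_per_s(48, 32, 772),
    "HD 7970": effective_mpix_per_s(32, 32, 925),  # assumed not export limited
}

for name, rate in cards.items():
    print(f"{name}: {rate} MPix/s")
```

With these assumptions the extra ROPs on the GTX 580 never show up in the result, because the `min()` always picks the export width.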
 
Top to bottom GHZ lineup (for GCN). They were so close too.
I wonder if we will ever know why not?

Timing.

Although there was a relatively short time period between the releases of the chips, Verde and Pitcairn's bring-up, and to some extent qualification, have a reasonable level of leveraging going on, so they are a little shortened in terms of initial engineering wafers back to product shipping. Actually setting the product "boundaries" for Tahiti happened a while ago, on initial engineering material and a few wafers out from the fab; Pitcairn and Verde, on the other hand, had their product boundaries set when Tahiti production starts were already occurring, and there is a very quick evolution in terms of understanding things with the new process / chips.

I guess the question you want to ask is whether, now that we know things have evolved, are we going back to re-look at Tahiti.... ;-)
 