AMD: Southern Islands (7*** series) Speculation / Rumour Thread

Sorry if this has been asked before, but how do you think these are gonna perform mining bitcoins?

Is there going to be a performance increase, or maybe a decrease because they are using fewer, more complex shaders?

Thanks

Even though mining is getting less and less viable, I'm sure as hell going to swap my two HD 6970s for two top single-GPU GCN cards.
With them I hope to get close to 1.5-2.1 GH/s.
 
Heh, my dedicated mining rig has four cheap 5830s pulling 300MH/s each, plus two 6850s in CF in my personal comp. If it wasn't for bitcoins I would never have bought all this gear xD

Sorry, but what do you mean by "GCN cards"? 1.5-2.1 GH/s for two cards seems like a LOT; I'm thinking this architecture will be more like the 6900 series, without that much of an increase in raw GFLOPS.
 
GCN - Graphics Core Next - the next generation of GPUs from AMD, which covers most if not all GPUs in the Southern Islands family.

One HD 6970 pulls around 400MH/s, so I expect one top-of-the-line S.I. card to pull at least 750MH/s, which would make a CF setup capable of 1.5GH/s.

This assumes shaders will at least double again, as was the case with the previous lithography shrink: 55nm RV770 (800) -> 40nm RV870 (1600).
The transition from 40nm to 28nm should be bigger than usual because TSMC skipped 32nm, which should offset the larger area required by a GCN shader compared to a VLIW4 one.
The only reason to expect less than a doubling of shaders in the new GPU is if AMD wants to decrease the die size of its top GPUs to better match the sweet-spot strategy.
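
A quick back-of-envelope in Python, just to show the arithmetic behind those numbers; the ~400MH/s figure and the shader doubling are the assumptions above, not measured S.I. data, and Cayman's 1536 ALU count is public:

```python
# Back-of-envelope hash-rate projection under the assumptions above.
cayman_mhs = 400                  # one HD 6970, roughly, in MH/s
cayman_shaders = 1536             # VLIW4 ALUs in Cayman

si_shaders = cayman_shaders * 2   # assumed doubling at 28nm

# SHA-256 mining is pure integer ALU work, so to a first
# approximation it scales with shader count at equal clocks.
si_mhs = cayman_mhs * si_shaders / cayman_shaders
print(f"one S.I. card: ~{si_mhs:.0f} MH/s")             # ~800
print(f"CF pair:       ~{2 * si_mhs / 1000:.1f} GH/s")  # ~1.6
```

Anything beyond that towards 2GH/s would need clock gains on top of the extra shaders.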
 
GCN is the disruptive new architecture, so I believe it's the generation after Southern Islands.

Southern Islands would be VLIW4, almost the same as a Radeon 6970, but at 28nm.
Possibly it's not intended to be a super-fast line-up of GPUs, but something a bit like the GT21x line, relating to the first GCN parts the way GT21x related to GF100 (or, if you wish, what GT21x would have been if it wasn't so area-wasting and hadn't failed :p ).
 
I seem to remember pretty clear statements from AMD about GCN being released this year.
 
The only reason to expect less than a doubling of shaders in the new GPU is if AMD wants to decrease the die size of its top GPUs to better match the sweet-spot strategy.
I'd certainly expect the die area per shader to get bigger due to the overall increased control logic...
 
Yes; instead of one sequencer for 10 or 12 SIMDs, each of which is 16 work items wide, we're looking at one scalar unit (sequencer) per 4 SIMDs, each of which is 16 work items wide.

On the other hand, the closeness of the scalar unit in the new design should mitigate some of the communication/bus overhead of the older design, since the sequencer was effectively sending an entire wodge of instructions/state to each SIMD to enable it to start a clause. It'll be a trickle instead of the bursts of yore.
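
Rough numbers for that shift in control logic; the sequencer/SIMD ratios are from the posts above, while the 16-wide SIMD and VLIW5 packing for the old design are my assumptions:

```python
# How much more sequencing hardware per ALU does GCN carry?
# Old (Cypress-style): one sequencer feeding ~10 SIMDs, each SIMD
# 16 work items wide, each work item a VLIW5 bundle.
old_alus_per_sequencer = 10 * 16 * 5   # = 800

# New (GCN): one scalar unit per CU of 4 SIMDs, each 16 lanes wide.
new_alus_per_scalar = 4 * 16           # = 64

print(old_alus_per_sequencer / new_alus_per_scalar)  # 12.5
# Roughly an order of magnitude more control logic per ALU,
# which is why per-shader die area should grow.
```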
 
I'd certainly expect the die area per shader to get bigger due to the overall increased control logic...

Yes, that's why I said (not too clearly) in my post that skipping 32nm should somewhat offset the increase in (single) shader area for GCN vs VLIW4.

In other words, putting 3072 VLIW4 shaders on 32nm would hopefully take similar die space to 3072 GCN shaders on 28nm (if it's less, then great).
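
The ideal-scaling arithmetic behind that, as a sketch (real node transitions scale worse than ideally, so treat this as an upper bound):

```python
# Skipping 32nm: ideal transistor density gain from 32nm to 28nm.
density_gain = (32 / 28) ** 2
print(f"{density_gain:.2f}x")   # ~1.31x

# So a GCN shader could carry ~30% more transistors than a VLIW4
# shader and the two hypothetical layouts above would still land
# at roughly the same die area.
```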
 
Yes, that's why I said (not too clearly) in my post that skipping 32nm should somewhat offset the increase in (single) shader area for GCN vs VLIW4.

In other words, putting 3072 VLIW4 shaders on 32nm would hopefully take similar die space to 3072 GCN shaders on 28nm (if it's less, then great).
Ah ok. I still don't see the numbers quite adding up, since 2x28^2 is pretty much 40^2, i.e. I can't see AMD fitting twice as many (more complex) shaders on this chip within the same die size as Cayman. And don't forget that Cayman was supposed to be 32nm, so it's probably already larger than what AMD wanted to build.
IOW I think the chances of twice as many ALUs are very slim this time around. But really, a ~50% increase there can still double effective performance (my personal bet is still 32 CUs, the same ALU number I had in mind pre-GCN, an even smaller increase).
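
The density argument, worked out (ideal scaling only):

```python
# A full-node shrink from 40nm to 28nm gives at best ~2x density:
print((40 / 28) ** 2)       # ~2.04
print(2 * 28**2, 40**2)     # 1568 vs 1600: "2x28^2 is pretty much 40^2"

# Doubling the shader count alone already eats the entire ideal
# shrink, so making each shader *more* complex on top of that
# can't fit in Cayman's die size with 2x the ALUs.

# For reference, the 32-CU guess: 32 CUs x 4 SIMDs x 16 lanes.
print(32 * 4 * 16)          # 2048 ALUs
```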
 
Let's not forget that AMD will move from the more "compact" VLIW logic to something that will definitely pack fewer ALUs per unit of die area. The new architecture will most likely boast at least four setup engines, and we know that parallel geometry processing is not cheap -- it was not in Fermi and it certainly won't be for GCN.
 
Remember, they already have 2 setup engines in Cayman, and it doesn't seem massively bigger than anything previous, assuming the shaders took the same or slightly more transistors (compared to Cypress); the TMUs took a few more transistors and so did the memory controller.

Given that AMD appears to have a very simple scheduler compared to Fermi, will this new arch actually grow that much in transistors per functional unit? I don't think we can really use Fermi as a basis for comparing GCN to VLIW4.
 
Remember, they already have 2 setup engines in Cayman, and it doesn't seem massively bigger than anything previous, assuming the shaders took the same or slightly more transistors (compared to Cypress); the TMUs took a few more transistors and so did the memory controller.
It took AMD something like half a billion transistors to add that second primitive setup block in Cayman, given that Cypress already had dual scan-out pipelines.
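
That figure roughly checks out against the published transistor counts, though attributing the whole delta to setup is a simplification (the next reply makes the same point):

```python
cypress = 2.15e9   # transistors, Cypress / HD 5870 (published)
cayman  = 2.64e9   # transistors, Cayman / HD 6900 (published)
print(f"{(cayman - cypress) / 1e9:.2f}B")  # ~0.49B delta

# The delta also covers VLIW5 -> VLIW4 and Cayman's other changes,
# so "half a billion for setup" is really an upper bound.
```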
 
Ah ok. I still don't see the numbers quite adding up, since 2x28^2 is pretty much 40^2, i.e. I can't see AMD fitting twice as many (more complex) shaders on this chip within the same die size as Cayman.
It's not that simple. The ALUs (unified core) take less than half of the die (I'd say it can be closer to 1/3 than 1/2), and I doubt they'll double the number of TMUs, ROPs and memory controllers as well. Enlarging the non-shader parts by (let's say) 50%, you'd still have enough space to fit three times as many ALUs. Even if the new ALUs are 50% bigger, it should be possible to fit ~3000 of them into a ~400mm² GPU.
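
A sanity check of that budget; the die fractions and the +50% growth factors are the guesses from this post, Cayman's ~389mm² and 1536 ALUs are public, and the 40nm to 28nm shrink is taken as ideal:

```python
cayman_die, cayman_alus = 389.0, 1536
shrink = (40 / 28) ** 2              # ~2.04x ideal density gain

alu_area = cayman_die / 3            # "closer to 1/3 than 1/2"
other_area = cayman_die - alu_area

# Non-shader logic: shrunk to 28nm, then enlarged by 50%.
new_other = other_area / shrink * 1.5

# Each new ALU: shrunk to 28nm, then 50% bigger than VLIW4.
new_alu_size = (alu_area / cayman_alus) / shrink * 1.5

budget = 400.0                       # target die size, mm^2
print(int((budget - new_other) / new_alu_size))  # ~3370 ALUs
```

So under these (optimistic) assumptions, ~3000 ALUs fit with some margin.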
 
It's not that simple. The ALUs (unified core) take less than half of the die (I'd say it can be closer to 1/3 than 1/2), and I doubt they'll double the number of TMUs, ROPs and memory controllers as well. Enlarging the non-shader parts by (let's say) 50%, you'd still have enough space to fit three times as many ALUs. Even if the new ALUs are 50% bigger, it should be possible to fit ~3000 of them into a ~400mm² GPU.

I recall AMD saying that overall the ALUs aren't much larger than in previous generations, but I have to agree that there will be additional overhead they weren't taking into account, which will increase the size a bit more than they were hinting.

I was thinking ~320mm², or at least smaller than 350mm².
 
Given that AMD appears to have a very simple scheduler compared to Fermi, will this new arch actually grow that much in transistors per functional unit?
I think this time around that could make for one of the main differences in the respective compute densities of GCN and Kepler, if Nvidia chooses to stay with their approach and doesn't reduce scheduling overhead significantly themselves.

The GF104 approach seems to lack quite a bit when (diverse) compute loads are key, as opposed to gaming performance -- see Luxmark, for example.

It took AMD something like half a billion transistors to add that second primitive setup block in Cayman, given that Cypress already had dual scan-out pipelines.
They changed a bit more than just distributed geometry in Cayman, I think: 4 additional MCUs including their respective scheduling and texturing units, for example, and more capable render back-ends.

--
On a more general note: AMD still seems to have scaling issues with their highest-end Cayman GPUs, less so than with Cypress, but still. On pure compute stuff like HashCat and the like, I've seen linear scaling from 5770 to 5870, 5850 to 5870 and 6950 to 6970, but Luxmark for example shows slightly different results, as do other, more "real world" workloads including games. Nvidia's offerings within the same architectural line, i.e. GF1x0, show almost linear scaling with a growing number of SIMDs/clocks, which cannot entirely be attributed to the additional memory channels.

IMHO that's the most important problem for GCN to solve, and it was probably one of the main goals of its design.
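
For reference, the theoretical ratios that "linear scaling" implies for those card pairs (using the public ALU counts and core clocks):

```python
# Theoretical ALU-throughput ratios for the card pairs above.
cards = {                 # (ALUs, core MHz)
    "HD 5770": (800, 850),
    "HD 5850": (1440, 725),
    "HD 5870": (1600, 850),
    "HD 6950": (1408, 800),
    "HD 6970": (1536, 880),
}

def ratio(a, b):
    (na, ca), (nb, cb) = cards[a], cards[b]
    return (nb * cb) / (na * ca)

for a, b in [("HD 5770", "HD 5870"),
             ("HD 5850", "HD 5870"),
             ("HD 6950", "HD 6970")]:
    print(f"{a} -> {b}: {ratio(a, b):.2f}x theoretical")
# 2.00x, 1.30x and 1.20x; pure compute loads hit these ratios,
# while games and Luxmark fall short of them on the big chips.
```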
 
On a more general note: AMD still seems to have scaling issues with their highest-end Cayman GPUs, less so than with Cypress, but still. On pure compute stuff like HashCat and the like, I've seen linear scaling from 5770 to 5870, 5850 to 5870 and 6950 to 6970, but Luxmark for example shows slightly different results, as do other, more "real world" workloads including games. Nvidia's offerings within the same architectural line, i.e. GF1x0, show almost linear scaling with a growing number of SIMDs/clocks, which cannot entirely be attributed to the additional memory channels.

IMHO that's the most important problem for GCN to solve, and it was probably one of the main goals of its design.
Hmm, caching as an advantage for Fermi?

Or maybe Cayman is a bit over the top in scheduling overhead with that many SIMDs; but on the other hand GT200 was a lot more loaded with running threads and had a galore of SIMDs (30), and it did fine, sort of.
 
It's not that simple. The ALUs (unified core) take less than half of the die (I'd say it can be closer to 1/3 than 1/2), and I doubt they'll double the number of TMUs, ROPs and memory controllers as well. Enlarging the non-shader parts by (let's say) 50%, you'd still have enough space to fit three times as many ALUs. Even if the new ALUs are 50% bigger, it should be possible to fit ~3000 of them into a ~400mm² GPU.
This might be true, but I don't think AMD is going for a ~400mm² die size.
And I think the area dedicated to the SIMDs has been a pretty constant percentage since R700. Even assuming it grows to roughly half, I'd think 2048 ALUs (and a die size below 350mm²) is just more likely than 3072 (and a die size of ~400mm²).
AMD might not double the TMUs, ROPs and MCs (though in the case of the TMUs I wouldn't be surprised if there are still 4 per CU), but there are certainly other things they can do that will need die area: figuring out how to make more efficient use of bandwidth, for example.
 
On a more general note: AMD still seems to have scaling issues with their highest-end Cayman GPUs, less so than with Cypress, but still. On pure compute stuff like HashCat and the like, I've seen linear scaling from 5770 to 5870, 5850 to 5870 and 6950 to 6970, but Luxmark for example shows slightly different results, as do other, more "real world" workloads including games. Nvidia's offerings within the same architectural line, i.e. GF1x0, show almost linear scaling with a growing number of SIMDs/clocks, which cannot entirely be attributed to the additional memory channels.
Sounds like you've found something that isn't compute-bound on AMD.

IMHO that's the most important problem for GCN to solve, and it was probably one of the main goals of its design.
Ignoring the VLIW question, I think Barts shows that these chips are just horribly wasteful in their balance of ALUs versus ROPs.
 