AMD: Southern Islands (7*** series) Speculation/Rumour Thread

Discussion in 'Architecture and Products' started by UniversalTruth, Dec 17, 2010.

  1. Lightman

    Veteran Subscriber

    Joined:
    Jun 9, 2008
    Messages:
    1,969
    Likes Received:
    963
    Location:
    Torquay, UK
    Even though mining is getting less and less viable, I'm sure as hell my two HD 6970s will be swapped for two top single-GPU GCN cards.
    With them I hope to get close to 1.5-2.1 GH/s.
     
  2. John021

    Newcomer

    Joined:
    Jan 1, 2010
    Messages:
    29
    Likes Received:
    0
    Heh, my dedicated mining rig has four cheap 5830s pulling 300 MH/s each, plus two 6850s in CrossFire in my personal computer. If it weren't for Bitcoin I would never have bought all this gear xD

    Sorry, but what do you mean by "GCN cards"? 1.5-2.1 GH/s for two cards seems like a LOT. I think this architecture will be more like the 6900 series, without much of an increase in raw GFLOPS.
     
  3. Dave Baumann

    Dave Baumann Gamerscore Wh...
    Moderator Legend

    Joined:
    Jan 29, 2002
    Messages:
    14,090
    Likes Received:
    694
    Location:
    O Canada!
    GCN = "Graphics Core Next"
     
  4. Lightman

    Veteran Subscriber

    Joined:
    Jun 9, 2008
    Messages:
    1,969
    Likes Received:
    963
    Location:
    Torquay, UK

    GCN (Graphics Core Next) is AMD's next-generation GPU architecture, used by most if not all GPUs in the Southern Islands family.

    One HD 6970 pulls around 400 MH/s, so I expect one top-of-the-line S.I. card to pull at least 750 MH/s, which would make a CrossFire setup capable of about 1.5 GH/s.

    This assumes the shader count will at least double again, as it did with the previous lithography shrink: 55 nm RV770 (800) -> 40 nm RV870 (1600).
    The transition from 40 nm to 28 nm should be a bigger step than usual because TSMC skipped 32 nm, which should offset the larger area a GCN shader requires compared to VLIW4.
    The only reason to expect less than a doubling of shaders in the new GPU is if AMD wants to shrink the die size of its top GPUs to better match its sweet-spot strategy.
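    A minimal back-of-envelope sketch of the projection above. It assumes hash rate scales linearly with shader count (and clock), that area shrinks ideally with the square of the feature size, and that a CrossFire pair simply doubles throughput because each GPU hashes independently. The function names are hypothetical; the 400 MH/s and 800 -> 1600 shader figures come from the post.

```python
# Back-of-envelope check of the hash-rate projection.
# Assumptions: hash rate scales linearly with shaders x clock; ideal area
# scaling with feature size squared; CrossFire = two independent GPUs.

def area_density_gain(old_nm: float, new_nm: float) -> float:
    """Ideal transistor-density gain from a linear shrink (area ~ feature size squared)."""
    return (old_nm / new_nm) ** 2


def projected_hash_rate(base_mhs: float, shader_ratio: float, clock_ratio: float = 1.0) -> float:
    """Hash rate in MH/s, scaled linearly from a reference card."""
    return base_mhs * shader_ratio * clock_ratio


hd6970_mhs = 400.0               # figure quoted in the post
previous_shrink_ratio = 1600 / 800  # RV770 -> RV870 shader count, i.e. 2.0

single_card = projected_hash_rate(hd6970_mhs, previous_shrink_ratio)
two_cards = 2 * single_card      # two GPUs hashing independently

print(f"ideal 55->40 nm density gain: {area_density_gain(55, 40):.2f}x")
print(f"ideal 40->28 nm density gain: {area_density_gain(40, 28):.2f}x")
print(f"one card: {single_card:.0f} MH/s, two cards: {two_cards / 1000:.1f} GH/s")
```

    A straight doubling at equal clocks lands at about 800 MH/s per card, a little above the 750 MH/s floor in the post, and the ideal 40 -> 28 nm density gain is just over 2x.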
     
  5. Blazkowicz

    Legend

    Joined:
    Dec 24, 2004
    Messages:
    5,607
    Likes Received:
    256
    GCN is the disruptive new architecture, so I believe it's the generation after Southern Islands.

    Southern Islands would then be VLIW4, almost the same as a Radeon 6970 but at 28 nm.
    Possibly it isn't meant to be a super-fast GPU lineup; it would be a bit like the GT21x line, relating to the first GCN chip the way GT21x related to GF100 (or, if you wish, like what GT21x would have been had it not been so area-wasteful and hadn't failed :razz:).
     
  6. Alexko

    Veteran Subscriber

    Joined:
    Aug 31, 2009
    Messages:
    4,541
    Likes Received:
    964
    I seem to remember pretty clear statements from AMD about GCN being released this year.
     
  7. mczak

    Veteran

    Joined:
    Oct 24, 2002
    Messages:
    3,022
    Likes Received:
    122
    I'd certainly expect the die area per shader to get bigger due to the overall increased control logic...
     
  8. Jawed

    Legend

    Joined:
    Oct 2, 2004
    Messages:
    11,716
    Likes Received:
    2,137
    Location:
    London
    Yes. Instead of one sequencer for 10 or 12 SIMDs, each 16 work items wide, we're looking at one scalar unit (sequencer) for 4 SIMDs, each of which is 16 work items wide.
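    To put numbers on that ratio, a small sketch counting the work items per clock that each sequencer has to feed, using only the SIMD counts and the 16-wide SIMDs described in the post (the helper name is hypothetical):

```python
# Work items per clock fed by one sequencer, per the description above.
# Assumption: SIMDs are 16 work items wide in both designs, as stated.
SIMD_WIDTH = 16


def lanes_per_sequencer(simds_per_sequencer: int, simd_width: int = SIMD_WIDTH) -> int:
    """Work items per clock that a single sequencer must keep busy."""
    return simds_per_sequencer * simd_width


gcn_lanes = lanes_per_sequencer(4)  # one scalar unit per 4 SIMDs

for simds in (10, 12):
    old = lanes_per_sequencer(simds)
    print(f"{simds} SIMDs/sequencer: {old} lanes vs GCN {gcn_lanes} lanes "
          f"-> {old / gcn_lanes:.1f}x more sequencers per lane in GCN")
```

    So the new design carries roughly 2.5-3x as many sequencers per work item, which is where the extra control-logic area comes from.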

    On the other hand, the closeness of the scalar unit in the new design should mitigate some of the communication/bus overhead of the older design, since the sequencer was effectively sending an entire wodge of instructions/state to each SIMD to enable it to start a clause. It'll be a trickle instead of the bursts of yore.
     
  9. Lightman

    Veteran Subscriber

    Joined:
    Jun 9, 2008
    Messages:
    1,969
    Likes Received:
    963
    Location:
    Torquay, UK
    Yes, that's what I meant (not too clearly) in my post: skipping 32 nm should somewhat offset the increase in per-shader area for GCN vs VLIW4.

    In other words, 3072 VLIW4 shaders on 32 nm would hopefully take a similar die area to 3072 GCN shaders on 28 nm (if it's less, then great).
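    Under ideal area scaling (an assumption; real shrinks rarely achieve it), the break-even point can be computed directly: it is how much larger a GCN shader may be on 28 nm while matching the area of a VLIW4 shader on a hypothetical 32 nm chip.

```python
# Allowed per-shader area growth if 28 nm GCN shaders are to occupy the same
# die area as the same number of VLIW4 shaders on a hypothetical 32 nm chip.
# Assumption: area scales ideally with the square of the feature size.

def allowed_area_growth(ref_nm: float, new_nm: float) -> float:
    """Area growth per shader that the shrink from ref_nm to new_nm absorbs."""
    return (ref_nm / new_nm) ** 2


growth = allowed_area_growth(32, 28)
print(f"a GCN shader can be up to {growth:.2f}x the area of a VLIW4 shader")
```

    That is (32/28)² = 64/49, so roughly a 30% per-shader area budget.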
     
  10. Squilliam

    Squilliam Beyond3d isn't defined yet
    Veteran

    Joined:
    Jan 11, 2008
    Messages:
    3,495
    Likes Received:
    114
    Location:
    New Zealand
    I'm disappointed that you don't have a really really uber cool name for it. :lol:

    Oh, and it's awesome to see the greatest islands ever incorporated into your code names, New Zealand! :mrgreen:
     
  11. mczak

    Veteran

    Joined:
    Oct 24, 2002
    Messages:
    3,022
    Likes Received:
    122
    Ah, OK. I still don't see the numbers quite adding up, since 2 × 28² is pretty much 40², i.e. I can't see AMD fitting twice as many (more complex) shaders on this chip within the same die size as Cayman. And don't forget that Cayman was supposed to be 32 nm, so it's probably larger than AMD really wanted.
    IOW, I think the chances of twice as many ALUs are very slim this time around. But a ~50% increase there can still double effective performance (my personal bet is still 32 CUs, the same ALU count I had in mind pre-GCN, which is an even smaller increase).
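    Both checks in this post are simple arithmetic, sketched below. The sketch assumes a CU is 4 SIMDs × 16 lanes = 64 ALUs (from Jawed's description earlier in the thread) and uses Cayman's 1536 ALUs as the baseline; the helper name is hypothetical.

```python
# Two quick checks. Assumptions: a CU is 4 SIMDs x 16 lanes = 64 ALUs
# (Jawed's description); Cayman has 1536 ALUs.
CAYMAN_ALUS = 1536
ALUS_PER_CU = 4 * 16

# A full 40 -> 28 nm shrink only about doubles density: 1568 vs 1600.
print(2 * 28**2, "vs", 40**2)


def alus_for_cus(cus: int) -> int:
    """Total ALU count for a given number of CUs."""
    return cus * ALUS_PER_CU


bet = alus_for_cus(32)
print(f"32 CUs = {bet} ALUs, {bet / CAYMAN_ALUS - 1:+.0%} vs Cayman")
```

    So the 32-CU bet works out to 2048 ALUs, about a third more than Cayman rather than a doubling.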
     
  12. fellix

    Veteran

    Joined:
    Dec 4, 2004
    Messages:
    3,552
    Likes Received:
    514
    Location:
    Varna, Bulgaria
    Let's not forget that AMD will move from the more "compact" VLIW logic to something that will definitely pack fewer ALUs per unit of die area. The new architecture will most likely boast at least four setup engines, and we know that parallel geometry processing is not cheap: it wasn't in Fermi and certainly won't be for GCN.
     
  13. itsmydamnation

    Veteran

    Joined:
    Apr 29, 2007
    Messages:
    1,349
    Likes Received:
    470
    Location:
    Australia
    Remember, they already have two setup engines in Cayman, and it doesn't seem massively bigger than anything previous, assuming the shaders took the same or slightly more transistors (compared to Cypress); the TMUs took a few more transistors, and so did the memory controller.

    Given that AMD appears to have a very simple scheduler compared to Fermi, will this new architecture actually grow that much in transistors per functional unit? I don't think we can really use Fermi as a basis for comparing GCN against VLIW4.
     
  14. fellix

    Veteran

    Joined:
    Dec 4, 2004
    Messages:
    3,552
    Likes Received:
    514
    Location:
    Varna, Bulgaria
    It took AMD something like half a billion transistors to add that second primitive setup block in Cayman, given that Cypress already had dual scan-out pipelines.
     
  15. no-X

    Veteran

    Joined:
    May 28, 2005
    Messages:
    2,455
    Likes Received:
    471
    It's not that simple. ALUs (the unified core) take less than half of the die (I'd say closer to 1/3 than 1/2), and I doubt they'll double the number of TMUs, ROPs and memory controllers as well. Enlarging the non-shader parts by (let's say) 50%, you'd still have enough space to fit three times as many ALUs. Even if the new ALUs are 50% bigger, it should be possible to fit ~3000 of them into a ~400 mm² GPU.
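    The area-budget argument can be written as a tiny model, sketched below. Every input is an assumption taken from the post: ALUs are ~1/3 of the 40 nm die, non-ALU blocks grow by 50%, the shrink is ideal, (28/40)², and the 28 nm die is the same size as the 40 nm reference (taken here to be a Cayman-class die with 1536 ALUs). Areas are fractions of that reference die; the function name is hypothetical.

```python
# Area-budget model of the argument above. All inputs are assumptions:
#  - ALUs take ~1/3 of the 40 nm reference die
#  - non-ALU blocks grow by 50%
#  - ideal 40 -> 28 nm area scaling, (28/40)^2
#  - the 28 nm die is the same size as the 40 nm reference die
# Areas are expressed as fractions of the 40 nm reference die.

def alu_multiplier(alu_fraction: float, non_alu_growth: float,
                   alu_size_growth: float, shrink: float) -> float:
    """How many times more ALUs fit on a same-size die after the shrink."""
    non_alu_area = (1 - alu_fraction) * non_alu_growth * shrink
    free_area = 1.0 - non_alu_area
    # Area the old ALU array would occupy after growth and shrink.
    old_alu_block_area = alu_fraction * alu_size_growth * shrink
    return free_area / old_alu_block_area


shrink = (28 / 40) ** 2
same_size = alu_multiplier(1 / 3, 1.5, 1.0, shrink)
bigger = alu_multiplier(1 / 3, 1.5, 1.5, shrink)
print(f"same-size ALUs: {same_size:.2f}x")
print(f"50% bigger ALUs: {bigger:.2f}x -> ~{bigger * 1536:.0f} ALUs from a 1536-ALU baseline")
```

    With these inputs the model gives about 3.1x for same-size ALUs and about 2.1x (roughly 3200 ALUs) for 50% bigger ones, consistent with the "three times" and "~3000" figures in the post. The result is very sensitive to the assumed ALU fraction.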
     
  16. LordEC911

    Regular

    Joined:
    Nov 25, 2007
    Messages:
    877
    Likes Received:
    208
    Location:
    'Zona
    I recall AMD saying that overall the ALUs aren't much larger than in previous generations, but I have to agree there will be additional overhead they weren't accounting for, which will increase the size a bit more than they were hinting.

    I was thinking ~320 mm², or at least smaller than 350 mm².
     
  17. CarstenS

    Legend Subscriber

    Joined:
    May 31, 2002
    Messages:
    5,800
    Likes Received:
    3,920
    Location:
    Germany
    I think this time around, that could make for one of the main differences in the respective compute densities of GCN and Kepler if Nvidia chooses to stay with their approach and not reduce scheduling overhead significantly themselves.

    The GF104 approach seems to lack bite in (diverse) compute loads relative to its gaming performance; see Luxmark, for example.

    They changed a bit more than just distributed geometry in Cayman, I think: four additional MCUs, including their respective scheduling and texturing units, for example, and more capable render back-ends.

    --
    On a more general note: AMD still seems to have scaling issues with its highest-end Cayman GPUs, fewer than with Cypress, but still. On pure compute workloads like HashCat and the like, I've seen linear scaling from 5770 to 5870, 5850 to 5870 and 6950 to 6970, but Luxmark, for example, shows slightly different results, as do other, more "real world" workloads, including games. Nvidia's offerings within the same architectural line, i.e. GF1x0, show almost linear scaling with a growing number of SIMDs/clocks, which cannot be attributed entirely to the additional memory channels.
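    One way to quantify "linear scaling" is to divide a measured speedup by the speedup in raw SIMD count × clock; 1.0 means perfectly linear. A minimal sketch follows. The card specs are assumed reference-board figures (22 SIMDs at 800 MHz for the 6950, 24 at 880 MHz for the 6970, 10 and 20 SIMDs at 850 MHz for the 5770 and 5870), and the 1.2x "measured" value is a placeholder, not a benchmark result.

```python
# Scaling efficiency = measured speedup / theoretical (SIMDs x clock) speedup.
# Card specs are assumed reference-board figures; the measured value used
# below is a placeholder to be replaced with real benchmark data.

def theoretical_speedup(simds_a: int, mhz_a: int, simds_b: int, mhz_b: int) -> float:
    """Raw throughput ratio of card B over card A."""
    return (simds_b * mhz_b) / (simds_a * mhz_a)


def scaling_efficiency(measured_speedup: float, theoretical: float) -> float:
    """1.0 means the workload scales perfectly with SIMDs x clock."""
    return measured_speedup / theoretical


hd6950_to_6970 = theoretical_speedup(22, 800, 24, 880)
hd5770_to_5870 = theoretical_speedup(10, 850, 20, 850)
print(f"6950 -> 6970 theoretical: {hd6950_to_6970:.2f}x")
print(f"5770 -> 5870 theoretical: {hd5770_to_5870:.2f}x")

placeholder_measured = 1.2  # hypothetical; substitute a real Luxmark/HashCat ratio
print(f"efficiency: {scaling_efficiency(placeholder_measured, hd6950_to_6970):.2f}")
```

    A HashCat-style workload should sit near 1.0, while a workload like Luxmark would show up as a value below 1.0.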

    IMHO that's the most important problem to solve for GCN and probably was one of the main goals in its design.
     
    #537 CarstenS, Jul 26, 2011
    Last edited by a moderator: Jul 26, 2011
  18. fellix

    Veteran

    Joined:
    Dec 4, 2004
    Messages:
    3,552
    Likes Received:
    514
    Location:
    Varna, Bulgaria
    Hmm, caching as advantage for Fermi?

    Or maybe Cayman is a bit over the top on scheduling overhead with so many SIMDs; on the other hand, GT200 was far more loaded with running threads and SIMDs galore (30), and it did fine, sort of.
     
  19. mczak

    Veteran

    Joined:
    Oct 24, 2002
    Messages:
    3,022
    Likes Received:
    122
    This might be true, but I don't think AMD is going for a ~400 mm² die.
    And I think the area dedicated to the SIMDs has been a pretty constant percentage since R700. Even assuming it grows to roughly half, I'd think 2048 ALUs (and a die size below 350 mm²) is just more likely than 3072 (and a die size of ~400 mm²).
    AMD might not double TMUs, ROPs and MCs (though in the case of TMUs I wouldn't be surprised if there are still 4 per CU), but there are certainly other things they can do that will need die area, for example figuring out how to make more efficient use of bandwidth.
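    For reference, the two candidate configurations can be expressed in CUs and TMUs. This sketch assumes 64 ALUs per CU (4 SIMDs × 16 lanes, per Jawed's description), the "4 TMUs per CU" mentioned above, and Cayman's 1536 ALUs as the baseline; the `config` helper is hypothetical.

```python
# The two candidate configurations in CUs and TMUs.
# Assumptions: 64 ALUs per CU (4 SIMDs x 16 lanes); 4 TMUs per CU;
# Cayman's 1536 ALUs as the baseline.
ALUS_PER_CU = 64
TMUS_PER_CU = 4
CAYMAN_ALUS = 1536


def config(alus: int) -> dict:
    """Break an ALU count down into CUs and TMUs."""
    cus = alus // ALUS_PER_CU
    return {"alus": alus, "cus": cus, "tmus": cus * TMUS_PER_CU,
            "alu_gain": alus / CAYMAN_ALUS}


for alus in (2048, 3072):
    c = config(alus)
    print(f"{c['alus']} ALUs = {c['cus']} CUs, {c['tmus']} TMUs, "
          f"{c['alu_gain']:.2f}x Cayman's ALUs")
```

    Keeping 4 TMUs per CU ties texture throughput to the CU count, so the 3072-ALU option would also mean 50% more TMUs than the 2048-ALU one.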
     
  20. Jawed

    Legend

    Joined:
    Oct 2, 2004
    Messages:
    11,716
    Likes Received:
    2,137
    Location:
    London
    Sounds like you've found something that isn't compute-bound on AMD.

    Ignoring the VLIW question, I think Barts shows that these chips are just horribly wasteful in their balance of ALUs versus ROPs.
     

  • About Us

    Beyond3D has been around for over a decade and prides itself on being the best place on the web for in-depth, technically-driven discussion and analysis of 3D graphics hardware. If you love pixels and transistors, you've come to the right place!

    Beyond3D is proudly published by GPU Tools Ltd.