AMD: Southern Islands (7*** series) Speculation/ Rumour Thread

Discussion in 'Architecture and Products' started by UniversalTruth, Dec 17, 2010.

  1. Sound_Card

    Regular

    Joined:
    Nov 24, 2006
    Messages:
    936
    Likes Received:
    4
    Location:
    San Antonio, TX
    Pretty much I'm thinking they will skip the original specs that were to be on 32nm and go straight to the "refresh" specs on 28nm (for reference, NI being the gimped version on 40nm).

    "NI Cayman XT" = 24 SIMD/40nm
    "Original Plan" = ?28 SIMD/32nm?
    "SI Cayman XT" = ?32 SIMD/28nm?

    If SI is as closely related to NI as I think, it should not take anywhere close to a year for the HD 7 series to start popping their heads up. Summer release? Hit before back to school?
     
  2. UniversalTruth

    Veteran

    Joined:
    Sep 5, 2010
    Messages:
    1,747
    Likes Received:
    22

    No. :( No. :( :???:
     
  3. mczak

    Veteran

    Joined:
    Oct 24, 2002
    Messages:
    3,022
    Likes Received:
    122
    AMD desperately needs to address some architecture bottlenecks, otherwise an increase from 24 simds to 32 simds (at the same clock) would give a 10% perf increase at best. So IMHO SI can't really be almost the same as NI. For that matter, I doubt the original plan actually had more simds, unless there were other changes (which seems unlikely given that Dave said the design was just back-ported to 40nm).
     
  4. UniversalTruth

    Veteran

    Joined:
    Sep 5, 2010
    Messages:
    1,747
    Likes Received:
    22
    Where does this rumour about 32 SIMDs @28 nm originate from? Because it is so conservative and on the low side of expected specifications. That possible chip would be both very small, and weak against what NVidia will offer. :shock:
     
  5. mczak

    Veteran

    Joined:
    Oct 24, 2002
    Messages:
    3,022
    Likes Received:
    122
    I think it's rather the opposite: everybody is assuming amd will go for a somewhat small chip again (as they did for rv770 and initially Cayman, only Cypress got "too big"), and then figures the simd count can't be that much higher. If the chip is really only about 270mm², 32 simds sounds plausible, if there are other changes too. And fwiw, even with "only" 32 simds that would be more than twice the texture and peak alu rate of GTX 580, so you can't automatically conclude from that "small" number of simds that the chip is slow. If other bottlenecks are addressed it could potentially be twice as fast as GTX 580 (ok not going to happen but you get the point).
     
  6. rpg.314

    Veteran

    Joined:
    Jul 21, 2008
    Messages:
    4,298
    Likes Received:
    0
    Location:
    /
    I would expect something like that, as it would consume about as much area as rv770 or barts.
     
  7. UniversalTruth

    Veteran

    Joined:
    Sep 5, 2010
    Messages:
    1,747
    Likes Received:
    22
    Ok, let's assume that the new chip will be small enough. But where did the doubling of SPs go? 320 to 800 (that's even 2.5 times), 800 to 1600, etc. Only Cayman seems quite beefy and containing stuff which should be removed, like Fermi 1.0. So, my bet is at least 2500 SPs, 32 simds is only 2048 SPs, if my calculations are right. Next generation Nvidia GPU can very easily be a 1024 SPs' part. :(
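
    For reference, the SP arithmetic here can be checked with a quick sketch. It assumes Cayman-style VLIW4 SIMDs (16 lanes of 4 ALUs, so 64 SPs per SIMD), which is what 24 SIMDs = 1536 SPs implies; the function names are made up for illustration.

```python
import math

# Assumption: Cayman-style VLIW4 SIMD = 16 lanes x 4 ALUs = 64 SPs.
LANES_PER_SIMD = 16
ALUS_PER_LANE = 4
SPS_PER_SIMD = LANES_PER_SIMD * ALUS_PER_LANE


def stream_processors(simd_count: int) -> int:
    """SP count for a given number of SIMDs."""
    return simd_count * SPS_PER_SIMD


def min_simds_for(sp_target: int) -> int:
    """Smallest SIMD count that reaches at least sp_target SPs."""
    return math.ceil(sp_target / SPS_PER_SIMD)


if __name__ == "__main__":
    for simds in (24, 28, 32):
        print(f"{simds} SIMDs -> {stream_processors(simds)} SPs")
    print("SIMDs needed for 2500 SPs:", min_simds_for(2500))
```

    Under that assumption, 32 SIMDs does come out at 2048 SPs, and a bet of "at least 2500 SPs" would need 40 SIMDs.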
     
  8. TKK

    TKK
    Newcomer

    Joined:
    Jan 12, 2010
    Messages:
    148
    Likes Received:
    0
    Doubling was easy for RV770 because R600/RV670 actually spent a lot of their transistor budget on stuff like the ring bus, and their SIMDs weren't that size-optimized either.
    Cypress could double SP count because it was both on a smaller process node and had more die area at its disposal.

    And like mczak already said, it's pointless to double SPs unless some bottlenecks that are obviously still in place are removed first.

    For example, it's possible that 4 schedulers feeding 8 SIMDs each would be far more efficient than the current 2x12 design. In a best-case scenario the increased efficiency would result in real-world scaling better than the theoretical 33% increase in shader/TMU power.

    Would you mind telling us what "beefy stuff" exactly is superfluous in your opinion and should be removed?

    afaik, most of the additional transistors were spent on
    - upgraded ROPs = better AA performance
    - more TMUs [more SIMDs] = better AF performance
    - doubled 'graphics engine' = better tessellation performance

    As you can see, the problem is just that most of the additional transistors were spent on things that increase performance only in specific scenarios.

    Nevertheless, I think those changes were necessary to make Cayman more future-proof than Cypress.


    And a 1024 SP Nvidia part would likely be at least twice as large as a 32 SIMD chip from AMD would be.
     
  9. turtle

    Regular

    Joined:
    Aug 20, 2005
    Messages:
    279
    Likes Received:
    8
    I pretty much agree, yet vehemently disagree.

    I agree Cayman would be pretty well maxed out at 1792sp (<10% more perf from SIMDs), but that is more than 24. Anything above that appears pointless as far as I can tell...and I've come at it from A LOT of different angles. I don't doubt the possibility I'm overlooking something, but at this moment it looks like the buck stops there for absolute efficiency. I hardly believe that makes the architecture broken though, or non-scalable.

    That said, I think 6970 is a woefully-damaged product compared to what may have been (on 32nm or not). I have little doubt at 900/6000, a quad more SIMDs would have put 'RV970' eye-to-eye with GTX580. GTX580 seems clocked as if they were still expecting that product, it's that close (according to my math anyway).

    Going by TPU's latest graph at 1920x1200:

    GTX580 avg 1.11363636_ faster than 6970 (+-1%).

    580 has 9.2% greater BW.
    9.2%*16% = 1.472% performance difference from BW.
    Difference: 1.09891636363636_

    On Cayman, 1% SIMD theoretical increase = .5% RW performance increase on avg. (Judging by overclocking results & 6970/6950 difference adjusting for BW)

    2703660*1.1978327272727_ = 3238532 Flops

    3238532/(1792*2) = 903.6mhz

    903.6/900 = .4%/2 = .2%

    SRSLY.

    Now, is that PERFECTLY EXACT & REFLECTIVE FOR ALL SCENARIOS? No. Is there possibly a bottleneck between 1536-1792 (and probably 1664-1792)? Sure. I think it's pretty close though, as-in within a couple/few percent.
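
    The chain of arithmetic above can be replayed as a rough sketch. All inputs are the post's own figures except the 880 MHz HD 6970 core clock, which is an assumption here: the post's 2703660 appears to be 1536 x 2 x 880 = 2703360 with a digit slipped, so the result differs from the post in the last decimal. The constant and function names are invented for illustration.

```python
# Figures quoted in the post.
GTX580_LEAD = 1.11363636   # GTX 580 average lead over HD 6970 (TPU, 1920x1200)
BW_ADVANTAGE = 0.092       # GTX 580 has 9.2% more memory bandwidth
BW_SENSITIVITY = 0.16      # the post's factor for how much of that shows up as performance
SIMD_SCALING = 0.5         # rule of thumb: 1% more ALU throughput -> 0.5% real-world gain

# Assumption: HD 6970 stock core clock (not stated in the post).
HD6970_SPS = 1536
HD6970_MHZ = 880
TARGET_SPS = 1792


def required_clock_mhz() -> float:
    """Core clock a 1792 SP part would need to match GTX 580 under the post's model."""
    bw_part = BW_ADVANTAGE * BW_SENSITIVITY           # 0.01472
    alu_gap = GTX580_LEAD - bw_part - 1.0             # about 0.0989
    throughput_scale = 1.0 + alu_gap / SIMD_SCALING   # about 1.1978
    base_throughput = HD6970_SPS * 2 * HD6970_MHZ     # SPs x 2 ops x MHz
    return base_throughput * throughput_scale / (TARGET_SPS * 2)


if __name__ == "__main__":
    clock = required_clock_mhz()
    print(f"required clock: {clock:.1f} MHz ({clock / 900 - 1:+.2%} vs 900 MHz)")
```

    The sketch only reproduces the model; it says nothing about whether the 0.5 scaling factor or the 16% bandwidth factor hold beyond the data they were fitted to.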
    _____________________________________________________________________

    I think the next stop is 48 ROPs. 40-42 SIMDs. 40 makes the most sense for lots of reasons, but 42 meshes nicely with 1792/32, plus it's the answer to everything in the universe, so why not a future GPU SIMD count. At this point my opinion (which seems to change randomly) is that I would be surprised if neither is the case. If '32nm Cayman' was going to replace Cypress at a similar size, something has to do it in turn on 28nm. As such, something was always going to replace Barts, and that previous 32nm part was probably going to be shrunk to 28nm. A Cypress size die, going from 32->28nm would fill that spot nicely.

    I hope the 28nm stack has both those discrete chips TBH, I think they'd be shoe-in replacements for Cypress and Barts. I'd be happy with half-specs as Juniper and Redwood replacements too. You'll notice they would oddly pair up with past parts (896 VLIW4 = 1120 VLIW5, 1280 VLIW4 = 1600 VLIW5.) Strange, that...only not really at all.
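
    The pairing in the parentheses falls out of a simple conversion, sketched below. It assumes the comparison is at an equal lane count, with 4 ALUs per lane for VLIW4 and 5 for VLIW5; the function name is hypothetical.

```python
def vliw5_equivalent(vliw4_sps: int) -> int:
    """VLIW5 SP count with the same number of lanes as a VLIW4 part."""
    lanes, remainder = divmod(vliw4_sps, 4)
    if remainder:
        raise ValueError("VLIW4 SP count must be a multiple of 4")
    return lanes * 5


if __name__ == "__main__":
    # Half of the 1792 SP and 2560 SP configurations discussed in the thread.
    for sps in (1792 // 2, 2560 // 2):
        print(f"{sps} VLIW4 SPs ~ {vliw5_equivalent(sps)} VLIW5 SPs")
```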
     
  10. mczak

    Veteran

    Joined:
    Oct 24, 2002
    Messages:
    3,022
    Likes Received:
    122
    You are way overestimating simd scaling. On Cayman (from Pro to XT), 9% more simds is good for only 2% perf increase (results taken from here, http://ht4u.net/reviews/2010/amd_radeon_hd_6970_6950_cayman_test/index43.php HD6970 at HD6950 clocks). Ok there are some inaccuracies here, but still scaling to even more simds is only going to make things worse. 4 more simds? 5% increase at best.
    FWIW, Barts showed much better scaling - 7% faster for 17% more simds (14 vs. 12 simds - http://ht4u.net/reviews/2010/amd_radeon_hd_6850_hd_6870_test/index34.php). Given the same number of rops, not that much less memory bandwidth etc. though it is certainly expected.
    But the point is, Cayman is past the point where just increasing the number of simds is economical - that is, an increase in simds costs more in die size than it returns in performance. Now, it is possible fixing other bottlenecks also would increase die size more than performance, but at this point simd scaling is so minimal it seems amd really will have to find something else to improve performance. I'm not sure increasing ROPs is going to help a lot either, since AMD apparently was looking at a 16 ROPs Barts (with 2 more simds instead) with very similar performance - so it doesn't look like 32 ROPs are very limiting for Cayman (though 48 ROPs are probably a viable option - just 3 quad-ROP partitions instead of 2 per 64bit channel).
    I'm still thinking 32 simds (4x8) would be enough for a next gen chip - that's twice the "simd performance" of Barts, after all. It just needs to scale better with simd count than Cayman (or Cypress...).
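
    To put the scaling numbers above side by side, here is a rough sketch that turns them into a "performance gained per unit of simd increase" ratio. The percentages and simd counts are the ones quoted in the post (22 vs. 24 simds for Cayman Pro/XT, 12 vs. 14 for Barts); the function names and the linear extrapolation are assumptions for illustration only.

```python
def scaling_efficiency(simds_before: int, simds_after: int, perf_gain: float) -> float:
    """Fraction of the relative simd increase that shows up as performance."""
    simd_gain = simds_after / simds_before - 1.0
    return perf_gain / simd_gain


def projected_gain(simds_before: int, simds_after: int, efficiency: float) -> float:
    """Naive linear projection: assumes efficiency stays constant as simds are added."""
    return (simds_after / simds_before - 1.0) * efficiency


if __name__ == "__main__":
    cayman = scaling_efficiency(22, 24, 0.02)  # HD 6950 -> HD 6970 at equal clocks
    barts = scaling_efficiency(12, 14, 0.07)   # HD 6850 -> HD 6870 at equal clocks
    print(f"Cayman efficiency: {cayman:.2f}")
    print(f"Barts efficiency:  {barts:.2f}")
    print(f"Cayman 24 -> 28 simds: {projected_gain(24, 28, cayman):+.1%}")
```

    Since efficiency would likely keep dropping as simds are added, the linear projection is an upper bound under these assumptions, which is consistent with the "5% at best" estimate.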
     
    #50 mczak, Jan 16, 2011
    Last edited by a moderator: Jan 16, 2011
  11. UniversalTruth

    Veteran

    Joined:
    Sep 5, 2010
    Messages:
    1,747
    Likes Received:
    22
    :mrgreen: Ok, if that's the problem, the scaling in such a case, simply add.... :mrgreen: .... uhhhhhhh....grrrrrrrrr............. :mrgreen:............ more cards. It's so simple. :mrgreen: CF scaling looks like a very good option. :mrgreen:
     
  12. Dave Baumann

    Dave Baumann Gamerscore Wh...
    Moderator Legend

    Joined:
    Jan 29, 2002
    Messages:
    14,090
    Likes Received:
    694
    Location:
    O Canada!
    I think more analysis than a few rounded, averaged benchmarks needs to be done to really conclude anything; I'd say that probably one of the titles in the list there would demonstrate much in the way of any performance improvement from additional SIMD's.
     
  13. mczak

    Veteran

    Joined:
    Oct 24, 2002
    Messages:
    3,022
    Likes Received:
    122
    Oh you're right more detailed analysis would be nice. In particular the average not only includes quite a few titles, but also quite low resolutions (with and without AA/AF). It is entirely possible in some titles and at the higher resolutions scaling was better.
    Still, that the average is only 2% more (where at the same time for 10% more clock for the same mix 7% of additional performance is gained) is quite telling imho.
    And unfortunately you didn't give us a die shot of Cayman so we don't know the die size of a simd :). So those 9% more simds might only really cost 4% die area or so in which case scaling wouldn't really look that bad (but still not quite good).
    In any case, if SI really turns out to be mostly a shrink with just more simds, color me surprised.
     
  14. turtle

    Regular

    Joined:
    Aug 20, 2005
    Messages:
    279
    Likes Received:
    8
    Question for you, sir.

    Good or bad way to look at things (on average):

    After special function, this arch should be shooting for 3.25-3.4 shaders per rop per cluster. That, to me, seems the average shader usage on both architectures after SF.

    IE, if we figure doing special function at the rate of GF100, the leftover rate is 3.5/4. If we figure GF10x, it's 3.33__. Obviously that's a comparative generalization, and usage varies, but they're constants and the competition, so it's worth using them.

    ((4+4+4+2)/4 = 3.5 and (4+4+2)/3 = 3.33_)

    Therefore,

    32 ROPs:
    3.5 = 1792: (3.5+3.5)/2 = 3.5, (3.33_+3.5)/2 = 3.4166_ ... Excellent...If not over-equipped.
    3.25 = 1664: (3.5+3.25)/2 = 3.375, (3.33_+3.25)/2 = 3.29166_ ... Optimal design.
    3 = 1536: (3.5+3)/2 = 3.25, (3.33_+3)/2 = 3.166_ ... What we call 6970.
    2.75 = 1408: (3.5+2.75)/2 = 3.125, (3.33_+2.75)/2 = 3.04166_ ... What we call 6950.

    48 ROPs:
    3.5 = 2688...same as 1792 with 32 ROPs.
    3.33_ = 2560: (3.33_+3.5)/2 = 3.4166_, (3.33_+3.33_)/2 = 3.33_ ... Optimal design.
    3.166_ = 2432: (3.166_+3.5)/2 = 3.33_, (3.166_+3.33_)/2 = 3.25
    3 = 2304...same as 6970

    Terrible generalization, or barking up the right tree?

    (Edit: Wait, did I ask this before? If I did, I apologize...I have a terrible memory...and don't follow threads and keep up on things as well as I should.)
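
    The table in this post can be regenerated with a short sketch. The factor of 16 is an assumption inferred from the post's own numbers (1792 / 32 ROPs / 3.5 = 16, which matches 16 lanes per SIMD); the two reference rates are the GF100 and GF10x figures given above, and the function names are invented.

```python
# Assumption: multiplier inferred from the post (1792 / (32 * 3.5) = 16).
LANES = 16
GF100_RATE = (4 + 4 + 4 + 2) / 4   # 3.5
GF10X_RATE = (4 + 4 + 2) / 3       # 3.33...


def shader_count(rops: int, ratio: float) -> int:
    """Shader count implied by a shaders-per-ROP-per-cluster ratio."""
    return round(rops * ratio * LANES)


def blended(ratio: float) -> tuple:
    """Average of the ratio with the GF100 and GF10x reference rates."""
    return ((ratio + GF100_RATE) / 2, (ratio + GF10X_RATE) / 2)


if __name__ == "__main__":
    table = {
        32: (3.5, 3.25, 3.0, 2.75),
        48: (3.5, 10 / 3, 19 / 6, 3.0),
    }
    for rops, ratios in table.items():
        print(f"{rops} ROPs")
        for ratio in ratios:
            vs_gf100, vs_gf10x = blended(ratio)
            print(f"  {ratio:.3f} -> {shader_count(rops, ratio)} shaders "
                  f"(avg vs GF100 {vs_gf100:.3f}, vs GF10x {vs_gf10x:.3f})")
```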
     
    #54 turtle, Jan 18, 2011
    Last edited by a moderator: Jan 18, 2011
  15. turtle

    Regular

    Joined:
    Aug 20, 2005
    Messages:
    279
    Likes Received:
    8
    Yes, I would implore you to look at TPU, for example, which shows the array of resolutions and the differences between them, while also covering a much wider array of games (including pretty much everything from your link) that you can look at individually. Cayman actually (comparatively) performs pretty badly at low-rez. That said, it shouldn't really have any massive advantages over 1536MB at 1920x1200 because of memory size...and it doesn't when you compare it to 580 at 1680x1050. Yet, it should accurately scale at that resolution...Which it does (70 is 11.39% avg faster than 50, +-1%). That's why I picked it...well, that and it's the most common rez for its market. Likewise, I don't compare integrated gfx at 2560x1600. Not trying to pick on ya though. I love TPU's reviews for all that info...They must take W1z freaking forever...but I love him for it.

    As for being surprised, I guess I think you will be. That said, there might be a 1664sp product (if not 1792) on 28nm...and really....would that be more SIMDs than Cayman has on die? The world may never know. Considering that whole "6950s became 6970s...blahblahblah bios change...ZOMG YIELDS" snafu, I wouldn't take that bet.
     
    #55 turtle, Jan 18, 2011
    Last edited by a moderator: Jan 18, 2011
  16. mczak

    Veteran

    Joined:
    Oct 24, 2002
    Messages:
    3,022
    Likes Received:
    122
    The problem is, there's no HD6970 at HD6950 clocks there. I'm willing to believe though that at higher resolutions there is more of a performance difference for these 2 more simds, but how much more, there's not enough data to even make a guess.

    I'd be surprised if I'll be surprised :) We'll see ;-).
     
  17. Silent_Buddha

    Legend

    Joined:
    Mar 13, 2007
    Messages:
    19,418
    Likes Received:
    10,311
    There have been benchmarks done by 6950 SIMD unlockers, and the increase from only unlocking the additional SIMDs varies greatly by application and benchmark. Some applications see very little increase in performance while others see quite a bit more.

    So any potential increase or average increase is going to be determined quite a bit by the applications chosen by a person benching his card or a review site benching the card.

    Regards,
    SB
     
  18. rpg.314

    Veteran

    Joined:
    Jul 21, 2008
    Messages:
    4,298
    Likes Received:
    0
    Location:
    /
  19. fellix

    Veteran

    Joined:
    Dec 4, 2004
    Messages:
    3,552
    Likes Received:
    514
    Location:
    Varna, Bulgaria
    So, no new architecture until 28nm?

    That would be a really boring year now.
     
  20. Bouncing Zabaglione Bros.

    Legend

    Joined:
    Jun 24, 2003
    Messages:
    6,363
    Likes Received:
    83

    Looks like a PR mailout, so I'll believe it when I see it. Given the poor past performance of just about everyone except Intel, every prediction of when a new process will be making chips should be taken with a large pinch of salt.
     