AMD: R9xx Speculation

Discussion in 'Architecture and Products' started by Lukfi, Oct 5, 2009.

  1. RecessionCone

    Regular Subscriber

    Joined:
    Feb 27, 2010
    Messages:
    504
    Likes Received:
    187
    I've been wrong before and I'll certainly be wrong again, thanks for weighing in.

    Just so I understand your perspective: lots of people on these fora, not to mention Charlie at semiaccurate, are constantly prophesying doom and gloom for Nvidia because of their large die sizes. If one believes AMD and Nvidia can make good margins on GTX580 and HD5970, it would suggest that fixating on die size is actually not very useful - that in other words the discussion about die size has been much ado about nothing. Am I understanding your perspective correctly?
     
  2. Jawed

    Legend

    Joined:
    Oct 2, 2004
    Messages:
    11,484
    Likes Received:
    1,844
    Location:
    London
    Which is the case for the entire range, not just for the halo chip.
     
  3. Psycho

    Regular

    Joined:
    Jun 7, 2008
    Messages:
    746
    Likes Received:
    41
    Location:
    Copenhagen
  4. keritto

    Newcomer

    Joined:
    Apr 3, 2009
    Messages:
    143
    Likes Received:
    0
    Well, those 16 additional ROPs in Barts aren't there purely for marketing reasons. As was mentioned, the decision was between putting 2 additional SIMDs or one additional ROP cluster in Barts, and it was noted that the 2 SIMDs would be utilized about 2% better than a 16-ROP cluster.
    So the 32-ROP decision more likely looks like a time-saving design decision, made to avoid a time-consuming redesign to adapt a Cypress-style GPU into a properly working 16-ROP design. It even sounds contradictory to me how they compared a 1280/16 design against a 1120/32 one, but that comparison came first, before the actual design started, and the decision was justified by time savings and by how much additional work would be needed.


    Why is the theory that Cayman could have 48 ROPs rejected as inconceivable? Is it so hard to fit them into a chip that is being redesigned anyway?
     
  5. Jawed

    Legend

    Joined:
    Oct 2, 2004
    Messages:
    11,484
    Likes Received:
    1,844
    Location:
    London
    What's wrong with:

    Code:
    A1 A3 A1 A3 B1 B3 B1 B3 A1 A3 A1 A3
    A2 A4 A2 A4 B2 B4 B2 B4 A2 A4 A2 A4
    A1 A3 A1 A3 B1 B3 B1 B3 A1 A3 A1 A3
    A2 A4 A2 A4 B2 B4 B2 B4 A2 A4 A2 A4
    
    Each rasteriser tile is 8x8 pixels in this example.
     
  6. PSU-failure

    Newcomer

    Joined:
    May 3, 2007
    Messages:
    249
    Likes Received:
    0
    The exact meaning was that performance was even, +/- 2%.

    In fact, Barts is probably a 16-SIMD part with 2 disabled even on the XT variant, as enabling them wouldn't make it that much better...

    1- probably no more than 10% more performance clock/clock
    2- higher power draw (more active transistors)
    3- hey, boss! Why not disable some SIMDs entirely, so that yields end up closer to 100% than ever before? (nVidia redundancy approach, more or less, and perhaps disabled SIMDs could draw almost no current)
     
  7. mczak

    Veteran

    Joined:
    Oct 24, 2002
    Messages:
    3,021
    Likes Received:
    119
    Yep. I really wonder though what frequency Cayman XT will actually achieve. It seems plausible that it won't quite be 1.5GHz (which incidentally would be needed to have the same bandwidth as GTX580), but something like 1.4GHz would be a less than 20% increase over HD5870. At least this time they probably won't use the Redwood PHY :).

    That's not how I read it. Adding 2 additional SIMDs was just about 2% slower than adding the 16 additional ROPs. So using some educated guesses (taking SIMD scaling into account etc.) this looks to me like the 16 additional ROPs are good for a bit more than a 10% increase in performance. That's not a whole lot, but the ROP/bandwidth ratio is a bit higher than on HD5870 (higher core clock/lower mem clock), and it should only add about 12mm² or so to the die size, so this was definitely worth it.
    You mean just 3 ROP blocks per memory partition instead of 2? Well I guess that would be possible in theory (but none of the rumors suggested that). But you also have to keep in mind that memory bandwidth hasn't gone up a whole lot (max possible is 25% against HD5870 if they really would achieve 1.5GHz, more realistic is probably something like 20%), so the bandwidth might just not be there for any additional ROPs to really help much (though I guess more Z fillrate could help a bit - I don't think more color fillrate would help in any case).
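    The bandwidth percentages above can be checked with a quick sketch. The bus widths and the HD5870/GTX580 memory clocks used here are assumptions not stated in the post (256-bit at 1.2GHz for HD5870, 384-bit at roughly 1.0GHz for GTX580, and Cayman keeping a 256-bit bus), and the function name is purely illustrative.

```python
def gddr5_bandwidth_gbs(bus_width_bits, mem_clock_ghz):
    """Peak bandwidth in GB/s, assuming GDDR5 moves 4 bits per pin per memory clock."""
    return bus_width_bits / 8 * mem_clock_ghz * 4


# Assumed reference points (not taken from the thread itself).
hd5870 = gddr5_bandwidth_gbs(256, 1.2)    # 256-bit bus, 1.2GHz memory clock
gtx580 = gddr5_bandwidth_gbs(384, 1.002)  # 384-bit bus, ~1.0GHz memory clock

for clock in (1.4, 1.5):
    # Assumption: Cayman keeps a 256-bit bus and only raises the memory clock.
    cayman = gddr5_bandwidth_gbs(256, clock)
    gain = (cayman / hd5870 - 1) * 100
    print(f"{clock}GHz: {cayman:.1f} GB/s, {gain:+.1f}% vs HD5870 "
          f"(GTX580: {gtx580:.1f} GB/s)")
```

    Under those assumptions, 1.5GHz lands at about the same bandwidth as GTX580 and 25% above HD5870, while 1.4GHz comes out below a 20% increase.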
     
  8. Squilliam

    Squilliam Beyond3d isn't defined yet
    Veteran

    Joined:
    Jan 11, 2008
    Messages:
    3,495
    Likes Received:
    114
    Location:
    New Zealand
    Hey guys, I'm having a dumb moment here: what does it mean when it says 'radial' and 'blower'? What are the practical differences between the two? Is one better? Cheaper?

    Anyway, where's the 2GB SKU? How come some official-looking slides say 1GB and others say 2GB?
     
  9. Gipsel

    Veteran

    Joined:
    Jan 4, 2010
    Messages:
    1,620
    Likes Received:
    264
    Location:
    Hamburg, Germany
    Small triangles (which are problematic for both the rasterizers and the ROPs) tend to cluster on the screen ;). It's therefore preferable to use smaller checkerboard patterns for a more even distribution of the load.
    Your example would group 4x4 tiles of 8x8 pixels (at least, maybe you were thinking of stripes) for one rasterizer. The probability that one rasterizer is completely busy and the second one has nothing to do is much higher in that case (or you would need much larger buffers). Neighbouring tiles definitely should be assigned to different rasterizers. But one may make the ROP tiles larger than that. It depends a bit on the buffers/color caches there. But generally the same rule should apply.
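    A minimal sketch of that load-balancing argument, assuming two rasterizers, 8x8-pixel tiles, and a checkerboard of square blocks where `group` is the block edge in tiles (`group=1` is a per-tile checkerboard, `group=4` is the 4x4-tile grouping). The function names and the 32x32-pixel "cluster" region are hypothetical, chosen only for illustration.

```python
TILE = 8  # pixels per tile edge, as in the example being discussed


def rasterizer_for_pixel(x, y, group=1, num_rasterizers=2):
    """Return the rasterizer that owns pixel (x, y).

    Tiles are gathered into group x group blocks, and the blocks are
    assigned to rasterizers in a checkerboard pattern.
    """
    block_x = (x // TILE) // group
    block_y = (y // TILE) // group
    return (block_x + block_y) % num_rasterizers


def tiles_per_rasterizer(x0, y0, size, group):
    """Count how many tiles of a square screen region go to each rasterizer."""
    counts = [0, 0]
    for ty in range(y0, y0 + size, TILE):
        for tx in range(x0, x0 + size, TILE):
            counts[rasterizer_for_pixel(tx, ty, group)] += 1
    return counts


# A cluster of small triangles covering a 32x32-pixel area (4x4 tiles).
fine = tiles_per_rasterizer(0, 0, 32, group=1)
coarse = tiles_per_rasterizer(0, 0, 32, group=4)
print("per-tile checkerboard:", fine)
print("4x4-tile blocks:      ", coarse)
```

    With the fine checkerboard the cluster's tiles split evenly between the two rasterizers; with 4x4-tile blocks the whole cluster can land on one rasterizer while the other idles.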
     
  10. eastmen

    Legend Subscriber

    Joined:
    Mar 17, 2008
    Messages:
    13,018
    Likes Received:
    3,837
    This is my understanding. You have X amount of space on a wafer. With a 550mm² chip you can make Y chips. With a 330mm² chip you can make Z chips. So you can already make more of Z than of Y. To compound this, Y will have a lot more problem chips than Z.

    So Z will cost less than Y.

    The benefit of AMD's strategy is this.

    You have competitive cards made with a single Z. So in this example you get the 5870 with perfect dies. Cores with a few problems become the 5850 and those with a lot of problems become the 5830. You then get the benefit of using two of the perfect cores to create your halo product, a product that, if made from a single huge core, wouldn't be priced well or be made in enough quantity.

    You then have what happened to Nvidia. They made a single huge chip that had a lot of problems. So Nvidia had to start with problem chips. You had the GTX 480, which was a problem chip. You have the GTX 470, which was even more crippled, and then you had the GTX 465 (I think it was) that had even more problems. Yes, the GTX 480 was faster than the 5870. However, it cost more to make, as Nvidia got fewer of those than AMD got of the 5870, and it wasn't fast enough to compete with the 5970.

    And don't forget that you have what the wafer costs. So if a wafer costs $5,000 and AMD gets 500 chips, each chip costs $10. If Nvidia only gets 100 chips, then it's $50 a chip. So Nvidia's product could end up costing $40 more right off the bat. Throw in the higher power usage that the GF100 had, and you need better cooling and better components to supply the power, and you also need more layers on your PCB, which all continues to add to the cost.

    Now, I have no idea how much a wafer costs or what the yields are for each chip. I was just using a basic example.
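    The arithmetic of that basic example, written out as a small sketch. The wafer price and chip counts are the made-up figures from the post, not real data, and `cost_per_die` is just an illustrative helper.

```python
def cost_per_die(wafer_cost, good_dies):
    """Spread the cost of one wafer over the sellable dies it produces."""
    return wafer_cost / good_dies


WAFER_COST = 5000  # dollars, the example figure from the post

small_die_cost = cost_per_die(WAFER_COST, 500)  # the 500-chip case
large_die_cost = cost_per_die(WAFER_COST, 100)  # the 100-chip case

print(f"small die: ${small_die_cost:.2f}")
print(f"large die: ${large_die_cost:.2f}")
print(f"difference: ${large_die_cost - small_die_cost:.2f}")
```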

    Look at what's happening now. We are finally getting the GTX 580, which is what the 480 should have been, while AMD has been able to introduce Barts, which offers what, 90% of the performance with less than 80% of the die space? This is a great deal for AMD, as they are able to enter lower markets with a smaller chip. The chip is smaller than Cypress, which is smaller than the GF104 (GTX 460). Cayman is coming, which is going to be bigger than Cypress but should still be smaller than the GTX 580 and will offer perhaps similar performance, but since it's smaller it seems AMD can slap two on a board and be even faster than the GTX 580.

    I really don't care. I won't spend more than $200/250 on a video card ever again. I'd much rather just wait a year and get the new performance card.
     
  11. LordEC911

    Regular

    Joined:
    Nov 25, 2007
    Messages:
    875
    Likes Received:
    205
    Location:
    'Zona
    Good point. I definitely know that is a part of the reason they don't go right up to the limit but I wasn't taking that into consideration at the time.

    I don't think that way, nor do many people here, though there definitely are some. My thinking is more that if they keep pushing up towards the limit they have some tough times ahead, as seen with G200 and GF100. Obviously, with Nvidia making most of their money in the workstation/professional market and the GPGPU stuff slowly increasing, they are able to live off those large margins, but as far as the desktop market goes, performance and enthusiast is not where the big money is, and you need those OEM contracts.

    Because the ROPs are tied to the MC, and most of us don't seem to believe they are going with a 384-bit bus with 240+ GB/s of bandwidth.

    Well radial = blower, i.e. the reference fan/cooler design on most of their high-end series, HD58x0, HD48x0, etc.

    I would assume the axial fans are the ones commonly found in the non-reference models. They are most likely cheaper and seem to offer better cooling performance at lower sound levels, but are not commonly designed to exhaust air out of the back of the case.
     
  12. Silent_Buddha

    Legend

    Joined:
    Mar 13, 2007
    Messages:
    18,365
    Likes Received:
    8,793
    Die size does matter, but you're making some seriously flawed assumptions in how you use die size to compare the impact on something the size of Cypress versus GF110 (5970 versus GTX 580).

    The example I'm using obviously isn't going to reflect the actual die sizes or die size ratios between Cypress and GF110, but...

    Let's say you have one die that is 2x the size of another die. And assume that there's no redundancy mechanism for dealing with potential defects.

    So only 1 defect is enough to ruin a die. So let's say there's just one defect in the area taken up by the large die. And let's say 2x of the smaller die take up the same wafer area. Only now, only 1 of those smaller dies is defective and the other one is perfectly fine. So 1x failed big die = 1x failed small die and NOT 2x failed die as you are assuming. Already you're getting better yields with the smaller die. This is a very simplistic case, obviously, but it's only to prove a point.

    As such, potential yields for the larger die are going to be far worse than the yields for a smaller die. Hence you'll get more product as a percentage of total wafer size the smaller your dies are. Which is why the percentage of dies with defects per wafer rises so steeply the larger your die size is.

    Thus on the same wafer, with the same number of defects scattered across the wafer, you will almost always get more than 2x the number of good smaller dies (that are 1/2 the size of the larger die) versus good larger dies.

    So in this case, with regard to margins, yields, cost of manufacture, etc., 1x GF100 doesn't equal 2x Cypress, since on the same wafer in theory you'll get more than 2x good Cypress cores for every 1 GF100 core. The ratio can be adjusted up or down depending on how well you've designed the chip to be tolerant of potential defects.
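    One common way to put numbers on this is the simple Poisson yield model, where the chance of a die having zero defects is exp(-area x defect density). This is a swapped-in textbook model, not something stated in the post, and the wafer area, die areas, and defect density below are hypothetical values picked only to show the shape of the effect (no redundancy, edge losses ignored).

```python
import math


def poisson_yield(die_area_mm2, defects_per_mm2):
    """Fraction of dies with zero defects under a simple Poisson model."""
    return math.exp(-die_area_mm2 * defects_per_mm2)


def good_dies(wafer_area_mm2, die_area_mm2, defects_per_mm2):
    """Expected defect-free dies per wafer (ignores edge loss and redundancy)."""
    candidates = wafer_area_mm2 / die_area_mm2
    return candidates * poisson_yield(die_area_mm2, defects_per_mm2)


WAFER_AREA = 70000.0    # mm², roughly a 300mm wafer (hypothetical round number)
DEFECT_DENSITY = 0.002  # defects per mm² (hypothetical)

big = good_dies(WAFER_AREA, 500.0, DEFECT_DENSITY)    # large die
small = good_dies(WAFER_AREA, 250.0, DEFECT_DENSITY)  # die half that size
ratio = small / big

print(f"good large dies: {big:.1f}")
print(f"good small dies: {small:.1f}")
print(f"ratio: {ratio:.2f}x (more than 2x)")
```

    In this model the ratio is 2 x exp(half the large die's area x defect density), so it is always above 2 whenever the defect density is nonzero, and it grows as the dies get larger or the process gets worse.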

    Regards,
    SB
     
  13. Squilliam

    Squilliam Beyond3d isn't defined yet
    Veteran

    Joined:
    Jan 11, 2008
    Messages:
    3,495
    Likes Received:
    114
    Location:
    New Zealand
  14. Squilliam

    Squilliam Beyond3d isn't defined yet
    Veteran

    Joined:
    Jan 11, 2008
    Messages:
    3,495
    Likes Received:
    114
    Location:
    New Zealand
    I just want to add to this that in addition to the physical costs of making each die/board you also have the NRE (non-recurring engineering) costs from the actual research and development that goes into making the chip. With a sweet spot strategy, instead of having one large SKU which might sell 5M chips and one smaller SKU which might sell 15M chips (say, GF100 and GF104), you have one chip which allows you to amortize the NRE over say 20M chips for both tiers of product.
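    A rough sketch of that amortization, using the unit volumes from the post (5M, 15M, and 20M chips). The NRE figure per chip design is a purely hypothetical placeholder, and it assumes each design costs the same to develop.

```python
def nre_per_chip(nre_cost, units_sold):
    """Development cost carried by each unit sold."""
    return nre_cost / units_sold


NRE_PER_DESIGN = 50_000_000  # dollars per chip design (hypothetical)

# Two-chip lineup: each design carries its own NRE.
large_sku = nre_per_chip(NRE_PER_DESIGN, 5_000_000)
small_sku = nre_per_chip(NRE_PER_DESIGN, 15_000_000)

# Sweet-spot lineup: one design covers both tiers.
single_design = nre_per_chip(NRE_PER_DESIGN, 20_000_000)

print(f"large SKU:     ${large_sku:.2f} per chip")
print(f"small SKU:     ${small_sku:.2f} per chip")
print(f"single design: ${single_design:.2f} per chip")
```

    On top of the lower per-chip figure, the single-design lineup also pays the development cost once instead of twice under this assumption.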
     
  15. mczak

    Veteran

    Joined:
    Oct 24, 2002
    Messages:
    3,021
    Likes Received:
    119
    There are currently 2 rops per memory partition. I think 3 rops instead should be possible, but I'm not sure it would help a lot.

    That's all true, though HD5970 isn't really that die area efficient indeed. If you look at HD6870CF results, you'll notice it's easily quite a bit faster in almost everything with only very slightly higher power draw. AMD could IMHO quite easily do a 2xHD6870 solution instead (might require chip binning or VERY slightly lower clocks along with slightly reduced voltage) which would cut "combined die size" quite a bit (to GF110 levels actually), and still have the same (or faster) performance with the same power draw as the HD5970. The only drawback I'd see with such a solution would be that it wouldn't overclock that well... But of course such a product would be pointless with the imminent Cayman release (even if Antilles is released a bit later). FWIW I don't think Antilles will be very efficient using that "perf/combined die area" measure either. Seems a safe bet AMD will need to reduce clocks (and maybe even disable some parts), possibly to a larger degree than what was necessary for HD5970. But of course the point of this solution won't be that it's efficient. The one and only goal of this product is to be the fastest graphics card.
     
  16. ECH

    ECH
    Regular

    Joined:
    May 24, 2007
    Messages:
    692
    Likes Received:
    30
    New rumor has it that the 6950 and 6970 will be released on November 29th
    source
     
  17. RobertR1

    RobertR1 Pro
    Legend

    Joined:
    Nov 2, 2005
    Messages:
    5,841
    Likes Received:
    1,276
    I thought they'd try to get them out before Black Friday, esp. if the date is that close to BF.
     
  18. gkar1

    Regular

    Joined:
    Jul 20, 2002
    Messages:
    614
    Likes Received:
    7
    The 29th is Cyber Monday :)
     
  19. keritto

    Newcomer

    Joined:
    Apr 3, 2009
    Messages:
    143
    Likes Received:
    0
    I also used to embrace that 1280-SP theory back before Barts was released.
    But then AMD doesn't need to design redundancy into their rather small chips (255mm²) after all, the way NV needs to with the close to 400mm² GF104 chip and its older and bigger brothers. And the G80-GF100 design approach is fairly different from the RV670->Barts ("RV940") design approach. So now I could settle on AMD having simulated both versions, 1280/16 and 1120/32, and even a 1280/32 that was rejected with the ultimate goal of a smaller chip. And the 32-ROP design needed less design work than the 1280/16 part, and it outperformed its main competitor, the GTX460, even with 1120 SPs. (Of course, we're leaving out the insanely good overclocking ability of the GF104 chip (25%+), which then ends up as a 240W TDP part instead of 160W, and Barts couldn't be clocked that well on air even if it could theoretically sustain that high a TDP :lol:)
    I'd back that up with Jawed's notes about the strong ties between the ROPs and the setup engines in the HD5000 series (and every design from R600 onwards), and the assumption that we'll see the same thing even in the "redesigned 4-VLIW engine" Cayman.


    In fact, that might not be so hard to believe considering how generously nVidia is paying TSMC to push for a faster transition to the 28nm node, which they desperately need for their GPUs. :lol: And considering that AMD should and would go to the GloFo 28nm node for most of their value products (NG Bobcat APUs, Krishna/Wichita, and most of their mobile products following these mobile "2nd Gen dx11 GPUs"), and that's the most profitable product segment :eek:

    Well, it ain't totally false :) if we can judge by this
    [Apr2010] Virage Logic launches 28nm IP suite at TSMC event
    And capacity should certainly be good enough for a GF119 part (96 SPs, GF104-like, 64-bit), with a probable die size around 70mm², considering its 40nm GF108 counterpart is 116mm².
     
  20. ZerazaX

    Regular

    Joined:
    Oct 29, 2007
    Messages:
    280
    Likes Received:
    0
    They said their sources were from the internet? Let me guess... Fudzilla? :lol:
     
  • About Us

    Beyond3D has been around for over a decade and prides itself on being the best place on the web for in-depth, technically-driven discussion and analysis of 3D graphics hardware. If you love pixels and transistors, you've come to the right place!

    Beyond3D is proudly published by GPU Tools Ltd.