AMD: R8xx Speculation

Discussion in 'Architecture and Products' started by Shtal, Jul 19, 2008.

How soon will Nvidia respond with GT300 to the upcoming ATI RV870 lineup of GPUs?

Poll closed Oct 14, 2009.
  1. Within 1 or 2 weeks

    1 vote(s)
    0.6%
  2. Within a month

    5 vote(s)
    3.2%
  3. Within couple months

    28 vote(s)
    18.1%
  4. Very late this year

    52 vote(s)
    33.5%
  5. Not until next year

    69 vote(s)
    44.5%
  1. Jawed

    Legend

    Joined:
    Oct 2, 2004
    Messages:
    10,873
    Likes Received:
    767
    Location:
    London
    Nice, so that seems to confirm the 181mm² chip is 128-bit, as it matches the first version of the tessellation video that was hastily withdrawn, the video with "EG BROADWAY".

    But the mass production stuff is misleading there, I reckon. If these are the same as the desktop chips, then the chips have to be in production for desktop anyway. And laptops will follow later. The supposed early sighting of RV740 was because AMD was trying to get it into laptops because of their longer design cycles.

    <50% performance gain over RV740? That would confirm 128-bit I think (as opposed to 192-bit or 256-bit), with maybe slightly faster memory. I'm not sure what the fastest laptop RV740 memory is, though. But the clocks appear to top out at 650MHz. 750MHz would be 15%. 800 ALU lanes instead of 640 would be 25%. The two combined is <50%.
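    The percentages above can be checked with a few lines of Python. This is only a sketch: the variable names are illustrative, and it assumes performance scales perfectly linearly with core clock and ALU lane count, which real chips rarely manage.

```python
# Idealised upper bound on the gain over laptop RV740, assuming perfect
# linear scaling with both core clock and ALU lane count (an assumption).
base_clock_mhz = 650   # top laptop RV740 clock quoted in the post
new_clock_mhz = 750    # hypothetical clock quoted in the post
base_lanes = 640       # RV740 ALU lanes
new_lanes = 800        # hypothetical lane count quoted in the post

clock_gain = new_clock_mhz / base_clock_mhz - 1
lane_gain = new_lanes / base_lanes - 1
# Independent speed-ups multiply rather than add.
combined_gain = (1 + clock_gain) * (1 + lane_gain) - 1

print(f"clock:    +{clock_gain:.1%}")
print(f"lanes:    +{lane_gain:.1%}")
print(f"combined: +{combined_gain:.1%}")
```

    Under those assumptions the combined figure stays below 50%, consistent with the estimate in the post.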

    TDP in laptop is confusing: is that the chip or the chip + memory?

    What is M2 and S3? The type of module? "MXM 2" and "soldered 3"?

    Jawed
     
  2. Vincent

    Newcomer

    Joined:
    May 28, 2007
    Messages:
    235
    Likes Received:
    0
    Location:
    London

    Or more TMUs ??? :razz:
     
  3. w0mbat

    Newcomer

    Joined:
    Nov 18, 2006
    Messages:
    234
    Likes Received:
    5
  4. Jawed

    Legend

    Joined:
    Oct 2, 2004
    Messages:
    10,873
    Likes Received:
    767
    Location:
    London
    :oops: But, well, I don't believe >100% faster for a 181mm² chip compared with RV740.

    So maybe RV710 has been replaced by something very serious (pad limited, what could they do?). Or someone's just sloppy with percentages?

    Jawed
     
  5. no-X

    Veteran

    Joined:
    May 28, 2005
    Messages:
    2,298
    Likes Received:
    247
    As for the 180mm² GPU - 3 possibilities were mentioned:

    1. 128bit - we know that 100mm² is sufficient for a 128bit bus (GDDR3). A GDDR5 interface could require a bit more space, so let's say 120mm² (my guess). We expect that the rest of the space could be used for some interconnection (MCM).

    2. 192bit - the die space of this GPU seems to be sufficient for a 192bit bus, but this kind of decision wouldn't be typical for ATi. However, in this case the number of ROPs could be increased to 24. As we saw, the performance difference between HD4730 and HD4770 is quite significant in some games, and the only difference between these two products is the ROPs (8 vs. 16). Maybe the performance impact of ROPs has been a bit underrated recently. As somebody mentioned, the tessellator will produce more edges, which would require more ROPs to keep MSAA performance at an acceptable level. The other possibility, of course, is eliminating ROPs and emulating their functionality via SPs. Anyway, there will definitely be no MCM interface in this case.

    3. 256bit - the smallest 256bit GPUs are RV670 (192 mm²) and Parhelia (180 mm²). I'm not sure I can consider Parhelia a good example, because its memory controller was simple and didn't support GDDR5. I think it wouldn't be possible to cram a 256bit GDDR5 controller into a 180mm² GPU... but it's only my opinion.

    Let's assume that the GPU has no MCM interface and the bus width is 192bit. I have no idea how complex the DX11 implementation will be, but I'll assume that it will take as much die space as the sideport on RV790. I'll also assume that the increased number of ROPs and the decreased width of the memory bus will compensate for each other in terms of die space. The situation would then be quite simple:

    1 SIMD (SPs + texturing logic) of RV7xx takes about 9.3 mm². 6 additional SIMDs would take 56 mm² on 55 nm. RV790 is 288 mm², additional 56 mm² would make it 344 mm² large. Linear 40nm "shrink" would be 182 mm² large.
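    The area estimate above can be reproduced as follows. This is a sketch under the post's own assumptions: names are illustrative, and an ideal "linear" shrink (area scaling with the square of the node ratio) is an idealisation, since I/O and analogue blocks generally shrink less than logic.

```python
# Die-area estimate for a 16-SIMD part, following the figures in the post.
simd_area_mm2 = 9.3      # one RV7xx SIMD (SPs + texturing logic) at 55 nm
extra_simds = 6          # added on top of RV790
rv790_area_mm2 = 288.0   # RV790 die size at 55 nm
old_node_nm = 55.0
new_node_nm = 40.0

extra_area_mm2 = simd_area_mm2 * extra_simds
area_55nm_mm2 = rv790_area_mm2 + extra_area_mm2

# Ideal linear shrink: each dimension scales by 40/55, so area scales
# by the square of that ratio (an assumption, not a measured figure).
shrink_factor = (new_node_nm / old_node_nm) ** 2
area_40nm_mm2 = area_55nm_mm2 * shrink_factor

print(f"extra SIMD area: {extra_area_mm2:.1f} mm^2")
print(f"55 nm die:       {area_55nm_mm2:.1f} mm^2")
print(f"40 nm die:       {area_40nm_mm2:.1f} mm^2")
```

    Without the intermediate rounding to 56 mm² the result comes out marginally lower than the post's figure, but it still rounds to about 182 mm².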

    RV790 is 57% faster than RV740 (1680x1050 4xAA/16xAF, ComputerBase.de). 60% of additional texturing and math power + 50% of additional ROPs power could get in the +130% ball-park.
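    As a rough check of that ball-park, the scaling can be bracketed like this. It is a sketch only: it assumes the hypothetical chip is an RV790 scaled from 10 to 16 SIMDs and from 16 to 24 ROPs, and that performance scales linearly with whichever unit is the bottleneck.

```python
# Bracket the hypothetical chip's performance relative to RV740.
rv790_vs_rv740 = 1.57   # RV790 is 57% faster than RV740 (figure from the post)
simd_scale = 16 / 10    # +60% texturing and math power
rop_scale = 24 / 16     # +50% ROP power

# If every workload were limited by the less-scaled unit we get the low
# bound; if limited by the more-scaled unit, the high bound.
low = rv790_vs_rv740 * min(simd_scale, rop_scale)
high = rv790_vs_rv740 * max(simd_scale, rop_scale)

print(f"low bound:  +{low - 1:.0%} over RV740")
print(f"high bound: +{high - 1:.0%} over RV740")
```

    Both bounds sit above +130%, so the post's figure leaves some margin for less-than-ideal scaling.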

    Someone could disagree because of the bandwidth. I presume that this (mainstream) part will target only 4x MSAA (not 8x or higher). We shouldn't forget that the 512MB HD4770 is capable of delivering 83% of the performance of the 512MB HD4870 with only 45% of its bandwidth (again, 1680x1050 4xAA/16xAF, ComputerBase.de). It should be possible to create a product which would be +120% faster at this bandwidth (enough to get into the +130% ball-park). 1GB of video memory affects performance by 17% (based on the same test, again).
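    The 45% bandwidth figure can be sanity-checked from bus width and per-pin data rate. Note the data rates below are not from the post: they are the commonly quoted reference-card GDDR5 speeds (3.2 Gbps for HD4770, 3.6 Gbps for HD4870) and should be treated as assumptions, as should the helper name.

```python
def bandwidth_gb_s(bus_bits: int, gbps_per_pin: float) -> float:
    """Peak memory bandwidth in GB/s for a given bus width and pin rate."""
    return bus_bits * gbps_per_pin / 8

# Assumed reference-card memory configurations (not stated in the post).
hd4770_bw = bandwidth_gb_s(128, 3.2)
hd4870_bw = bandwidth_gb_s(256, 3.6)

bw_ratio = hd4770_bw / hd4870_bw
perf_ratio = 0.83  # HD4770 performance relative to HD4870, from the post

print(f"HD4770: {hd4770_bw:.1f} GB/s, HD4870: {hd4870_bw:.1f} GB/s")
print(f"bandwidth ratio: {bw_ratio:.0%}")
print(f"performance per unit bandwidth vs HD4870: {perf_ratio / bw_ratio:.2f}x")
```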

    I think it's possible to create a 180mm² 40nm GPU which would perform at least twice as fast as RV740 (16 SIMDs, 24 ROPs, 192bit bus, 1GB of 2200MHz GDDR5), but I have no idea how close these specs are to ATi's next-gen GPU :)
     
  6. w0mbat

    Newcomer

    Joined:
    Nov 18, 2006
    Messages:
    234
    Likes Received:
    5
    Me neither, but that's what they are writing. Maybe it's true, maybe a typo, or maybe just made up.
     
  7. mboeller

    Regular

    Joined:
    Feb 7, 2002
    Messages:
    922
    Likes Received:
    1
    Location:
    Germany
  8. rjc

    rjc
    Regular

    Joined:
    Oct 27, 2008
    Messages:
    270
    Likes Received:
    0
    I think the original post had "Per Watt" as well.

    Note support for GDDR5 right down to the lowest chip. Also DDR3 across the range. I don't think GDDR3 is going to be around for much longer (too expensive, and worse power characteristics too, I think).

    Finally, re the flood of products coming: as Nvidia said, TSMC is in the process of upgrading 40nm capacity, so it will be a free-for-all by about September or so.

    Because of the above, you can see AMD is coming with lower-volume products first while limited capacity is in effect (i.e. High End and Performance), since they don't take too many wafers. After, say, September/October, when capacity comes online, they can start their mainstream and entry-level products, which take lots of wafers.

    It is quite interesting that, as with the R7xx series launch, AMD must have figured out methods to make their design process more modular, so they can quickly migrate their designs across to different market segments.
     
    #1248 rjc, Jul 27, 2009
    Last edited by a moderator: Jul 27, 2009
  9. Pressure

    Veteran Regular

    Joined:
    Mar 30, 2004
    Messages:
    1,341
    Likes Received:
    272
    Since the market is heading mobile, I could actually see the reason why they would want to ship their mobile offerings first.

    Last year was the first year notebooks sold more than desktops, so if they have a large number of design wins it only makes sense.
     
  10. hoom

    Veteran

    Joined:
    Sep 23, 2003
    Messages:
    2,948
    Likes Received:
    497
    That would save some die space; GDDR5-only memory controllers have to be simpler than ones that do GDDR3 (& other formats) too?
     
  11. trinibwoy

    trinibwoy Meh
    Legend

    Joined:
    Mar 17, 2004
    Messages:
    10,430
    Likes Received:
    433
    Location:
    New York
    It's not exclusively GDDR5.
     
  12. Jawed

    Legend

    Joined:
    Oct 2, 2004
    Messages:
    10,873
    Likes Received:
    767
    Location:
    London
    I have made a small area breakdown spreadsheet for RV7xx GPUs, based on a high resolution die photo:

    http://www.cupidity.f9.co.uk/RV7xxAreaBreakdown.xls

    Note I have guessed which are the 4 MCs, the patches I've labelled "C":

    [Image: RV7xx die photo with labelled areas]

    I can't quite identify a fourth "B", which is a shame as A and B seem to go together as a pair :sad: I'm assuming that there is a fourth B, so have counted the area of A+B as RBEs (including colour/z/stencil buffer caches) + L2s.

    The naive scenario I have included in there is called "Juniper is RV740 with extra clusters and no D3D11-specific changes". That scenario is 16 clusters, 128-bit, no sideport - it has room to spare. Obviously we know there are D3D11-specific changes (and others):
    • enhanced tessellator
    • HS and DS slots required in scheduler
    • LDS is 32KB
    • texture filtering is precisely defined (could be expensive? return to big TUs?)
    • 16KB burst fetch mode
    RV740 also has room to spare. Refinements to the spreadsheet welcome :grin:

    Jawed
     
  13. mczak

    Veteran

    Joined:
    Oct 24, 2002
    Messages:
    3,015
    Likes Received:
    112
    You're probably right. GDDR3 has worse power characteristics than DDR3 as far as I can tell, and DDR3 is almost certainly always cheaper. In fact it seems DDR3 512Mbit parts are now even available for graphics cards, with frequencies up to 1.0GHz (without them you couldn't build 128-bit 512MB parts, since DDR3 chips only come in 16-bit wide flavors at most, though I'm not sure this is even desired any longer). That's still slower than GDDR3 (up to 1.3GHz now), but the difference isn't that huge, and if you require more memory bandwidth it might now be more effective to use a narrower bus with GDDR5 memory. GDDR5, btw, now seems to be available at 1.5GHz/6Gbps (not that this speed grade would be something for low-end chips, but that's more than twice the bandwidth per pin compared to the fastest GDDR3 parts available).
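    The per-pin comparison and the chip-count argument above work out as follows. This is a sketch with illustrative names; it assumes DDR3 and GDDR3 transfer two bits per pin per quoted clock and GDDR5 four, which matches the 1.5GHz/6Gbps pairing given in the post.

```python
def pin_rate_gbps(clock_ghz: float, transfers_per_clock: int) -> float:
    """Per-pin data rate in Gbps from the quoted clock."""
    return clock_ghz * transfers_per_clock

ddr3_gbps = pin_rate_gbps(1.0, 2)    # DDR3 at 1.0 GHz
gddr3_gbps = pin_rate_gbps(1.3, 2)   # fastest GDDR3 at 1.3 GHz
gddr5_gbps = pin_rate_gbps(1.5, 4)   # GDDR5 at 1.5 GHz, i.e. 6 Gbps

gddr5_vs_gddr3 = gddr5_gbps / gddr3_gbps

# A 128-bit bus built from 16-bit-wide chips needs 8 chips, so 512 MB
# total requires 512 Mbit per chip.
bus_bits = 128
chip_width_bits = 16
chip_density_mbit = 512
chips = bus_bits // chip_width_bits
capacity_mb = chips * chip_density_mbit // 8

print(f"DDR3 {ddr3_gbps:.1f}, GDDR3 {gddr3_gbps:.1f}, GDDR5 {gddr5_gbps:.1f} Gbps/pin")
print(f"GDDR5 vs GDDR3 per pin: {gddr5_vs_gddr3:.2f}x")
print(f"{chips} chips -> {capacity_mb} MB")
```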
     
  14. Jawed

    Legend

    Joined:
    Oct 2, 2004
    Messages:
    10,873
    Likes Received:
    767
    Location:
    London
    Maybe it's possible to turn the I/O areas through 90 degrees so that they take less perimeter?...

    My dodgy spreadsheet indicates a 10-cluster, 192-bit, 24 RBE Juniper would leave ~20mm² for D3D11 improvements :smile:

    If that was 32 RBEs then my spreadsheet indicates no, no space left for anything major for D3D11.

    One of the problems with my spreadsheet is not knowing if there's a cap ring. It also assumes that RV740 has no sideport.

    Yes, RV740 is deceptively good, and a high RBE:bandwidth ratio is better than I dared hope for :grin: I just hope the 256-bit Evergreen GPU has 32 RBEs.

    Jawed
     
  15. Kaotik

    Kaotik Drunk Member
    Legend

    Joined:
    Apr 16, 2003
    Messages:
    8,186
    Likes Received:
    1,841
    Location:
    Finland

    This is something that I can't really understand: do we have any real reason to presume that it uses a similar architecture to RV7xx?
    I mean both the overall architecture and unit-wise.

    For example, look at RV6xx and RV7xx: did anyone expect them to ditch the ringbus so quickly, or believe how much smaller they could make the shader units?

    There could be several similarly big changes in the Evergreen generation, or even bigger ones.
     
  16. Jawed

    Legend

    Joined:
    Oct 2, 2004
    Messages:
    10,873
    Likes Received:
    767
    Location:
    London
    Charlie v Theo (actually, everyone v Theo)

    http://www.semiaccurate.com/2009/07/27/plagiarism-rampant-it-journalism/

    Forget that, what about this:

     
  17. neliz

    neliz GIGABYTE Man
    Veteran

    Joined:
    Mar 30, 2005
    Messages:
    4,904
    Likes Received:
    23
    Location:
    In the know
    Lol, I heard that about HemRock too today, but not from Charlie. It was named as a hypothetical HD5900 card...
     
  18. Jawed

    Legend

    Joined:
    Oct 2, 2004
    Messages:
    10,873
    Likes Received:
    767
    Location:
    London
    Sure. Feel free to speculate in that direction!

    e.g. Texture filtering in D3D11 is strictly defined, the reference rasteriser is meant to be absolute in this I believe.

    Though R600 doesn't show "highest possible quality" texture-filtering, I've been wondering if R600's fundamental architecture for texture filtering, which works in the fp16 domain, was ATI's step towards strictly defined texture filtering. Perhaps R600 can do D3D11-strict texture filtering (hmm, doubtful, I know).

    If Evergreen is the return of fp16 TUs, then that's a lot of die space... RV770's conversion to old style TUs supposedly increased TU performance by 70% per mm².

    Jawed
     
  19. trinibwoy

    trinibwoy Meh
    Legend

    Joined:
    Mar 17, 2004
    Messages:
    10,430
    Likes Received:
    433
    Location:
    New York
    Sorry if this is obvious but why would DX11's stricter requirements mandate full-speed FP16? DirectX guidelines are all about quality not performance right?
     
  20. fellix

    fellix Hey, You!
    Veteran

    Joined:
    Dec 4, 2004
    Messages:
    3,490
    Likes Received:
    400
    Location:
    Varna, Bulgaria
    R600 had additional point samplers in there, not just the wider bilerp lanes. As for a possible native FP16 texturing implementation in R800, it would be justified if game devs are to be lured by the new FP compression formats and start using HDR texturing more extensively all around.
     