AMD: R8xx Speculation

Discussion in 'Architecture and Products' started by Shtal, Jul 19, 2008.


How soon will Nvidia respond with GT300 to upcoming ATI-RV870 lineup GPUs

Poll closed Oct 14, 2009.
  1. Within 1 or 2 weeks

    1 vote(s)
    0.6%
  2. Within a month

    5 vote(s)
    3.2%
  3. Within couple months

    28 vote(s)
    18.1%
  4. Very late this year

    52 vote(s)
    33.5%
  5. Not until next year

    69 vote(s)
    44.5%
  1. neliz

    neliz GIGABYTE Man
    Veteran

    Joined:
    Mar 30, 2005
    Messages:
    4,904
    Likes Received:
    23
    Location:
    In the know
    Oh dear, that doesn't sound right... someone took the 1600 SPs and connected them to 1.2GHz GDDR5. What are we to do now?
     
  2. w0mbat

    Newcomer

    Joined:
    Nov 18, 2006
    Messages:
    234
    Likes Received:
    5
    I'm already enjoying it! Like a cold shower ;)
     
  3. DegustatoR

    Veteran

    Joined:
    Mar 12, 2002
    Messages:
    1,484
    Likes Received:
    231
    Location:
    msk.ru/spb.ru
    1200/20 = 60, 60 < 80
    1280/20 = 64, 64/5 = 12.8 -)
    The way I see it, 20 SIMDs are only possible with 1600+ SPs.
    For 1200 or 1280 SPs the number of SIMDs will be less than 20, probably in the 10-16 range.
     
  4. neliz

    neliz GIGABYTE Man
    Veteran

    Joined:
    Mar 30, 2005
    Messages:
    4,904
    Likes Received:
    23
    Location:
    In the know
    RV770 had 800SP in 10 SIMD units.
     
  5. DegustatoR

    Veteran

    Joined:
    Mar 12, 2002
    Messages:
    1,484
    Likes Received:
    231
    Location:
    msk.ru/spb.ru
    So?
    The number of SPs per SIMD probably won't be less than in RV770.
    So 20 SIMDs for RV870 will give us 1600+ SPs. And if we're lucky with 120 SPs per SIMD (+50% increase to RV770) we'll get 2400 SPs.
    But that would mean that Juniper has 1200 of them, nearly twice as many as RV740, wouldn't it?
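The SIMD arithmetic traded in the last few posts can be checked with a short sketch. It only uses the figures quoted in this thread (RV770 at 800 SPs in 10 SIMDs, the rumoured 1200/1280/1600 totals, 20 SIMDs) plus the 5-wide SP grouping implied by the "64/5" above. The function names are made up for illustration, and the "Juniper is half of RV870" step is the assumption behind the 1200 figure, not a confirmed spec.

```python
# Sanity check of the SP-per-SIMD numbers discussed above.
SP_PER_VLIW = 5                      # SPs per 5-wide unit, as implied by "64/5"
RV770_SPS, RV770_SIMDS = 800, 10     # figures quoted for RV770
RV770_WIDTH = RV770_SPS / RV770_SIMDS  # 80 SPs per SIMD


def simd_layout(total_sps, simds):
    """Describe what a rumoured SP total implies for a given SIMD count."""
    width = total_sps / simds
    return {
        "sps_per_simd": width,
        "vliw_units_per_simd": width / SP_PER_VLIW,
        "whole_vliw_units": width % SP_PER_VLIW == 0,
        "at_least_rv770_width": width >= RV770_WIDTH,
    }


def total_sps(simds, sps_per_simd):
    """Total SPs for a chip with the given SIMD count and SIMD width."""
    return simds * sps_per_simd


for rumoured in (1200, 1280, 1600):
    print(rumoured, simd_layout(rumoured, 20))

# 20 SIMDs at RV770 width, and at a hypothetical +50% width.
same_width = total_sps(20, 80)
wider = total_sps(20, 120)
# Assumption: Juniper is half of the wider configuration.
juniper_if_half = wider // 2
print(same_width, wider, juniper_if_half)
```

With 20 SIMDs, 1200 SPs gives SIMDs narrower than RV770's, and 1280 SPs does not divide into whole 5-wide units, which is the objection being made.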
     
  6. sc3252

    Newcomer

    Joined:
    Jun 6, 2008
    Messages:
    35
    Likes Received:
    0
    The only thing I am worried about is the memory bandwidth... wow, they really went cheap this time, only 19GB/s. Of course, I bet that is just a typo.

    If this spec is true, would having ~150GB/s really hinder performance? Since the 4850 has a little more than half the bandwidth of the 4870 and yet still performs fairly well, I could imagine this having very little performance impact. I mean, really, what is everyone expecting? Pairing it with anything faster would cost a pretty penny, something AMD seems to be against now.
     
  7. Jawed

    Legend

    Joined:
    Oct 2, 2004
    Messages:
    10,875
    Likes Received:
    767
    Location:
    London
    I've mused before that the TUs might return to R600-style single-cycle fp16 processing. I'm now wondering if Evergreen unifies TUs and RBEs.

    One of the key points of D3D11 is that resources are written/read more fluidly than ever before. Additionally, some of the addressing math for textures and render targets is the same as well as some of the blending. A lot of the fluidity comes from compute shader, but pixel shading provides for writing/reading resources.

    So, what if each cluster now contains a combined TU/RBE?

    In typical game situations the two are not normally both running flat-out - only some dodgy synthetics from yesteryear work that way. The functional overlap isn't really the biggest deal - to me it's more important the way that memory is more of a write/read resource in D3D11, whereas prior versions kept writing and reading as separate passes.

    Jawed
     
  8. Thowllly

    Regular

    Joined:
    Feb 6, 2002
    Messages:
    551
    Likes Received:
    4
    Location:
    Norway
    Only 19GB/s of BW (32-bit bus?), as sc3252 pointed out, but on the other hand it has 128GB of RAM! :razz:
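Both bandwidth figures fall out of the same formula, which is why the 19GB/s number looks like a bus-width slip rather than a real spec. A minimal sketch, assuming the quoted 1.2GHz is the GDDR5 command clock (GDDR5 moves 4 bits per pin per command-clock cycle); the 256-bit width is an assumption chosen because it lands near the ~150GB/s mentioned above, and the function name is made up for illustration.

```python
def gddr5_bandwidth_gb_s(command_clock_ghz, bus_width_bits):
    """Peak bandwidth in GB/s for a GDDR5 interface."""
    # GDDR5 transfers 4 bits per pin per command-clock cycle,
    # so a 1.2 GHz part runs at 4.8 Gbps per pin.
    data_rate_gbps = command_clock_ghz * 4
    # Multiply by the number of pins, divide by 8 to go from bits to bytes.
    return data_rate_gbps * bus_width_bits / 8


CLOCK_GHZ = 1.2  # memory clock quoted in the leaked spec
for width in (32, 128, 256):  # bus widths are assumptions for comparison
    bw = gddr5_bandwidth_gb_s(CLOCK_GHZ, width)
    print(f"{width:>3}-bit bus: {bw:.1f} GB/s")
```

A 32-bit bus at that clock gives 19.2 GB/s and a 256-bit bus gives 153.6 GB/s, so the listed figure matches the per-chip (32-bit) number rather than the whole card.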
     
  9. Squilliam

    Squilliam Beyond3d isn't defined yet
    Veteran

    Joined:
    Jan 11, 2008
    Messages:
    3,495
    Likes Received:
    114
    Location:
    New Zealand
    Wait, is this legit?
     
  10. neliz

    neliz GIGABYTE Man
    Veteran

    Joined:
    Mar 30, 2005
    Messages:
    4,904
    Likes Received:
    23
    Location:
    In the know
    no.. this is legit:

    Code:
    Cypress ~P16xxx - P17xxx - P18xxx
    Juniper XT ~P95xx
    Redwood ~P46xx
     
  11. CarstenS

    Veteran Subscriber

    Joined:
    May 31, 2002
    Messages:
    4,933
    Likes Received:
    2,263
    Location:
    Germany
    Well, it'd better be, else no DX11, right? ;)
     
  12. mboeller

    Regular

    Joined:
    Feb 7, 2002
    Messages:
    922
    Likes Received:
    1
    Location:
    Germany
    What about MIMD?

    According to Ailuros, one of the advantages of the SGX compared to other GPUs is its MIMD core, because performance is higher, especially with small triangles. With the "new" tessellation in DX11, maybe AMD thought that they need MIMD to get higher performance with small triangles.
     
  13. Ailuros

    Ailuros Epsilon plus three
    Legend Subscriber

    Joined:
    Feb 7, 2002
    Messages:
    9,442
    Likes Received:
    181
    Location:
    Chania
    It would be too much trouble to explain to the world that the so far "superscalar" units aren't as superscalar as the new ones ;)

    ***edit: on a more serious note, I'm not so sure AMD/NVIDIA really need MIMD units for the time being; it seems to me that changes like that might come in the more distant future, if they haven't come up with an even more efficient idea in the meantime.
     
  14. rjc

    rjc
    Regular

    Joined:
    Oct 27, 2008
    Messages:
    270
    Likes Received:
    0
    Comparing to the 4670 at launch, vr-zone got around ~P35XX (it probably scores higher now, but I can't find a recent bench). There is a chance the above Redwood is configured with GDDR5 as well.

    Edit:
    Looking at the mobile GPU power profiles posted further back in this thread, Redwood = Madison:

    Madison HD5750M 20-30W
    Madison HD5730M 20-25W
    Madison HD5650M 15-20W

    Compare to:
    RV730 HD4670M 28-30W
    RV730 HD4650M 12-25W

    That looks to be potentially a ~30% performance improvement in the same power profile.
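The ~30% figure can be bracketed from the two scores quoted above. A rough sketch: the last two digits of each score are unknown ("xx"), so it works with the full possible range instead of guessing them. It also assumes the desktop scores carry over to the mobile parts, which is the same assumption the estimate above makes.

```python
def improvement_range(new_lo, new_hi, old_lo, old_hi):
    """Smallest and largest relative gain given two score ranges."""
    worst = new_lo / old_hi - 1  # lowest new score vs highest old score
    best = new_hi / old_lo - 1   # highest new score vs lowest old score
    return worst, best


# Redwood ~P46xx versus HD 4670 ~P35xx; the "xx" digits are not known.
worst, best = improvement_range(4600, 4699, 3500, 3599)
print(f"improvement between {worst:.1%} and {best:.1%}")
```

Whatever the missing digits are, the gain stays close to the ~30% estimate.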
     
    #1734 rjc, Aug 18, 2009
    Last edited by a moderator: Aug 18, 2009
  15. gamervivek

    Regular Newcomer

    Joined:
    Sep 13, 2008
    Messages:
    734
    Likes Received:
    225
    Location:
    india
    hudd mama hudd.:lol:
     
  16. trinibwoy

    trinibwoy Meh
    Legend

    Joined:
    Mar 17, 2004
    Messages:
    10,578
    Likes Received:
    622
    Location:
    New York
    Cypress ~HD 4870 X2
    Juniper ~HD 4870
    Redwood ~HD 3870

    According to the ORB. Toss in the rumoured Hemlock and Trillian parts and.....
     
  17. neliz

    neliz GIGABYTE Man
    Veteran

    Joined:
    Mar 30, 2005
    Messages:
    4,904
    Likes Received:
    23
    Location:
    In the know
    #1737 neliz, Aug 18, 2009
    Last edited by a moderator: Aug 18, 2009
  18. rpg.314

    Veteran

    Joined:
    Jul 21, 2008
    Messages:
    4,298
    Likes Received:
    0
    Location:
    /
    So they indeed managed to double the perf/$. Assuming Cypress sells at $300, of course. Cool.
     
  19. TimothyFarrar

    Regular

    Joined:
    Nov 7, 2007
    Messages:
    427
    Likes Received:
    0
    Location:
    Santa Clara, CA
    Hello Jawed. If they went this way, would be interesting to see how they handle general RT read/write access (UAVs).

    Typically one would assume that RTs are divided into tiles and those tiles are distributed across the RBEs (ROPs) with a fixed mapping per RT. So the raster stage just sends out fragments to the ALUs associated with the RBEs for the fragments' destination tile. Both TU and RBE are local to the cluster. It seems like the fixed ALU/RBE tile mapping needs to stay, just to ensure draw ordering.

    Given a unified TU/RBE, random RT access implies read/write from non-local RBEs (or local fetch and some crazy "tile" cache coherency, which I'm guessing is highly unlikely).

    Almost seems like a good idea to just distribute both RBE and TU access (distribute tiles) across the chip similar to how global access is distributed to MCs ... except for various problems like TU filtering requires neighboring texels! So TU access probably stays local with read only cache.

    How do you see a unified TU/RBE working?
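To make the fixed tile mapping concrete, here is a toy sketch. Everything in it is hypothetical: the tile size, the number of RBEs, and the diagonal interleave are illustrative choices, not known details of any ATI or NVIDIA hardware. It only shows why random read/write access to a render target mostly lands on non-local RBEs once tiles are statically spread across them.

```python
TILE_SIZE = 8   # hypothetical tile edge, in pixels
NUM_RBES = 4    # hypothetical number of RBE blocks on the chip


def owning_rbe(x, y):
    """Fixed mapping from a pixel to the RBE that owns its tile."""
    tile_x, tile_y = x // TILE_SIZE, y // TILE_SIZE
    # Diagonal interleave so neighbouring tiles go to different RBEs.
    return (tile_x + tile_y) % NUM_RBES


def is_local(x, y, cluster_rbe):
    """True if a cluster paired with cluster_rbe owns this pixel."""
    return owning_rbe(x, y) == cluster_rbe


# Fraction of a 64x64 render target that is local to the cluster on RBE 0.
width, height = 64, 64
local_pixels = sum(
    is_local(x, y, 0) for y in range(height) for x in range(width)
)
local_fraction = local_pixels / (width * height)
print(local_fraction)
```

With an even spread, only 1/NUM_RBES of arbitrary accesses stay local, so a shader doing scattered UAV-style writes would send most of its traffic to another cluster's RBE.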
     
  20. pjbliverpool

    pjbliverpool B3D Scallywag
    Legend

    Joined:
    May 8, 2005
    Messages:
    7,793
    Likes Received:
    1,076
    Location:
    Guess...
    I may be completely wrong, but I thought Juniper was the highest-end single-chip solution and Cypress was going to be a dual-chip solution?

    If that's the case, then what makes this better than the previous generation?
     