AMD: R9xx Speculation

Discussion in 'Architecture and Products' started by Lukfi, Oct 5, 2009.

  1. Mianca

    Regular

    Joined:
    Aug 7, 2010
    Messages:
    333
    Likes Received:
    19
    HD 6970 being 50% faster than HD 6870 isn't the best case scenario ... especially not in 3DMark 11 :wink:
     
  2. Squilliam

    Squilliam Beyond3d isn't defined yet
    Veteran

    Joined:
    Jan 11, 2008
    Messages:
    3,495
    Likes Received:
    114
    Location:
    New Zealand
    Ok, we're all talking about best case scenarios here. What kind of performance is best case scenario? Oh and I look at the clock and see it is almost December. December means presents and fast graphics cards to play with? Even with the delay it is still kind of hard to get concrete information out. We don't even have a die size, do we?

    Lastly, are we looking at a significant performance per mm^2 improvement, or just at an increase in functional units roughly proportional to the increase in transistor count, and therefore performance through size?
     
  3. OgrEGT

    Newcomer

    Joined:
    Sep 12, 2010
    Messages:
    31
    Likes Received:
    0
    Current speculation is roughly:

    370-400mm2
    >2.5 billion Transistors (more like 2.7?)
    850-900MHz
    appr. 2.6TFlops
    appr. 230W Power consumption (games)
    Between
    2x15 or 3x10 SIMDs, each with 16 x 4D VLIW units / 120 TMUs
    and
    2x12 or 3x8 SIMDs, each with 16 x 4D VLIW units / 96 TMUs

    30-50% more performance than HD6870

    Edit: Still sounds nice to me :)
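
For reference, the unit counts implied by those SIMD layouts can be tallied with a quick sketch. It takes the 16 VLIW units of 4 lanes per SIMD from the list above; the figure of 4 TMUs per SIMD is only an assumption inferred from the 120/30 and 96/24 ratios, and the function name is made up for illustration.

```python
# Rough tally of the rumoured configurations listed above.
# Assumption: 4 TMUs per SIMD (inferred from 120 TMUs / 30 SIMDs and 96 / 24).
VLIW_UNITS_PER_SIMD = 16
LANES_PER_VLIW = 4
TMUS_PER_SIMD = 4


def config_totals(groups, simds_per_group):
    """Return (simds, alu_lanes, tmus) for a groups x simds_per_group layout."""
    simds = groups * simds_per_group
    alu_lanes = simds * VLIW_UNITS_PER_SIMD * LANES_PER_VLIW
    tmus = simds * TMUS_PER_SIMD
    return simds, alu_lanes, tmus


for groups, per_group in [(2, 15), (3, 10), (2, 12), (3, 8)]:
    simds, lanes, tmus = config_totals(groups, per_group)
    print(f"{groups}x{per_group}: {simds} SIMDs, {lanes} ALU lanes, {tmus} TMUs")
```

Under these assumptions the larger layouts come to 1920 ALU lanes and the smaller ones to 1536, regardless of whether the SIMDs are split into two or three groups.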
     
  4. no-X

    Veteran

    Joined:
    May 28, 2005
    Messages:
    2,406
    Likes Received:
    416
    According to the leaked slides, the SIMDs should be in two groups, so from 2x12 to 2x...:smile:
     
  5. LordEC911

    Regular

    Joined:
    Nov 25, 2007
    Messages:
    874
    Likes Received:
    205
    Location:
    'Zona
    ~3.2-3.5TFlops.

    Best case would technically be ~1.8-2x the performance of the 6870, depending on exact specs, but as we know that won't translate into a real-world performance increase, so 1.4-1.6x would be more reasonable in the majority of cases.
     
  6. Mianca

    Regular

    Joined:
    Aug 7, 2010
    Messages:
    333
    Likes Received:
    19
    @OgrEGT:

    30-50% faster than HD 6870 seems rather conservative.

    Given that it's a new and improved architecture, you'd expect perf/mm² to stay at least at the same level as Barts.

    I really don't see a next-gen chip that's supposed to be ~50% bigger than Barts achieving less than 50% increase in overall performance ...

    HD 6970 will most likely be somewhere between 50-60% faster (in general) than HD 6870 - and thus end up ~5-10% faster (in general) than GTX580.

    DX11 performance could/should see an even bigger jump in relative performance. 3DMark11 will surely yield some very interesting results in that respect.
     
  7. no-X

    Veteran

    Joined:
    May 28, 2005
    Messages:
    2,406
    Likes Received:
    416
    I don't think so. Barts doesn't support DP, Cayman does. Those are additional transistors that won't be utilized in rendering. Another thing is the dual geometry engine: it also consumes transistors, but it won't affect performance in many games (because the majority of games aren't limited by geometry performance). The 4D thing also seems to be targeted at HPC (better DP:SP ratio), and the functionality transferred from the T-unit to X/Y/Z/W is also very HPC-oriented... it costs transistors that won't be utilized in 3D. I'd be very surprised if Cayman brings better performance per transistor than Barts (at the same clock, of course), because it appears to me that this GPU is oriented towards the best HPC performance per transistor, not the best 3D performance per transistor (that was Barts' job). And the difference in this respect seems to be significantly greater than that between Cypress and Juniper.
     
  8. OgrEGT

    Newcomer

    Joined:
    Sep 12, 2010
    Messages:
    31
    Likes Received:
    0
    3.3TFlops if 1920VLIWs at 100% Utilization at 850MHz.

    Edit: Earlier, Gipsel suggested some 3.2 of 4 so 80% of that.
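
The arithmetic behind those two figures can be sketched as follows. It assumes the usual peak-rate convention of 2 flops per lane per clock (one multiply-add), and it reads "3.2 of 4" as an average of 3.2 occupied slots per 4-wide VLIW instruction; both are interpretations rather than anything confirmed, and the function name is illustrative.

```python
def peak_tflops(alu_lanes, clock_mhz, flops_per_lane_per_clock=2):
    """Theoretical peak in TFLOPS, assuming one multiply-add per lane per clock."""
    return alu_lanes * flops_per_lane_per_clock * clock_mhz * 1e6 / 1e12


peak = peak_tflops(1920, 850)   # 1920 lanes at 850 MHz
effective = peak * (3.2 / 4)    # assumed average VLIW slot occupancy

print(f"peak:      {peak:.2f} TFLOPS")
print(f"effective: {effective:.2f} TFLOPS")
```

With these inputs the peak rounds to 3.26 TFLOPS and the 80% figure to 2.61 TFLOPS.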
     
  9. chavvdarrr

    Veteran

    Joined:
    Feb 25, 2003
    Messages:
    1,165
    Likes Received:
    34
    Location:
    Sofia, BG
    What about drivers? They won't be as well optimised for the new architecture, so we can expect some 10-20% better performance over the first 6 months compared to launch.
    Of course, I bet the "popular" benchmarks will be heavily optimised at launch, but still ...
     
  10. Jawed

    Legend

    Joined:
    Oct 2, 2004
    Messages:
    11,458
    Likes Received:
    1,817
    Location:
    London
    One of the slides that was leaked says:

    Upgraded Render Back-Ends
    • Coalescing of write ops [thought that was already in there :shock:]
    • 16-bit integer (snorm/unorm) ops are 2x faster
    • 32-bit FP (single/double component) ops are 2x-4x faster
    Here:

    http://www.hardware.fr/articles/806-4/dossier-nvidia-geforce-gtx-580-sli.html

    we can see that single-component fp32 fillrate is half speed on HD5870. So I guess that'll become full speed.

    Blending might be 4x faster? Is there much need for blending of fp32 single-/dual-channel pixels though?

    Or perhaps dual-component fp32 fillrate will be 4x faster than it currently is (however fast that is).

    The EQAA modes with the extra coverage samples would appear to be partly dependent upon blending speeds, so perhaps blending speeds are boosted in the relevant places to make performance adequate here.
     
  11. keritto

    Newcomer

    Joined:
    Apr 3, 2009
    Messages:
    143
    Likes Received:
    0
    Thanks. You could already say that :D Which tool do you use to set these "fractional values" for LOD? How does this translate to the DX/OGL tweaks in ATT (LOD range: -10 to 10)?


    I'd say you've drawn the wrong conclusion. 6Gbps should usually mean exactly that, 750MHz(x8). And just as the 5Gbps chips used on the HD5700/5800 series could easily be raised to 5.6Gbps, these 6Gbps puppies should have a similar 66.6-75MHz(x8) of headroom. As for the official AMD card specs, they could declare them as 1450M(x4) parts (or, as I'd more correctly put it, 725M(x8)), just lowering the clock by 25MHz.
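
The clock conversions in that paragraph can be checked with a small sketch. It simply treats "x8" and "x4" as the multipliers between a quoted memory clock and the per-pin data rate, as the post does; the function name and the choice of Mbps as the input unit are illustrative.

```python
def gddr5_clocks_mhz(data_rate_mbps):
    """Return the (x8, x4) clocks in MHz that correspond to a per-pin data rate."""
    return data_rate_mbps / 8, data_rate_mbps / 4


for rate in (5000, 5600, 5800, 6000):
    x8, x4 = gddr5_clocks_mhz(rate)
    print(f"{rate / 1000:.1f} Gbps -> {x8:.0f} MHz (x8), {x4:.0f} MHz (x4)")

# Headroom of the 5 Gbps parts when pushed to 5.6 Gbps, expressed as an x8 clock.
headroom_mhz = gddr5_clocks_mhz(5600)[0] - gddr5_clocks_mhz(5000)[0]
print(f"headroom: {headroom_mhz:.0f} MHz (x8)")
```

So 6.0 Gbps corresponds to 750 MHz (x8), 5.8 Gbps to 725 MHz (x8) or 1450 MHz (x4), and the 5.0 to 5.6 Gbps step to 75 MHz (x8).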

    And as spec(ulation) goes, I'd agree Shtal is wrong :D and here are my compiled wet dreams :p
    6990 (XTX) 775MHz 3840SPs 6.0GFlops (310W)
    6970 (XT) 1025M 1920SPs 4.0GFlops (232W)
    6950 (Pro) 875M 1536SPs 2.7GFlops (188W)
     
  12. cenit

    Newcomer

    Joined:
    Jan 20, 2010
    Messages:
    1
    Likes Received:
    0
    I hope you're wet dreaming TFlops, not GFlops...
     
  13. wishiknew

    Regular

    Joined:
    May 19, 2004
    Messages:
    341
    Likes Received:
    9
    Those EQAA modes, is that basically AMD's version of CSAA?

    And no slides of increased performance per watt or area except for that VLIW4 slide. No more architectural magic left in this round?
     
  14. mczak

    Veteran

    Joined:
    Oct 24, 2002
    Messages:
    3,021
    Likes Received:
    119
    I don't know if there's really much need for that, but imho it makes a lot of sense. Currently dual-channel and single-channel fp32 blending are performed at the same speed as quad-channel fp32 (well, apart from memory bandwidth requirements), at 1/4 the rate of 8-bit int blending. Clearly, faster quad-channel fp32 blending wouldn't be helpful (there's not enough memory bandwidth even at quarter rate already...), but this means that for 1-channel fp32 blending the hw currently apparently uses only 1 of the 4 blend units of a ROP while the rest just idle. So by using all of them (you just need to feed 4 consecutive pixels to the 4 rgba blend units), single-channel fp32 blending performance should increase by a factor of 4 (well, not quite, as it will hit memory bandwidth limits) and dual-channel fp32 blending by a factor of 2, with minimal hardware changes. (Nvidia has been doing this for a while now.)
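
The packing argument above can be put into numbers with a toy model. It assumes 4 blend units per ROP and a quarter-rate baseline for all fp32 formats, as described, and it deliberately ignores the memory bandwidth limits mentioned in the post; the function and constant names are illustrative only.

```python
BLEND_UNITS_PER_ROP = 4          # one unit each for r, g, b, a
BASELINE_FP32_RATE = 1 / 4       # relative to int8 blending, per the post


def fp32_blend_rate(channels, pack_pixels):
    """fp32 blend rate relative to int8, with or without packing pixels."""
    if channels not in (1, 2, 4):
        raise ValueError("channels must be 1, 2 or 4")
    if not pack_pixels:
        return BASELINE_FP32_RATE
    # Packing: spread several narrow pixels across the otherwise idle units.
    pixels_in_flight = BLEND_UNITS_PER_ROP // channels
    return BASELINE_FP32_RATE * pixels_in_flight


for channels in (1, 2, 4):
    before = fp32_blend_rate(channels, pack_pixels=False)
    after = fp32_blend_rate(channels, pack_pixels=True)
    print(f"{channels}-channel fp32: {after / before:.0f}x speedup")
```

In this model single-channel blending gains 4x, dual-channel 2x and quad-channel nothing, which matches the factors given above before bandwidth is taken into account.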
     
  15. PSU-failure

    Newcomer

    Joined:
    May 3, 2007
    Messages:
    249
    Likes Received:
    0
    Even considering bandwidth constraint, it could be quite a good improvement in some pathological cases.
     
  16. Mianca

    Regular

    Joined:
    Aug 7, 2010
    Messages:
    333
    Likes Received:
    19
    With GTX570 rumored to launch on December 7th, it sure would be a nice move to launch HD 69** just one day before that :twisted:
     
  17. no-X

    Veteran

    Joined:
    May 28, 2005
    Messages:
    2,406
    Likes Received:
    416
    Since Fermi...? I think GT200 didn't support it.
     
  18. Thalb

    Newcomer

    Joined:
    Jan 1, 2010
    Messages:
    8
    Likes Received:
    0
    This last sentence is trivial, as Juniper was just a Cypress cut in half, with some GPGPU functionality removed to save space. So if Cayman is anything other than Barts X2 (and I bet it is!), it has to be more fundamentally different from Barts than Cypress was from Juniper.

    Otherwise, I agree with your post. But don't forget that Cayman has a different target market than Barts:
    - Cayman is intended for the enthusiast (hence the x9xx designation), who cares little about price, power consumption, etc. All that matters is performing better than the competition.
    - Barts is intended for gamers, who prefer to spend no more than $200 on a graphics card and do not care about GPGPU functionality, the single-GPU performance crown, etc., as long as they can play the newest games without stuttering. The point of the x850 is to get 80% of the performance of the top GPU at 50% of the price.
     
  19. mczak

    Veteran

    Joined:
    Oct 24, 2002
    Messages:
    3,021
    Likes Received:
    119
    GT200 worked the same.
    http://www.hardware.fr/articles/787-7/dossier-nvidia-geforce-gtx-480-470.html
    Note though that nvidia doesn't have 1/4 4-channel fp32 blend. It actually looks like 1/4 per channel, so with 4-channel fp32 blend you get 1/16 of the int8 blend rate - same for gt200 and gf100 (but for gf100 it appears as more than 1/16 because int8 blend is limited by the 64bit per clock export limit of the SMs).
    Obviously, that would give Cayman a huge advantage over GF110 there. But I don't know if fp32 blending is really used anywhere - probably not...
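
As a rough illustration of the "1/4 per channel" reading, the sketch below assumes the cost scales linearly with channel count. Only the 4-channel figure of 1/16 is stated in the post; the 1- and 2-channel values are extrapolations, and the function name is made up.

```python
from fractions import Fraction


def per_channel_blend_rate(channels):
    """fp32 blend rate relative to int8 if each channel costs quarter rate."""
    return Fraction(1, 4 * channels)


for channels in (1, 2, 4):
    print(f"{channels}-channel fp32 blend: {per_channel_blend_rate(channels)} of int8 rate")
```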
     
  20. Ethatron

    Regular Subscriber

    Joined:
    Jan 24, 2010
    Messages:
    942
    Likes Received:
    402
    OpenCL image extension?
     