AMD: R9xx Speculation

Discussion in 'Architecture and Products' started by Lukfi, Oct 5, 2009.

  1. Kaotik

    Kaotik Drunk Member
    Legend

    Joined:
    Apr 16, 2003
    Messages:
    10,245
    Likes Received:
    4,465
    Location:
    Finland
  2. Love_In_Rio

    Veteran

    Joined:
    Apr 21, 2004
    Messages:
    1,627
    Likes Received:
    226
    It's funny how much work Dave Baumann's site makes for him! I suspect he would be willing to review this card himself if he weren't its product manager!! ;)
     
  3. CRoland

    Newcomer

    Joined:
    Jan 19, 2010
    Messages:
    114
    Likes Received:
    0
    I don't see why they'd go higher* than a 1:4 DP:SP ratio. What good is it except for marketing? I would prefer increasing SP throughput, with DP naturally increasing at the same time.

    * Closer to one.
     
  4. Kaotik

    Kaotik Drunk Member
    Legend

    Joined:
    Apr 16, 2003
    Messages:
    10,245
    Likes Received:
    4,465
    Location:
    Finland
    As far as I know, Dave Baumann hasn't been affiliated with Beyond3D since he moved to work at ATI/AMD.
     
  5. PSU-failure

    Newcomer

    Joined:
    May 3, 2007
    Messages:
    249
    Likes Received:
    0
    For GPGPU, it would mean a considerable lead (>=1.5TFlops).

    As for the 1:2 DP MAD/FMA ratio, with 1:2 ADD and 1:2 MUL DP throughput there's no reason to limit MAD/FMA throughput to 1:4 since "simple" optimisation gives 1:2 rate for free.
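    PSU-failure's point is easiest to see in FLOPs rather than in issue rates. The sketch below assumes a Cypress-like HD 5870 (320 VLIW5 units at 850 MHz, 2.72 TFLOPS SP, 544 GFLOPS DP). In that model one unit issues either two DP ADDs or one DP MUL/FMA per clock. The `VliwGpu` class, its field names, and the VLIW4 and "fast FMA" variants are hypothetical and are included only for comparison.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class VliwGpu:
    units: int               # VLIW units on the chip
    sp_lanes_per_unit: int   # 5 for VLIW5 (Cypress), 4 for a VLIW4 design
    clock_ghz: float
    dp_add_per_unit: int     # DP ADDs one unit issues per clock (assumed)
    dp_fma_per_unit: int     # DP MUL/FMAs one unit issues per clock (assumed)

    def sp_gflops(self) -> float:
        # Every SP lane issues one MAD (2 FLOPs) per clock.
        return self.units * self.sp_lanes_per_unit * 2 * self.clock_ghz

    def dp_add_gflops(self) -> float:
        # A DP ADD counts as 1 FLOP.
        return self.units * self.dp_add_per_unit * 1 * self.clock_ghz

    def dp_fma_gflops(self) -> float:
        # A DP FMA counts as 2 FLOPs.
        return self.units * self.dp_fma_per_unit * 2 * self.clock_ghz


cypress = VliwGpu(units=320, sp_lanes_per_unit=5, clock_ghz=0.85,
                  dp_add_per_unit=2, dp_fma_per_unit=1)
# Same unit count with four lanes per unit: DP FMA becomes exactly 1:4 of SP.
vliw4 = replace(cypress, sp_lanes_per_unit=4)
# Hypothetical: FMA issued at the same per-unit rate as ADD.
fast_fma = replace(cypress, dp_fma_per_unit=2)

for name, gpu in [("cypress", cypress), ("vliw4", vliw4), ("fast_fma", fast_fma)]:
    print(f"{name:9s} SP {gpu.sp_gflops():7.1f}  DP ADD {gpu.dp_add_gflops():6.1f}  "
          f"DP FMA {gpu.dp_fma_gflops():6.1f} GFLOPS")
```

    In this model, two DP ADDs per clock and one DP FMA per clock both give 544 GFLOPS. So a 1:2 rate for ADD alone adds no peak FLOPs. Only issuing FMA at the ADD rate would double the DP peak.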
     
  6. CRoland

    Newcomer

    Joined:
    Jan 19, 2010
    Messages:
    114
    Likes Received:
    0
    But it would seem likely to me that going from, say, 1 DP, 2 SP TFLOPS to 1 DP, 4 SP TFLOPS would also not require that much extra hardware.

    Edit: Except if they are bandwidth limited...
     
  7. OlegSH

    Regular

    Joined:
    Jan 10, 2010
    Messages:
    805
    Likes Received:
    1,635
  8. Mize

    Mize 3dfx Fan
    Legend

    Joined:
    Feb 6, 2002
    Messages:
    5,079
    Likes Received:
    1,149
    Location:
    Cincinnati, Ohio USA
  9. rpg.314

    Veteran

    Joined:
    Jul 21, 2008
    Messages:
    4,298
    Likes Received:
    0
    Location:
    /
    No mention of a cache hierarchy is odd.

    That the specs are still undecided, even now, is weird.
     
  10. Gipsel

    Veteran

    Joined:
    Jan 4, 2010
    Messages:
    1,620
    Likes Received:
    264
    Location:
    Hamburg, Germany
    Besides that, Cayman will only do ADDs at a 1:2 ratio and MUL/MAD/FMA at 1:4. That's most probably just an error in the slide, carried over from the Cypress presentation (which had the same misleading "2 64 Bit ADD or MUL" line, although it is only true for ADD).
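    To see how much the two readings of the slide differ in practice, the sketch below counts issue cycles for one VLIW unit running a block of independent DP operations. It makes three assumptions:

    - It uses Gipsel's rates: two DP ADDs, or one DP MUL/MAD/FMA, per unit per clock.
    - An ADD never shares a clock with a MUL/FMA.
    - Dependencies between operations are ignored.

    `dp_issue_cycles` is a hypothetical helper.

```python
import math


def dp_issue_cycles(n_add: int, n_mul_fma: int,
                    add_per_clk: int, mul_fma_per_clk: int) -> int:
    """Clocks for one VLIW unit to issue n_add DP ADDs and n_mul_fma DP MUL/FMAs.

    Assumes the ops are independent and that ADDs and MUL/FMAs never share a clock.
    """
    return math.ceil(n_add / add_per_clk) + math.ceil(n_mul_fma / mul_fma_per_clk)


mix = (8, 8)  # 8 DP ADDs and 8 DP FMAs
slide_reading = dp_issue_cycles(*mix, add_per_clk=2, mul_fma_per_clk=2)
gipsel_reading = dp_issue_cycles(*mix, add_per_clk=2, mul_fma_per_clk=1)
print(slide_reading, gipsel_reading)  # 8 12
```

    For an FMA-heavy mix like this one, the "2 64 Bit ADD or MUL" reading would overstate DP throughput by 50%.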
     
    #5270 Gipsel, Nov 22, 2010
    Last edited by a moderator: Nov 22, 2010
  11. ferro

    Newcomer

    Joined:
    Apr 8, 2005
    Messages:
    130
    Likes Received:
    0
    Location:
    The Netherlands
    I think they are just undisclosed.
     
  12. OlegSH

    Regular

    Joined:
    Jan 10, 2010
    Messages:
    805
    Likes Received:
    1,635
    More thoughts: 50% more SIMDs require 50% more bandwidth, but the slides still show four L2 slices. So either the slices' data paths must be twice as wide, or the chip will be even more cache-constrained. Also, with 512 KB of L2 shared by more SIMDs, there will be more texture fetch misses than on RV870. That automatically means underutilization of the SMs (i.e. SIMD utilization could be worse in shaders heavy on texture fetches, such as parallax occlusion, sun shafts, shadow filtering, etc.).
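    OlegSH's bandwidth argument can be put into numbers. The sketch below makes these assumptions:

    - The baseline is Cypress's 20 SIMDs and 512 KB of L2 in four slices.
    - The new chip has the post's "50% more SIMDs".
    - Each slice delivers a hypothetical 64 bytes per clock; the real slice width isn't given in the thread.

```python
L2_SLICES = 4
SLICE_BYTES_PER_CLK = 64          # hypothetical datapath width of one L2 slice
L2_CAPACITY_KB = 512
CYPRESS_SIMDS = 20
NEW_SIMDS = int(CYPRESS_SIMDS * 1.5)  # "50% more SIMDs", per the post's premise


def l2_share_per_simd(simds: int, slice_bytes_per_clk: float = SLICE_BYTES_PER_CLK) -> float:
    """Aggregate L2 bytes per clock divided evenly across all SIMDs."""
    return L2_SLICES * slice_bytes_per_clk / simds


def capacity_per_simd_kb(simds: int) -> float:
    """L2 capacity per SIMD if the cache is shared evenly."""
    return L2_CAPACITY_KB / simds


old_share = l2_share_per_simd(CYPRESS_SIMDS)   # 12.8 B/clk per SIMD
new_share = l2_share_per_simd(NEW_SIMDS)       # about 8.5 B/clk per SIMD
# Slice width needed to give each SIMD its old share again.
required_width = SLICE_BYTES_PER_CLK * NEW_SIMDS / CYPRESS_SIMDS

print(f"per-SIMD L2 bandwidth: {old_share:.2f} -> {new_share:.2f} B/clk")
print(f"slice width to break even: {required_width:.0f} B/clk")
print(f"L2 per SIMD: {capacity_per_simd_kb(CYPRESS_SIMDS):.1f} -> "
      f"{capacity_per_simd_kb(NEW_SIMDS):.1f} KB")
```

    Holding per-SIMD bandwidth constant needs slices 1.5x wider (96 B/clk here). Rounding up to a power-of-two width would plausibly give the doubling the post mentions. Capacity per SIMD drops from 25.6 KB to about 17 KB, which is where the extra texture fetch misses would come from.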
     
  13. fellix

    Veteran

    Joined:
    Dec 4, 2004
    Messages:
    3,552
    Likes Received:
    514
    Location:
    Varna, Bulgaria
    Judging by this slide, the L2 cache is still read-only. :roll:
     
  14. Jawed

    Legend

    Joined:
    Oct 2, 2004
    Messages:
    11,716
    Likes Received:
    2,137
    Location:
    London
    That model is where kernel execution fills all available SIMDs. The easiest way to think about this is when a kernel is "ending", i.e. as SIMDs finish off their final threads for kernel A, they become available to start work on kernel B.

    This is queued overlapped-execution.
    This model launches multiple kernels (probably only 2 per SIMD) regardless of whether a SIMD is occupied by any other kernel, i.e. two compute kernels could both fill all SIMDs. Here kernels A and B can be launched independently.

    B doesn't have to wait for free SIMDs - B isn't waiting for A to give it breathing room.

    This is task parallelism (though presumably restricted to some unknown number of distinct kernels).
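    A toy dispatcher can illustrate the difference between the two launch models. The sketch below makes these assumptions:

    - All workgroups take the same time.
    - Each SIMD runs one kernel A workgroup at a time.
    - For the concurrent case, each SIMD has a hypothetical second kernel slot.

    `b_first_start` and its parameters are made up for illustration.

```python
import heapq


def b_first_start(num_simds: int, a_workgroups: int, a_duration: float,
                  kernel_slots_per_simd: int) -> float:
    """Time at which kernel B's first workgroup can start.

    kernel_slots_per_simd == 1: queued overlapped execution. B is dispatched only
    after every workgroup of A has been dispatched, onto the first SIMD A has left.
    kernel_slots_per_simd >= 2: concurrent launch. B takes the spare slot
    immediately, however busy the SIMDs are with A.
    """
    if kernel_slots_per_simd >= 2:
        return 0.0
    free_at = [0.0] * num_simds  # min-heap: when each SIMD finishes its current A work
    for _ in range(a_workgroups):
        t = heapq.heappop(free_at)        # earliest-free SIMD takes the next A workgroup
        heapq.heappush(free_at, t + a_duration)
    return free_at[0]                     # first SIMD with no A work left


print(b_first_start(20, 100, 1.0, 1))  # 5.0: A occupies every SIMD to the end
print(b_first_start(20, 90, 1.0, 1))   # 4.0: B fills the SIMDs A no longer needs
print(b_first_start(20, 100, 1.0, 2))  # 0.0: B does not wait for A at all
```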

    I've never seen any statement by AMD as to the number of concurrent kernels supported in Evergreen (merely 2 across the entire GPU?) and I don't see any statement for this new feature, either.

    If you want to criticise AMD for comparing Evergreen with Fermi, I'll join in, just as soon as I know Evergreen's constraints. I don't, though, but I do suspect Fermi is more finely grained. But asynchronous launch is, if they're using the term correctly, a step forward from what's seen in Fermi.

    Though I doubt it allows more than two kernels per SIMD, because managing GPR allocation gets seriously tricky with three due to fragmentation. Even with register spill through caches into global memory, three is going to be tricky, and I suspect Cayman doesn't have cached register spill like Fermi.
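    A toy register-file allocator can reproduce Jawed's fragmentation concern. The sketch below makes these assumptions:

    - Allocation is contiguous and first-fit.
    - Each kernel has a fixed per-wavefront GPR count: 3, 5 and 4 registers here.
    - The register file is a hypothetical 16 registers.

    The `RegisterFile` class is illustrative, not how Evergreen or Cayman actually allocate GPRs. With only two kernels, one could allocate from opposite ends of the file, so every hole would be sized for the kernel that left it. A third kernel breaks that property.

```python
from typing import Optional


class RegisterFile:
    """Toy first-fit allocator handing out contiguous GPR blocks to wavefronts."""

    def __init__(self, size: int):
        self.size = size
        self.blocks: dict[str, tuple[int, int]] = {}  # wavefront -> (start, length)

    def alloc(self, wavefront: str, length: int) -> Optional[int]:
        cursor = 0
        for start, used in sorted(self.blocks.values()):
            if start - cursor >= length:
                break                  # the gap before this block is big enough
            cursor = start + used
        if cursor + length > self.size:
            return None                # no contiguous gap fits
        self.blocks[wavefront] = (cursor, length)
        return cursor

    def release(self, wavefront: str) -> None:
        del self.blocks[wavefront]

    def free_total(self) -> int:
        return self.size - sum(length for _, length in self.blocks.values())


rf = RegisterFile(16)
rf.alloc("A0", 3)   # kernel A, 3 GPRs -> [0, 3)
rf.alloc("B0", 5)   # kernel B, 5 GPRs -> [3, 8)
rf.alloc("C0", 4)   # kernel C, 4 GPRs -> [8, 12)
rf.alloc("A1", 3)   # kernel A again   -> [12, 15)
rf.release("C0")    # C's wavefront finishes, leaving a 4-GPR hole
print(rf.free_total(), rf.alloc("B1", 5))  # 5 None: enough in total, no contiguous gap
```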
     
  15. boxleitnerb

    Regular

    Joined:
    Aug 27, 2004
    Messages:
    407
    Likes Received:
    0
    So how come the HD5000/6000 series shows noticeable texture shimmering in some games, while any GeForce shows little to none, even at the Q setting? Is it a hardware limitation, then?

    With all this raw power, why can't modern Radeon cards filter as cleanly as possible and provide a smooth, calm image? R520/580 did far better in this area, and your main competitor has been offering superb AF quality without any "compromises" since 2006...
     
  16. trinibwoy

    trinibwoy Meh
    Legend

    Joined:
    Mar 17, 2004
    Messages:
    12,059
    Likes Received:
    3,119
    Location:
    New York
    I get the impression things aren't going to change much, based on the current (lack of) architectural change. Unless there's some as-yet-undisclosed magic, it'll probably offer performance similar to the 580, with a smaller die, lower power consumption and hopefully lower cost. The story with geometry doesn't seem to have changed either.
     
  17. Kaotik

    Kaotik Drunk Member
    Legend

    Joined:
    Apr 16, 2003
    Messages:
    10,245
    Likes Received:
    4,465
    Location:
    Finland
    There was one theory posted in the "HD5 AF broken" thread: AMD/ATI uses more detailed LOD values by default. "Softening" the LOD by +0.65 makes the shimmering disappear on Radeons. Incidentally, "sharpening" the LOD by -0.65 makes the shimmering appear on GeForces.
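    Here is the LOD-bias theory in concrete terms. In the usual OpenGL-style formulation, the mip level is λ = log2(ρ) + bias, where ρ is the texel footprint of a pixel. The sketch below assumes that formulation and a hypothetical footprint of 4 texels per pixel. It does not model either vendor's actual filtering hardware.

```python
import math


def mip_lambda(texels_per_pixel: float, lod_bias: float = 0.0,
               max_level: float = 10.0) -> float:
    """Level of detail: log2 of the pixel's texel footprint plus bias, clamped to the mip chain."""
    lam = math.log2(texels_per_pixel) + lod_bias
    return min(max(lam, 0.0), max_level)


def texels_per_sample(lod_bias: float) -> float:
    """Texels covered per pixel in the selected mip (1.0 means matched sampling).

    At level lambda the texture is downscaled by 2**lambda, so the footprint
    rho / 2**(log2(rho) + bias) reduces to 2**-bias.
    """
    return 2.0 ** -lod_bias


for bias in (-0.65, 0.0, 0.65):
    print(f"bias {bias:+.2f}: lambda {mip_lambda(4.0, bias):.2f}, "
          f"{texels_per_sample(bias):.2f} texels per pixel")
```

    With a -0.65 bias each pixel covers about 1.57 texels of the chosen mip. That undersampling shows up as shimmering in motion. A +0.65 bias drops that to about 0.64, at the cost of some blur.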
     
  18. Rangers

    Legend

    Joined:
    Aug 4, 2006
    Messages:
    12,791
    Likes Received:
    1,596
    Doesn't it say 2 polygons per clock now, versus 1 before?
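    In practice, "2 polygons per clock" is simple peak arithmetic. The sketch below assumes Cypress's 850 MHz core clock for both cases, since the new chip's clock isn't known from the slides. The results are setup-rate ceilings, not delivered throughput.

```python
def peak_triangles_per_second(tris_per_clock: float, clock_mhz: float) -> float:
    """Upper bound on triangle setup rate: primitives per clock times clock frequency."""
    return tris_per_clock * clock_mhz * 1e6


for rate in (1, 2):
    gtris = peak_triangles_per_second(rate, 850) / 1e9
    print(f"{rate} tri/clk at 850 MHz: {gtris:.2f} Gtri/s")
```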
     
  19. boxleitnerb

    Regular

    Joined:
    Aug 27, 2004
    Messages:
    407
    Likes Received:
    0
    Well, in my opinion, an IHV has no business adjusting the LOD. If the user or an application requests it, fine. Everything else is just inscrutable.
    Normally, the LOD should stay at 0, right?
     
  20. Kaotik

    Kaotik Drunk Member
    Legend

    Joined:
    Apr 16, 2003
    Messages:
    10,245
    Likes Received:
    4,465
    Location:
    Finland
    Of course. I don't know much about that stuff, though, so the next question is: is there a fixed 0, or is it determined by the hardware?
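    Part of the answer is that the API fixes the bias at 0. The base level that bias is added to, however, depends on how the hardware estimates the footprint ρ from the texture-coordinate derivatives.

    The OpenGL specification defines ρ as the longer of the two screen-axis derivative vectors. It also allows implementations to approximate ρ with a function bounded between max(m_u, m_v) and m_u + m_v. Here m_u and m_v are the largest absolute derivatives of u and v.

    The sketch below compares the exact value with those bounds for a hypothetical diagonal footprint. It does not model any particular GPU.

```python
import math


def rho_exact(dudx: float, dvdx: float, dudy: float, dvdy: float) -> float:
    """Footprint per the OpenGL definition: the longer screen-axis derivative vector."""
    return max(math.hypot(dudx, dvdx), math.hypot(dudy, dvdy))


def rho_bounds(dudx: float, dvdx: float, dudy: float, dvdy: float) -> tuple[float, float]:
    """Lower and upper limits allowed for an approximated footprint."""
    m_u = max(abs(dudx), abs(dudy))
    m_v = max(abs(dvdx), abs(dvdy))
    return max(m_u, m_v), m_u + m_v


derivs = (3.0, 3.0, 0.0, 1.0)  # hypothetical du/dx, dv/dx, du/dy, dv/dy in texels
exact = rho_exact(*derivs)
low, high = rho_bounds(*derivs)
for label, rho in (("low", low), ("exact", exact), ("high", high)):
    print(f"{label:5s} rho {rho:.3f} -> lambda {math.log2(rho):.3f} (bias 0)")
```

    With the same bias of 0, an implementation may land anywhere from λ ≈ 1.58 to λ ≈ 2.58 for this footprint, a full mip level apart. "LOD 0" alone therefore does not pin down how sharp the result looks.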
     

  • About Us

    Beyond3D has been around for over a decade and prides itself on being the best place on the web for in-depth, technically-driven discussion and analysis of 3D graphics hardware. If you love pixels and transistors, you've come to the right place!

    Beyond3D is proudly published by GPU Tools Ltd.