AMD: R9xx Speculation

Discussion in 'Architecture and Products' started by Lukfi, Oct 5, 2009.

  1. Jawed

    Legend

    Joined:
    Oct 2, 2004
    Messages:
    11,716
    Likes Received:
    2,137
    Location:
    London
    Work-item vectorisation can also improve memory-system access patterns, e.g. by increasing the coherency of cache use through longer bursts.

    The problem with these chips is you're programming the memory system as much as you're programming the ALUs.
     
  2. neliz

    neliz GIGABYTE Man
    Veteran

    Joined:
    Mar 30, 2005
    Messages:
    4,904
    Likes Received:
    23
    Location:
    In the know
    And that would still fit the original rumor that AMD's refresh is faster than NV's possible current high-end.
     
  3. rpg.314

    Veteran

    Joined:
    Jul 21, 2008
    Messages:
    4,298
    Likes Received:
    0
    Location:
    /
    It might be a little easier. At least you can select using bit masks on AVX.
     
  4. Kaotik

    Kaotik Drunk Member
    Legend

    Joined:
    Apr 16, 2003
    Messages:
    10,247
    Likes Received:
    4,465
    Location:
    Finland
  5. CarstenS

    Legend Subscriber

    Joined:
    May 31, 2002
    Messages:
    5,800
    Likes Received:
    3,920
    Location:
    Germany
    You mean the supposed scaling to 1280 ALUs, 80 TMUs and 256 Bit memory interface on the same process node? If that was true, I'd have to agree.

    But then, AMD has other options for a refresh than just adding more of the same stuff. Especially with 40nm wafer space still quite scarce, I'd assume it would be wise not to scale your whole lineup of chips up by +50mm² (for the sake of the argument).
     
  6. no-X

    Veteran

    Joined:
    May 28, 2005
    Messages:
    2,455
    Likes Received:
    471
    Both Samsung and Hynix still list 6.0GHz modules as their fastest products...
     
  7. Space Giraffe

    Newcomer

    Joined:
    Jun 3, 2010
    Messages:
    16
    Likes Received:
    0
    7 Gbps GDDR5 went into mass production back in June IIRC. Is it soon enough for ATI to start using them?
     
  8. Jawed

    Legend

    Joined:
    Oct 2, 2004
    Messages:
    11,716
    Likes Received:
    2,137
    Location:
    London
    Two issues: Vantage seems to flatter ATI currently and 128-bit is still in play as an option for Barts.

    Yeah, reinstating stuff that was cut from Evergreen.
     
  9. no-X

    Veteran

    Joined:
    May 28, 2005
    Messages:
    2,455
    Likes Received:
    471
    I believe it was originally planned as a 128bit part (at 32nm), not that it still is one. With an expected die size around 260-270mm², it wouldn't be clever not to utilize it for a 256bit interface...
     
  10. TKK

    TKK
    Newcomer

    Joined:
    Jan 12, 2010
    Messages:
    148
    Likes Received:
    0
    True, but there's also this.

    Maybe Hynix doesn't list that memory anywhere else yet because AMD is their exclusive customer for that memory right now. The time-frame surely fits; that memory was supposed to have been in mass production for some time now.

    Of course it's also possible that this is just a test sample and the final 6870 will come with a lower memory clock.
     
  11. ferro

    Newcomer

    Joined:
    Apr 8, 2005
    Messages:
    130
    Likes Received:
    0
    Location:
    The Netherlands
    Nice. This particular module would give you 2GB on a 256 bit bus. Eyefinity-ready!
     
  12. trinibwoy

    trinibwoy Meh
    Legend

    Joined:
    Mar 17, 2004
    Messages:
    12,062
    Likes Received:
    3,119
    Location:
    New York
    Oh ok, now I understand what you're referring to.
     
  13. caveman-jim

    Regular

    Joined:
    Sep 19, 2005
    Messages:
    305
    Likes Received:
    0
    Location:
    Austin, TX
    Can you expound on why you think that? Link me to previous discussion if you've stated it already and I missed it. Thanks
     
  14. 3dilettante

    Legend Alpha

    Joined:
    Sep 15, 2003
    Messages:
    8,579
    Likes Received:
    4,799
    Location:
    Well within 3d
    Anandtech had a story about Cypress where AMD basically said something to that effect. The chip had to be pared down to meet size requirements. Density suffered due to additional measures taken to increase yields and counteract process issues.

    One bit of curiosity is whether Northern Islands will be as aggressive in implementing such measures, assuming TSMC was able to remedy the problems that made AMD bloat Cypress in the first place.
     
  15. Jawed

    Legend

    Joined:
    Oct 2, 2004
    Messages:
    11,716
    Likes Received:
    2,137
    Location:
    London
    As 3dilettante said.
     
  16. Psycho

    Regular

    Joined:
    Jun 7, 2008
    Messages:
    746
    Likes Received:
    41
    Location:
    Copenhagen
    The GPU-Z part seems right for an unrecognized GPU. Then we have the Vantage numbers:
    GPU Test 1: 37.53
    GPU Test 2: 30.50
    CPU Test 1: 3682.88
    CPU Test 2: 31.55
    GPU Score: 11634
    CPU Score: 25839
    3DMark Score: X11963

    The numbers add up correctly according to the Vantage score calculation, so no obvious fake that way.
    But does the ratio between the 2 GPU tests look sufficiently far from something we know, like a high clocked 480 or 5870? (of course the cpu may also influence that ratio, so the cpu scores should be comparable).

    Is it? The Ati-forum piece looks very much like it's based on PCB blueprints, so things like memory bus ("pin compatible to 5800") and power envelope (2*6 pin) seem pretty certain.
     
  17. Jawed

    Legend

    Joined:
    Oct 2, 2004
    Messages:
    11,716
    Likes Received:
    2,137
    Location:
    London
    2 extra layers, for what? If they're both 1GB 256-bit boards, why are there two extra layers?
     
  18. caveman-jim

    Regular

    Joined:
    Sep 19, 2005
    Messages:
    305
    Likes Received:
    0
    Location:
    Austin, TX
    Thanks!

    I'm sure they will be as aggressive with the Product Requirement Specification as they were with Cypress, because they don't have a process shrink to help with power consumption and die size. Assuming, of course, it's a continuing evolution of TeraScale 2. More SIMDs means more transistors, means more heat, means more die area, means lower yields... unless you've got a bunch of process engineering tricks up your sleeve to make the most of TSMC 40nm.

    Perf/watt is a key marker for OEM sales these days, and that's where AMD's focus appears to be.
     
  19. Gipsel

    Veteran

    Joined:
    Jan 4, 2010
    Messages:
    1,620
    Likes Received:
    264
    Location:
    Hamburg, Germany
    And for short vectors it's easy to do the same in software.
    I've done that already.
    Basically it looks like manual loop unrolling with handling of the special cases for the possible divergences. One can often handle that efficiently with conditional moves for short divergences, or with normal control structures, which have effectively the same characteristics as the lane masking.
    You bloat the code but get quite some speedup (if the divergences don't dominate, but in that case GPUs suck either way).
    No, as the normal vectorization in GPUs is implicit, you don't handle that part at all (besides when dealing with shared memory). You simply add vectorization, that's it. It's roughly the same as using SSE intrinsics, only more flexible.
    Obviously I'm not reading your or nvidia's books. A factor of 2 often decides whether something is feasible or not. And even a 50% speedup is nothing to sneeze at in my book. If you have to write something from the ground up either way (which is the case for GPGPU much more often than not), it is definitely not an insurmountable task to implement if one has thought about and planned that stuff beforehand.

    Btw., I mentioned this to you before and I will reiterate it: nvidia GPUs also often gain from explicit vectorization, as it reduces the granularity of memory accesses and increases the burst lengths. It is simply more cache friendly, and with a lot of algorithms being bandwidth limited, it can be astonishingly efficient for some problems in view of the "scalar" nature of nvidia GPUs.
     
  20. RecessionCone

    Regular Subscriber

    Joined:
    Feb 27, 2010
    Messages:
    505
    Likes Received:
    189
    My experience with vectorization on Nvidia GPUs has not been positive. The extra register pressure caused by vectorizing code often causes large occupancy losses and ends up significantly harming performance. That's one reason AMD requires larger register files than Nvidia.
     

  • About Us

    Beyond3D has been around for over a decade and prides itself on being the best place on the web for in-depth, technically-driven discussion and analysis of 3D graphics hardware. If you love pixels and transistors, you've come to the right place!

    Beyond3D is proudly published by GPU Tools Ltd.