AMD: R9xx Speculation

Discussion in 'Architecture and Products' started by Lukfi, Oct 5, 2009.

  1. MarkoIt

    Regular

    Joined:
    Mar 1, 2007
    Messages:
    392
    Likes Received:
    0
    One RPE could also be 8 SIMDs, each with 80 SPs and a quad TMU attached.

    Oh, what about MC?
    Caicos for sure 128bit
    Barts for sure 256bit
    But Cayman? 256-bit or 384-bit? To me it seems to read 48 for Cayman's ROPs.
    Edit: I forgot about the card picture with 8 chips... but maybe it was a fake!

    With those specs, Cayman should on paper be 50% faster than Barts.
     
    #2441 MarkoIt, Sep 29, 2010
    Last edited by a moderator: Sep 29, 2010
  2. Jawed

    Legend

    Joined:
    Oct 2, 2004
    Messages:
    11,708
    Likes Received:
    2,132
    Location:
    London
    One solution to the SIMD<->TMU mapping problem is if the TMUs are actually with the ROPs alongside the memory controllers. Cayman appears to have 32 ROPs. Could it have 64 TMUs?

    In RV770 the TMUs and LDS were located together and LDS and TMUs seemed to share data paths (or at the very least timings). With Evergreen, LDS became independent.

    After that, you could argue the TMUs could move anywhere.

    The problem with this, though, is the vast bandwidth that's needed from TMUs to SIMDs. This would be a step in the wrong direction in comparison with Evergreen. On the other hand, these data paths need to exist for colour buffer writes (and other export functions) and also for global atomics.

    So putting the TMUs near the MCs with the ROPs would mean TMUs and ROPs are sharing a bus to the SIMDs. Additionally, since all SIMDs need to talk to all MCs/L2s/atomics, the TMUs would end up being shared globally too.

    So, ahem, what kind of bus/crossbar is going to do that :shock: Ring bus 2.0? :lol:
     
  3. fellix

    Veteran

    Joined:
    Dec 4, 2004
    Messages:
    3,552
    Likes Received:
    514
    Location:
    Varna, Bulgaria
    Ring-bus seems fine, but with that many clients on it, could the round-trip latency become an issue?
     
  4. no-X

    Veteran

    Joined:
    May 28, 2005
    Messages:
    2,451
    Likes Received:
    471
    Isn't this slide too ugly to be real?
     
  5. Alexko

    Veteran Subscriber

    Joined:
    Aug 31, 2009
    Messages:
    4,541
    Likes Received:
    964
    Intel seems to think a ring bus is just fine for 32 CPUs on Larrabee, for what it's worth…
     
  6. Kaotik

    Kaotik Drunk Member
    Legend

    Joined:
    Apr 16, 2003
    Messages:
    10,244
    Likes Received:
    4,465
    Location:
    Finland
    The blurred number suggests 96 (the white blur is a bit weaker in the lower left and upper right corners, consistent with how a 9 and a 6 should look when blurred).
     
  7. 3dilettante

    Legend Alpha

    Joined:
    Sep 15, 2003
    Messages:
    8,579
    Likes Received:
    4,799
    Location:
    Well within 3d
    It's fine for up to 16.
    For 32, there are two linked rings.
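
    To put rough numbers on the latency question, here is a minimal sketch that counts hops only. It assumes a single bidirectional ring where a request takes the shorter way round; the function name `ring_hops` is made up for illustration, and nothing here models the real Larrabee ring or whatever AMD might build. The linked-rings case is left out because the way the two rings are joined is not described in this thread.

```python
def ring_hops(n_stops):
    """Return (worst_case, mean) hop count between two distinct stops
    on a bidirectional ring with n_stops stops (hypothetical model)."""
    if n_stops < 2:
        raise ValueError("need at least two stops")
    # Shortest distance from stop 0 to stop k, going either way round.
    dists = [min(k, n_stops - k) for k in range(1, n_stops)]
    return max(dists), sum(dists) / len(dists)


for n in (8, 16, 32):
    worst, mean = ring_hops(n)
    print(f"{n:2d} stops: worst {worst} hops, mean {mean:.2f} hops")
```

    In this model both the worst-case and the mean distance grow linearly with the number of stops, which is one plausible reason to split a large ring into smaller linked ones.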
     
  8. racca

    Newcomer

    Joined:
    Apr 3, 2010
    Messages:
    51
    Likes Received:
    0
    You said nothing about mid/high range. XX70 fits ALL. Your statement is false, period.
    If you said X870/X770, I would have agreed, but you didn't.

    I didn't say I was expecting that, did I?
    It's true that I think the 6850/6870 naming scheme is a bit silly, but that doesn't mean I'm expecting what others have said.
    IMHO, BartsXT would be better off with a 6830 label on it, alongside BartsPRO as 6770, and possibly a Cypress (1280SP/32ROP/~700MHz) as 6750/6730.

    AMD would have a chance to rebrand the 5000 series (Cypress and Juniper with different specs) entirely as a stop-gap solution, and devote more man-hours to a full-fledged 28nm NI family instead.
     
  9. racca

    Newcomer

    Joined:
    Apr 3, 2010
    Messages:
    51
    Likes Received:
    0
    So in essence, a decoupled TMU cluster per ROP/RBE block or per SIMD block? (with 96 TMUs, the latter would be more likely to be true)

    Well, I'm sure that if AMD decides to use it this time, they won't make the same mistake all over again.

    Not if you can do it right. With more filter functions moved to the ALUs, a shared TMU cluster might be just the right solution.
     
  10. GZ007

    Regular

    Joined:
    Jan 22, 2010
    Messages:
    416
    Likes Received:
    0
    Some TMU sharing could make sense. Not all pixels need the same fixed ALU/TEX ratio and bandwidth. Some pixels can have several high-resolution textures while others have none. A fixed ALU/TEX ratio can help in theoretical texel rate benchmarks, but in real games, if you could watch each pixel's rendering time within a single second, there could be a lot of differences (and also bottlenecks from other parts of the GPU). So the TMU disadvantage is gone after a second of rendering (GTX 480 vs 5800).
     
  11. DeF

    DeF
    Newcomer

    Joined:
    May 3, 2007
    Messages:
    162
    Likes Received:
    20
    In the pic above, this is what I see for Cayman:
    480(x4)
    96
    32
    3

    I am wondering what Barts' and Cayman's die sizes would be with those specs.
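
    As a quick sanity check, the numbers read off the picture can be compared with MarkoIt's earlier guess of 8 SIMDs per RPE, each with 80 SPs and a quad TMU. This is only a sketch on rumoured figures: it assumes the "3" in the list is the RPE count, which the picture does not confirm, and the function and constant names are made up for illustration.

```python
# All figures are rumour. The per-RPE layout is MarkoIt's guess, and the
# "3" read off the slide is ASSUMED here to be the number of RPEs.
RPES = 3
SIMDS_PER_RPE = 8
SPS_PER_SIMD = 80
TMUS_PER_SIMD = 4  # one quad TMU per SIMD


def totals(rpes, simds_per_rpe, sps_per_simd, tmus_per_simd):
    """Multiply the per-RPE layout out to whole-chip totals."""
    simds = rpes * simds_per_rpe
    return {
        "simds": simds,
        "sps": simds * sps_per_simd,
        "tmus": simds * tmus_per_simd,
    }


cayman = totals(RPES, SIMDS_PER_RPE, SPS_PER_SIMD, TMUS_PER_SIMD)
print(cayman)
# Compare with the slide reading of 480(x4) shaders and 96 TMUs.
print(cayman["sps"] == 480 * 4, cayman["tmus"] == 96)
```

    Under those assumptions the guess and the slide reading line up (1920 SPs, 96 TMUs), though that says nothing about whether the slide itself is genuine.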
     
  12. no-X

    Veteran

    Joined:
    May 28, 2005
    Messages:
    2,451
    Likes Received:
    471
    You may have noticed that we've been discussing future midrange / high-end for weeks. There's no reason to state in every single post that the discussion isn't related to low-end :wink:
     
  13. CarstenS

    Legend Subscriber

    Joined:
    May 31, 2002
    Messages:
    5,800
    Likes Received:
    3,920
    Location:
    Germany
    While I find all the speculation about resurrecting yesteryear's concepts very interesting, allow me to point one thing out: quite likely, Islands development started after AMD took over, and maybe already with Fusion concepts in mind. So maybe we need to take more possibilities into account?
     
  14. flopper

    Newcomer

    Joined:
    Nov 10, 2006
    Messages:
    150
    Likes Received:
    6
    So AMD has implemented several smaller cores already?
     
  15. neliz

    neliz GIGABYTE Man
    Veteran

    Joined:
    Mar 30, 2005
    Messages:
    4,904
    Likes Received:
    23
    Location:
    In the know
    If Fusion was anything similar to modular, it would be relatively "easy" to insert an updated DX11 core into an existing design, right? Something like UVD3, a N.I. feature that's also available on Fusion suggests that.
     
  16. Jawed

    Legend

    Joined:
    Oct 2, 2004
    Messages:
    11,708
    Likes Received:
    2,132
    Location:
    London
    I think TMUs are probably 96, making them local to the RPEs, and not globally shared. Much as I'd like to see the TMUs and ROPs sharing L1 and some ALUs, I don't think it's gonna happen here.
     
  17. racca

    Newcomer

    Joined:
    Apr 3, 2010
    Messages:
    51
    Likes Received:
    0
    If said speculation were true, i.e. improvements to TMU sharing/setup/rasterizer/$, and 4D is quite close to 5D in terms of throughput, then perhaps Barts can beat Cypress clock for clock after all.
    Not quite justifying the 6870 name, but it's a start.
     
  18. racca

    Newcomer

    Joined:
    Apr 3, 2010
    Messages:
    51
    Likes Received:
    0
    No. We have been discussing a (still far from clear) future architecture for weeks. High/mid-end parts get more attention for sure, but you can't rule anything out.
    Plus I listed many firsts back there; who's to say 6800 isn't gonna be the next?
    Say, if Barts were to be named 6800, that's got to be a first anyway, isn't it? AMD would not be following their "tradition", hence your argument has no ground.
    And BTW you don't have to specify in every post, because most of the posts either can apply to mid to low-end parts or have a code/product name in them. :cool:

    So let's just stop here, alright.
     
  19. Squilliam

    Squilliam Beyond3d isn't defined yet
    Veteran

    Joined:
    Jan 11, 2008
    Messages:
    3,495
    Likes Received:
    114
    Location:
    New Zealand
    Funny thing is, given the shape of the first number, I'm tempted to think that if it's true then it is showing 640 rather than 480 stream processors. Perhaps this is a nod to their professional / HPC markets, in that they are giving the one SKU a large number of stream processors because it is relevant to these markets as well. Barts doesn't have to cross into the same markets and therefore can stick with a more balanced architecture.
     
  20. mczak

    Veteran

    Joined:
    Oct 24, 2002
    Messages:
    3,022
    Likes Received:
    122
    Well, if that's really 320x4 shaders, why not - after all the chip already seems to have the same ROP capabilities as Cypress.
    But the rumors are still conflicting, and I haven't seen any credible die size numbers either. Those should give some more indication of what performance might be expected.
     