AMD: R7xx Speculation

Discussion in 'Architecture and Products' started by Unknown Soldier, May 18, 2007.

Thread Status:
Not open for further replies.
  1. Ailuros

    Ailuros Epsilon plus three
    Legend Subscriber

    Joined:
    Feb 7, 2002
    Messages:
    9,511
    Likes Received:
    224
    Location:
    Chania
    OT but I at least see an edit function for all my past posts.
     
  2. Pete

    Pete Moderate Nuisance
    Moderator Legend

    Joined:
    Feb 7, 2002
    Messages:
    5,777
    Likes Received:
    1,814
    You need to rack up a few posts before you can start covering your tracks willy-nilly.
     
  3. aca

    aca
    Newcomer

    Joined:
    May 4, 2007
    Messages:
    44
    Likes Received:
    0
    Location:
    Delft, The Netherlands
    More transistors per unit area means less area in which to make mistakes (mask/etching/doping/...). Assuming the same sigmas for each of these, the statement seems to hold. Purely a stochastic process, I would say.
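
    The "less area, fewer mistakes" argument can be illustrated with a simple Poisson yield model. This is only a sketch: the model choice, the function name and the defect density are assumptions for illustration, not figures from this thread, and a denser process may not in practice keep the same defect density.

```python
import math

def poisson_yield(die_area_mm2: float, defects_per_mm2: float) -> float:
    """Expected fraction of dies with zero random defects (Poisson model)."""
    return math.exp(-die_area_mm2 * defects_per_mm2)

# Illustrative assumption: the same logic shrunk from 400 mm^2 to 250 mm^2,
# with an identical random defect density on both.
DEFECT_DENSITY = 0.002  # defects per mm^2 (made-up value)

for area in (400.0, 250.0):
    print(f"{area:.0f} mm^2 -> yield {poisson_yield(area, DEFECT_DENSITY):.2f}")
```

    Under these assumptions the smaller die always yields better, which is the stochastic point being made; the model says nothing about systematic or parametric losses.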
     
  4. Sound_Card

    Regular

    Joined:
    Nov 24, 2006
    Messages:
    936
    Likes Received:
    4
    Location:
    San Antonio, TX
    I have a feeling the architecture is longer rather than wider: perhaps two more quads in each SIMD array, with two more texture blocks.

    RV670 --- 16x4 into 4 texture blocks
    RV770 --- 24x4 into 6 texture blocks

    Because I think the SIMD arrays will stay at 4, I don't think we will see anything higher than 4 render back ends. I guess I will wait and see though.
     
  5. Jawed

    Legend

    Joined:
    Oct 2, 2004
    Messages:
    11,716
    Likes Received:
    2,137
    Location:
    London
    I was thinking the same a while back and I have to admit despite my subsequent thoughts ("4xRV635" with 32 TUs) I'm biased towards 24 TUs simply because it's not such a huge jump. But the "asymmetry" of 24 TUs does bother me a bit...

    Jawed
     
  6. Sound_Card

    Regular

    Joined:
    Nov 24, 2006
    Messages:
    936
    Likes Received:
    4
    Location:
    San Antonio, TX
    My guess is exactly 24 TUs. And yes, a while back I also thought it was 4xRV635; however, the performance rumours and die size are leading me to think otherwise, and after some thought this looks much more appropriate. But I'm not sure what you mean by the "asymmetry" problem. :eek:

    The 4:1 ALU:TEX ratio is kept in RV770 because you still have one texture block feeding four quads across four SIMDs. Simply add two more "levels".

    S = SPU quad, T = texture block, R = render back end

    S S S S - - - T
    S S S S - - - T
    S S S S - - - T
    S S S S - - - T
    S S S S - - - T
    S S S S - - - T
    : : : : :
    R R R R

    I also believe 24 TUs is plenty, considering clocks should be higher as well. I'm guessing 900 MHz for RV770 to hit the target performance.
     
  7. 3dilettante

    Legend Alpha

    Joined:
    Sep 15, 2003
    Messages:
    8,579
    Likes Received:
    4,799
    Location:
    Well within 3d
    Ooh, let me draw some ASCII art!

    Code:
    SS  SS  SS  SS --TT
    SS  SS  SS  SS --TT
    SS  SS  SS  SS --TT
    R   R   R   R
    Yeah I've got nothing, just figured I'd like it better without the 96 element granularity.

    I can't remember if the ROPs were each statically tied to specific SIMDs or not.
     
  8. Jawed

    Legend

    Joined:
    Oct 2, 2004
    Messages:
    11,716
    Likes Received:
    2,137
    Location:
    London
    I don't know how the ring stops would be organised. But, then again, does it matter?

    I'm assuming that R600 has one ring stop per quad RBE. Then each SIMD also has one ring stop. Finally, each SIMD has one quad TU, implying to me that a TU is directly linked to this same ring stop (since the TU needs to send its results to all the other SIMDs, not just its local SIMD).

    6 quads of TUs sorta implies 6 ring stops...

    Also, in R600, each SIMD has an equal share (1/4) of the TUs. How do you share 6 quad TUs across 4 SIMDs? 2 SIMDs with 2 quad TUs and 2 SIMDs with 1 quad TU. Hmm.
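
    The uneven split can be checked with a few lines of arithmetic. A minimal sketch, where `share_units` is a hypothetical helper that hands out whole quad TUs as evenly as possible:

```python
def share_units(units: int, owners: int) -> list[int]:
    """Split `units` whole units across `owners`, as evenly as possible."""
    base, extra = divmod(units, owners)
    # The first `extra` owners each get one unit more than the rest.
    return [base + 1 if i < extra else base for i in range(owners)]

print(share_units(4, 4))  # R600 style: one quad TU per SIMD
print(share_units(6, 4))  # the 24 TU guess
print(share_units(8, 4))  # the "4xRV635" guess with 32 TUs
```

    Only the 6 quad TU case leaves the SIMDs with unequal shares, which is the asymmetry in question.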

    This asymmetry is why I diagrammed 4xRV635, otherwise it just seems messy:

    http://forum.beyond3d.com/showpost.php?p=1130755&postcount=649

    Now it's worth pointing out that L2 in R600 is centralised. So all TUs (or their L1s, at least) are accessing the same L2. That implied path, which isn't via the ring bus, could imply that the TUs actually return results back to the requesting SIMD via a central route, not the ring bus. If so, that would mean that 6 ring stops wouldn't be needed. But it still leaves me puzzling over the "ownership" of TUs, normally something that's symmetric across all SIMDs.

    Jawed
     
  9. Jawed

    Legend

    Joined:
    Oct 2, 2004
    Messages:
    11,716
    Likes Received:
    2,137
    Location:
    London
    It's a question of screen-space tiling, first deployed in R300, solely for pixel shading: once a batch of pixels is rasterised (determined by their screen space tile) they're localised to a single quad RBE.

    So I've been assuming that this is still the case going forwards, i.e. a strict 1:1 relationship.
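
    A static tile-to-RBE ownership of this kind might look roughly like the sketch below. The tile size and the interleaving pattern are assumptions chosen for illustration; the thread does not give the mapping the hardware actually uses.

```python
TILE_SIZE = 16  # pixels per tile edge (assumed, not from the thread)
NUM_RBES = 4    # quad render back ends

def owning_rbe(x: int, y: int) -> int:
    """Return the RBE that statically owns the screen tile containing (x, y)."""
    tile_x, tile_y = x // TILE_SIZE, y // TILE_SIZE
    # Hypothetical interleave so that neighbouring tiles land on different RBEs.
    return (tile_x + 2 * tile_y) % NUM_RBES

# Every pixel of a tile maps to the same RBE, so a rasterised batch stays local.
print(owning_rbe(0, 0), owning_rbe(15, 15))   # same tile
print(owning_rbe(16, 0), owning_rbe(0, 16))   # neighbouring tiles
```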

    It's possible to argue that with a centralised L2 screen-space tiling has less benefit (because the tiling also helped texture cache coherency). But there's still a question of hierarchical-Z and hierarchical-stencil both being tiled, implying a 1:1 link between those tiles of data and their owning RBEs.

    So, the question is, does screen-space tiling have a locality benefit for RBEs in R600/R7xx? Or is the memory system of R6xx so flexible that it doesn't really matter?

    Jawed
     
  10. Sound_Card

    Regular

    Joined:
    Nov 24, 2006
    Messages:
    936
    Likes Received:
    4
    Location:
    San Antonio, TX

    Ahhh thanks Jawed. I can see what you mean.
     
  11. 3dilettante

    Legend Alpha

    Joined:
    Sep 15, 2003
    Messages:
    8,579
    Likes Received:
    4,799
    Location:
    Well within 3d
    That sounds like it might be tiled that way.

    Maybe I'm being dense, but why would it matter to the RBE which SIMD it was talking to?
    Every form of storage in the R600 diagrams is located everywhere but the SIMDs.
    The register file cache, the schedulers, the unified L2 and single-image L1 seem to lean towards keeping the ALUs isolated from the particulars.
     
  12. Jawed

    Legend

    Joined:
    Oct 2, 2004
    Messages:
    11,716
    Likes Received:
    2,137
    Location:
    London
    Historically it was about load-balancing and cache (for each of TU and RBE) coherence. The load balancing was "automatic", the ALUs were only shading pixels and the assumption was that the (statically defined) set of tiles assigned to each RBE would all amount to an "equal" workload per frame, for ALUs, TUs and RBEs.

    If you junk this kind of tiling then every RBE has to be able to talk to all the ALUs and all of the hierarchical buffer units (for Z/stencil). Clearly the ring bus looks like it would trivially support this many-to-many organisation. And the hierarchical buffer units may in fact be a single unit.

    It's worth remembering that R600 only rasterises 16 pixels per clock, so it's not as if the hierarchical buffer unit would be straining, provided it can switch the tile it's working against without stalling. The same goes for the interpolators (which are a "fixed function" block between the rasteriser and the ALUs).

    The biggest outstanding question is that the RBEs will all try to access the hierarchical buffers simultaneously. So, are those buffers (Z and stencil) tiled or can a single instance of each support the kind of throughput the RBEs demand? With tiling you guarantee collision-free accesses. Without tiling you have to have some kind of queueing/buffering/re-ordering front end to keep the RBEs happy. I think it's reasonable to assume that the RBEs are the most tetchy when it comes to being told to wait.
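
    The collision argument can be illustrated with a toy model. This is a sketch under loud assumptions: one buffer bank per RBE, one request per RBE per clock, and uniformly random bank selection in the many-to-many case. None of these details come from the hardware.

```python
import random
from collections import Counter

NUM_RBES = 4
NUM_BANKS = 4  # hypothetical: one hierarchical-buffer bank per RBE

def conflicts(bank_requests: list[int]) -> int:
    """Requests that must wait because another RBE hit the same bank this clock."""
    return sum(n - 1 for n in Counter(bank_requests).values())

def simulate(clocks: int, tiled: bool, seed: int = 0) -> int:
    """Total stalled requests over `clocks` clocks."""
    rng = random.Random(seed)
    stalled = 0
    for _ in range(clocks):
        if tiled:
            # Private tiles: RBE i only ever touches bank i.
            requests = list(range(NUM_RBES))
        else:
            # Many-to-many: any RBE may hit any bank.
            requests = [rng.randrange(NUM_BANKS) for _ in range(NUM_RBES)]
        stalled += conflicts(requests)
    return stalled

print("tiled:", simulate(1000, tiled=True))
print("many-to-many:", simulate(1000, tiled=False))
```

    In this model the tiled case never stalls by construction, while the many-to-many case needs some queueing or re-ordering to absorb the stalled requests.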

    I see it as a question of whether the advantages that were once gained with the screen-space tiling are now relevant. And how much collision management are you willing to indulge in, in order to have as much many-to-many flexibility as possible.

    Clearly, for example, the TUs manage collisions because they support multiple clients - whereas historically TUs only had a single client. That change came with Xenos and RV530.

    With unification of the ALUs screen-space tiling has less impact, agreed. But pixel shading is still typically 80%, say, of the average frame workload and it seems that RBEs are, more and more, taking up a greater proportion of memory system bandwidth (explosion in shadowing and MRT-based algorithms).

    Jawed
     
  13. Jawed

    Legend

    Joined:
    Oct 2, 2004
    Messages:
    11,716
    Likes Received:
    2,137
    Location:
    London
    I just want to add that it's entirely possible to build a single hierarchical-buffer update unit (i.e. for early-Z culling/updating) which accesses tiled Z/stencil buffers. This would enable the RBEs to have privately tiled Z/stencil buffers.

    So really it's a question of whether the RBEs have private tiles or whether it's a many-to-many configuration.

    Jawed
     
  14. mczak

    Veteran

    Joined:
    Oct 24, 2002
    Messages:
    3,022
    Likes Received:
    122
    No, the SIMD units do not have quad TUs attached; those must be independent (otherwise, try to make the unit counts match up for RV630...). So simply having 24-wide SIMD arrays works perfectly fine from that point of view and naturally gives you 6 quad TUs. It would definitely be the easiest way to go from RV670 to RV770, assuming the 480 shader unit number is correct. The most obvious downside would be that the branching granularity would increase.
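
    The unit counts in this scenario can be tallied as follows. A rough sketch: the 5-wide ALU and 4-SIMD figures follow the R600 family, while the 4-clock batch is an assumption that happens to reproduce the 64 and 96 element granularities discussed in the thread.

```python
ALUS_PER_UNIT = 5      # R600-family shader units are 5-wide
NUM_SIMDS = 4          # assumed to stay at 4, as suggested earlier in the thread
PIXELS_PER_QUAD = 4
CLOCKS_PER_BATCH = 4   # assumption, consistent with 64 and 96 element batches

def config(simd_width: int) -> dict[str, int]:
    """Derived unit counts for SIMD arrays that are `simd_width` units wide."""
    return {
        "shader_alus": simd_width * NUM_SIMDS * ALUS_PER_UNIT,
        "quad_tus": simd_width // PIXELS_PER_QUAD,
        "tus": simd_width,
        "batch_elements": simd_width * CLOCKS_PER_BATCH,
    }

print("16-wide (RV670):", config(16))
print("24-wide (guess):", config(24))
```

    Going from 16-wide to 24-wide gives 480 ALUs and 6 quad TUs, at the cost of the batch growing from 64 to 96 elements.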
     
  15. Jawed

    Legend

    Joined:
    Oct 2, 2004
    Messages:
    11,716
    Likes Received:
    2,137
    Location:
    London
    OK that would be like a "super-sized" Xenos, 24-wide units instead of 16-wide. I haven't thought of it in those terms, but it does sound feasible. It would certainly make me happier about redundancy as I really dislike the idea of SIMDs narrower than 16, it just seems relatively wasteful.

    Jawed
     
  16. turtle

    Regular

    Joined:
    Aug 20, 2005
    Messages:
    279
    Likes Received:
    8
    Core: AMD/ATI RV770
    Die size: ~250 mm²
    Production: TSMC 55nm
    Silicon Revision: A11
    Shader core: 160! x 5D (800! SP)
    TMU: 32
    ROP: 16/32 (?)
    MC: 256-bit External, Ring bus
    Frequency: 825~875 (xx70), 700~775 (xx50)

    [​IMG]

    http://bbs.chiphell.com/viewthread.php?tid=17621&extra=page=1

    160 shaders? Doesn't seem possible. :!:
     
    #756 turtle, Mar 8, 2008
    Last edited by a moderator: Mar 8, 2008
  17. no-X

    Veteran

    Joined:
    May 28, 2005
    Messages:
    2,455
    Likes Received:
    471
  18. turtle

    Regular

    Joined:
    Aug 20, 2005
    Messages:
    279
    Likes Received:
    8
    I know, right? :razz: It does look bullshitty.

    edit: While the picture looks crappy and may be chopped, the specs match what was posted earlier, just with more detail. The earlier post said a little more than twice the shaders, and this says 160, or 2.5x...so to me it sounds like we're starting to get corroborating posts...or maybe just guesses based on the earlier post. I never know with Chiphell, but I try to keep track of the rep of certain posters there. This one is a mod, so you'd think they'd post less dung, but I suppose one never knows.
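
    For reference, the 2.5x figure falls out of the rumoured numbers like this. A small sketch, taking RV670 as 64 five-wide units (320 SPs); the variable names are just for illustration.

```python
ALUS_PER_UNIT = 5  # "5D" shader units

def stream_processors(units: int) -> int:
    """Total scalar ALUs for a given number of 5-wide shader units."""
    return units * ALUS_PER_UNIT

rv670_units = 64       # RV670: 320 SPs as 64 five-wide units
rumoured_units = 160   # from the Chiphell spec list

print(stream_processors(rumoured_units))  # 800
print(rumoured_units / rv670_units)       # 2.5
```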

    The strange thing about it though, wouldn't we be right back to ROP/TMU limitation? I wonder if such a massive increase in shaders would be to help with AA because of the way it's done in the R600/R700 arch?
     
    #758 turtle, Mar 8, 2008
    Last edited by a moderator: Mar 8, 2008
  19. Disharmonic

    Newcomer

    Joined:
    Mar 8, 2008
    Messages:
    4
    Likes Received:
    0
    Location:
    Greece
    The pictures are certainly the same. It's a really poor Photoshop edit.
     
  20. ShaidarHaran

    ShaidarHaran hardware monkey
    Veteran

    Joined:
    Mar 31, 2007
    Messages:
    4,027
    Likes Received:
    90
    LOL, that doesn't even look remotely realistic. Anyone that falls for that deserves to be tricked.
     