AMD RV770 refresh -> RV790

Discussion in 'Architecture and Products' started by w0mbat, Nov 10, 2008.

  1. rpg.314

    Veteran

    Joined:
    Jul 21, 2008
    Messages:
    4,298
    Likes Received:
    0
    Location:
    /
    2560*1600*4 bytes/pixel ~ 16MB :shock:, that's way too much. Does Xenos render to its eDRAM in tiles, or does it operate at a lower resolution?

    Perhaps 2 or 3 shrinks later. :)
     
    #341 rpg.314, Feb 15, 2009
    Last edited by a moderator: Feb 15, 2009
  2. CarstenS

    Legend Subscriber

    Joined:
    May 31, 2002
    Messages:
    5,800
    Likes Received:
    3,920
    Location:
    Germany
    AFAIK Xenos uses its eDRAM in 2x2 viewport tiling if either 4xMSAA is enabled or the resolution exceeds 720p.
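    To put numbers on the render-target-size question, here is a rough Python sketch comparing render-target size with eDRAM capacity. It rests on a few assumptions. Xenos' 10MB of eDRAM is treated as 10 MiB. Each sample stores 4 bytes of colour plus 4 bytes of depth/stencil. The tile count is only a lower bound from dividing sizes, and it ignores how the tiles actually have to be shaped.

```python
import math

MIB = 1024 * 1024
# Assumption: Xenos' 10MB eDRAM treated as 10 MiB.
EDRAM_BYTES = 10 * MIB

def framebuffer_bytes(width, height, samples=1, colour_bytes=4, depth_bytes=4):
    """Colour plus depth/stencil storage for every sample (assumed 4 + 4 bytes)."""
    return width * height * samples * (colour_bytes + depth_bytes)

def tiles_needed(width, height, samples=1):
    """Lower bound on tiles: total size divided by eDRAM size, rounded up."""
    return math.ceil(framebuffer_bytes(width, height, samples) / EDRAM_BYTES)

# rpg.314's colour-only figure for 2560x1600 at 4 bytes/pixel, in MiB
print(framebuffer_bytes(2560, 1600, colour_bytes=4, depth_bytes=0) / MIB)
for samples in (1, 2, 4):
    print(f"720p, {samples} sample(s)/pixel: at least {tiles_needed(1280, 720, samples)} tile(s)")
```

    Under these assumptions 720p fits without tiling, and 720p with 4xMSAA needs at least three tiles. The split the hardware actually uses can be larger than this naive division.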
     
  3. Jawed

    Legend

    Joined:
    Oct 2, 2004
    Messages:
    11,716
    Likes Received:
    2,137
    Location:
    London
    Two AMD 40nm chips have been rumoured. Apart from RV740, the other could be RV870 or "RV720". I think we can discount RV870 for the first half of this year, though. And then there's the scarcity of "RV720" rumours.

    IGPs being what they are (glacial), it is definitely possible that an IGP is the other 40nm chip, one we won't see for most of the year.

    Is there a precedent for GP->GT causing a chip to grow substantially?

    I see VR-Zone has an "update from CJ" attached, which says there'll be an OC version of RV790 bringing 25-30% performance gains. Assuming that's based on clocks and extra clusters, I suppose 12 clusters would easily fit in 290mm2. Compared with HD4870's 10 clusters at 750MHz, 12 clusters at 750MHz are 20% faster; at 850MHz that's 36% faster.
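    The cluster-times-clock arithmetic can be checked with a small Python sketch. It assumes RV770-style clusters of 16 five-wide ALUs, with each lane doing one multiply-add per clock. It compares only peak ALU rate against HD4870 (10 clusters at 750MHz), not game performance.

```python
# Assumption: RV770-style cluster = 16 units x 5 lanes, 1 multiply-add (2 flops) per lane per clock.
LANES_PER_CLUSTER = 16 * 5

def peak_gflops(clusters, clock_mhz):
    """Peak single-precision GFLOPS for a given cluster count and core clock."""
    return clusters * LANES_PER_CLUSTER * 2 * clock_mhz / 1000

def speedup_percent(clusters, clock_mhz, base_clusters=10, base_clock_mhz=750):
    """Percentage gain in peak ALU rate over a baseline (default: HD4870)."""
    ratio = peak_gflops(clusters, clock_mhz) / peak_gflops(base_clusters, base_clock_mhz)
    return (ratio - 1) * 100

print(peak_gflops(10, 750))                # HD4870 peak ALU rate in GFLOPS
print(round(speedup_percent(12, 750), 1))  # 12 clusters, same clock
print(round(speedup_percent(12, 850), 1))  # 12 clusters, 850MHz
```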

    $300 would be the higher-clocked GPU (XT, or OC as VR-Zone calls it), but if it's only 10-15% faster than the Pro GPU then the pricing does seem screwy, relying solely on the difference in memory between them.

    Indeed, one way to fit the rumour is that Pro is 12 clusters at 750MHz with 512MB GDDR3 at 975MHz (HD4850 is 993MHz), producing "20% higher performance than HD4870-512MB" only if you count situations when bandwidth isn't a constraint. The XT would then be 825MHz with 900MHz GDDR5, for a supposed "25-30% gain over HD4870-512MB"...
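    Here is a sketch of the peak-bandwidth arithmetic behind that Pro/XT split. It makes three assumptions. Both parts keep RV770's 256-bit bus. GDDR3 does 2 transfers per memory clock. GDDR5 is counted at 4 transfers per quoted clock, which is how HD4870's 900MHz GDDR5 reaches 115.2GB/s.

```python
# Peak memory bandwidth for the rumoured configurations.
# Assumption: 256-bit bus as on RV770; GDDR3 = 2 transfers/clock, GDDR5 = 4 per quoted clock.

def bandwidth_gbs(bus_bits, mem_clock_mhz, transfers_per_clock):
    """Peak bandwidth in GB/s (10^9 bytes per second)."""
    return bus_bits / 8 * mem_clock_mhz * transfers_per_clock / 1000

configs = {
    "HD4850, 993MHz GDDR3": (256, 993, 2),
    "rumoured Pro, 975MHz GDDR3": (256, 975, 2),
    "HD4870 / rumoured XT, 900MHz GDDR5": (256, 900, 4),
}
for name, (bus, clock, transfers) in configs.items():
    print(f"{name}: {bandwidth_gbs(bus, clock, transfers):.1f} GB/s")
```

    On these numbers the Pro would have a little over half the XT's bandwidth. That is why a "20% faster than HD4870" claim would only hold where bandwidth isn't the limit.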

    Jawed
     
  4. KonKort

    Newcomer

    Joined:
    Dec 29, 2008
    Messages:
    89
    Likes Received:
    0
    Location:
    Germany, Ennepetal
  5. CarstenS

    Legend Subscriber

    Joined:
    May 31, 2002
    Messages:
    5,800
    Likes Received:
    3,920
    Location:
    Germany
    You're citing yourself as source? :)
     
  6. rpg.314

    Veteran

    Joined:
    Jul 21, 2008
    Messages:
    4,298
    Likes Received:
    0
    Location:
    /
    Hmm, ~30% increase in max throughput. Only a 20% increase in ALUs suggests that power was a constraint, so they decided to keep the transistor count down and make up for it with higher clocks instead. Though I was expecting a larger increase in ALU count, in the 40-60% range. :roll:

    In that case, however, the die size should definitely be smaller than 200mm2; I'd guess around 160-180mm2. If they are aiming for a $200/300 price point with it, it should have great margins compared to RV770. But I doubt they will attempt a 4890x2 or whatever.
     
  7. KonKort

    Newcomer

    Joined:
    Dec 29, 2008
    Messages:
    89
    Likes Received:
    0
    Location:
    Germany, Ennepetal
    Carsten,

    I know that you understand German, so you can read that the information is attributed to AMD and to another, unpublished source.
     
  8. CJ

    CJ
    Regular

    Joined:
    Apr 28, 2004
    Messages:
    816
    Likes Received:
    40
    Location:
    MSI Europe HQ
    According to my info: RV790 is due mid and end of April. Both versions of RV740 (9600GT and 9800GT competitors; target prices ~$119 for 512MB GDDR5 and ~$99 for 1GB DDR3; the A11 revision is currently clocked at 700MHz engine) should be in May. But hey, it could already be dated with all the smoke and mirrors AMD have been pulling lately.
     
  9. Jawed

    Legend

    Joined:
    Oct 2, 2004
    Messages:
    11,716
    Likes Received:
    2,137
    Location:
    London
    Texturing shouldn't be bandwidth constrained generally.

    D3D11 arguably needs a major rejig of the on-die memory architecture, because pixel shading is allowed to read and write render targets.

    R600 architecture already allows registers to be read from and written to memory locations. It supports this functionality through the memory read/write cache, which should optimise for locality (to a degree, anyway).


    The Sequencer's main duties are:
    • scheduling ALU clauses
    • scheduling TU clauses
    • moving data into/out-of registers
    • fetching constant cache lines
    • controlling branching and manipulating/testing the stack
    The ability to move data into and out of registers (MEM_SCRATCH instructions, according to the R600 ISA document) may well be the key to supporting pixel-shader reading/writing of render targets, i.e. simply reserve a register per render target per pixel (or sample).

    Or perhaps a new kind of fetch clause type will be defined in addition to vertex fetch and texture fetch clauses, i.e. pixel fetch. This boils down to how ordering of triangles is handled, because fetching a pixel/sample from a render target must be strictly ordered by triangle for each location.
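    Plain alpha blending shows why triangle order matters for a pixel fetch. The classic SRC_ALPHA/INV_SRC_ALPHA blend is not commutative, so two triangles hitting the same pixel give different results depending on which is blended first. Here is a minimal Python sketch with made-up colours.

```python
# "Over" blending is order dependent, so render-target reads/writes must be
# serialised per pixel in triangle order. Colours here are arbitrary examples.

def blend_over(src_rgb, src_alpha, dst_rgb):
    """SRC_ALPHA / INV_SRC_ALPHA blend of one pixel."""
    return tuple(s * src_alpha + d * (1.0 - src_alpha) for s, d in zip(src_rgb, dst_rgb))

background = (0.0, 0.0, 0.0)
tri_a = ((1.0, 0.0, 0.0), 0.5)  # (colour, alpha) of triangle A: red
tri_b = ((0.0, 0.0, 1.0), 0.5)  # (colour, alpha) of triangle B: blue

a_then_b = blend_over(*tri_b, blend_over(*tri_a, background))
b_then_a = blend_over(*tri_a, blend_over(*tri_b, background))
print(a_then_b)  # (0.25, 0.0, 0.5)
print(b_then_a)  # (0.5, 0.0, 0.25)
```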

    If you look at the end of a pixel shader program you'll see a Sequencer export instruction. This specifies the registers that are written to the render targets, i.e. translating from a register location into a memory location - though of course in this situation the pixel has to pass through the RBE that handles that memory location (according to tiling of render targets in memory).

    I'm wondering if all colour blend operations will be performed by adding instructions into the pixel shader to read then blend.

    So, the overall effect of D3D11 pixel reading/writing on the ATI architecture could be fairly minimal. It may be that the register file has to increase in capacity simply to deal with the additional latency that this kind of manipulation generates.
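    The latency point is essentially Little's law. The data that has to sit on die equals the rate at which pixels arrive times how long each one is held. The Python sketch below uses hypothetical figures only (16 pixels/clock, 4 bytes per pixel, and latencies chosen for illustration) to show that storage grows in direct proportion to latency.

```python
# Little's law for on-die pixel storage: bytes held = arrival rate x residency time.
# All numbers are hypothetical and only show the proportionality.

def bytes_in_flight(pixels_per_clock, latency_clocks, bytes_per_pixel):
    """Storage needed to keep the pipeline full for a given latency."""
    return pixels_per_clock * latency_clocks * bytes_per_pixel

base = bytes_in_flight(16, 200, 4)       # hypothetical latency without a destination read
with_read = bytes_in_flight(16, 500, 4)  # hypothetical extra wait for reading the render target
print(base, with_read, with_read / base)
```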

    Jawed
     
  10. trinibwoy

    trinibwoy Meh
    Legend

    Joined:
    Mar 17, 2004
    Messages:
    12,062
    Likes Received:
    3,119
    Location:
    New York
    RV740 vs 9600GT and 9800GT? Wow, that should be a fun bloodbath.

    What does fixed function blending cost relative to other ROP bits like AA and compression etc? Also would pixel shader blends imply that AA happens there too? And if it does how would AA sample compression work in the shaders? Seems like a lot of stuff would slow down and/or bandwidth efficiency would be lost.
     
  11. CarstenS

    Legend Subscriber

    Joined:
    May 31, 2002
    Messages:
    5,800
    Likes Received:
    3,920
    Location:
    Germany
    I am referring purely to your posting here. Haven't read HW-Infos in a while.
     
  12. Psycho

    Regular

    Joined:
    Jun 7, 2008
    Messages:
    746
    Likes Received:
    41
    Location:
    Copenhagen
    The 12-SIMD figure seems like your own speculation based on "more SIMDs" and "+30% overall performance"?
    IMHO this would be too small a chip at 40nm for 256-bit GDDR5. It sounds more likely for the 55GT variant (although I don't believe in a redesign and a more expensive chip for +20% performance, especially not from AMD).
    The 16-SIMD figure seems more plausible regarding size; however, it would probably be quite RBE limited unless something is done about those. Maybe 16 SIMDs and nothing else would come close to +30% overall actual game performance at the same clock?
    We still don't know a whole lot about the RV740 RBEs; could they use those in the '90? Do we even have a reliable size estimate for the '40?
     
  13. Jawed

    Legend

    Joined:
    Oct 2, 2004
    Messages:
    11,716
    Likes Received:
    2,137
    Location:
    London
    Lots of good stuff :grin:
    Yeah, agreed 12 clusters implies 55nm. But I still think 12 clusters is pure guesswork.

    R580 was a "20%" refresh of R520, ~20% bigger die and about 20% better performance under some conditions around the time of launch.

    But RV670 was merely a major cost-reduction of R600.

    I suspect HD48xx GPUs are a bit short on texturing but with a bit more texturing they'd simply run into a wall with Z fillrate. I wish someone would make a concerted effort to test this stuff.

    RV740 RBEs are a big deal - I can't see how they can make use of ~60GB/s unless there's twice as many as in RV730 or the Z configuration is doubled.
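    Here is a back-of-envelope Python sketch of the RBE argument. Every figure is an assumption for illustration:
    - a 128-bit GDDR5 bus at 3.6Gbps per pin, giving roughly the ~60GB/s mentioned;
    - RV730-like RBEs, each writing 4 colour pixels per clock;
    - CJ's 700MHz engine clock;
    - 8 bytes of memory traffic per pixel (4 colour + 4 Z, no compression).

```python
# Back-of-envelope RBE traffic vs memory bandwidth; every figure is an assumption.

def bus_bandwidth_gbs(bus_bits, gbps_per_pin):
    """Peak memory bandwidth in GB/s."""
    return bus_bits * gbps_per_pin / 8

def rbe_traffic_gbs(rbes, pixels_per_rbe_clock, clock_mhz, bytes_per_pixel):
    """Memory traffic in GB/s if every pixel the RBEs emit costs bytes_per_pixel."""
    return rbes * pixels_per_rbe_clock * clock_mhz * 1e6 * bytes_per_pixel / 1e9

available = bus_bandwidth_gbs(128, 3.6)         # assumed 128-bit GDDR5 at 3.6Gbps
for rbes in (2, 4):                             # RV730-like count vs doubled
    traffic = rbe_traffic_gbs(rbes, 4, 700, 8)  # 4 px/clk/RBE, 700MHz, 4B colour + 4B Z
    print(f"{rbes} RBEs: {traffic:.1f} GB/s of {available:.1f} GB/s available")
```

    Under these assumptions two RBEs can't consume the bus, while four can. Compression or heavier Z traffic would shift both numbers.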

    I think I saw 100mm2 for RV740 rumoured, earlier in this thread? Seems way too low to me - somewhere in the region of 120-130mm2 with all the extra ALUs. I'm assuming that it'll be a 4:1 GPU with 8 clusters.

    Totally wild speculation: the 290mm2 rumour for a 55nm RV790 could allow for 8x Z per colour in the RBEs, since 12 clusters wouldn't take it past 280mm2 :razz:

    Jawed
     
  14. rpg.314

    Veteran

    Joined:
    Jul 21, 2008
    Messages:
    4,298
    Likes Received:
    0
    Location:
    /
    Can somebody explain to me what it means to be fillrate limited? Also, what is fillrate anyway? Is it the max number of pixels a GPU can push out while running trivial pixel shaders? If so, then why is it quoted for fixed-function GPUs?
     
  15. trinibwoy

    trinibwoy Meh
    Legend

    Joined:
    Mar 17, 2004
    Messages:
    12,062
    Likes Received:
    3,119
    Location:
    New York
    Depends on the type of fill-rate you're talking about. Vanilla pixel fill-rate is basically how fast the card can write dumb fragments to the frame-buffer. So you're basically flat-shading polygons - no texturing, no shader code. It's a measure of ROP throughput and in practice is significantly bandwidth bound. Why shouldn't it be quoted for FF stuff? Back then the only difference was that the ROP, texturing and shader pipeline were all combined.
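    Here is a rough Python sketch of why raw pixel fillrate ends up bandwidth bound, using HD4870 figures (16 colour writes per clock at 750MHz, 115.2GB/s). It assumes no framebuffer compression, so every byte of colour and Z traffic goes to memory. Real ROPs compress, so treat these as worst cases.

```python
# Peak colour fillrate vs the memory traffic it would generate (worst case).
# Assumed HD4870 figures: 16 colour writes/clock, 750MHz, 115.2 GB/s; no compression.
ROPS = 16
CLOCK_HZ = 750e6
BANDWIDTH_BYTES = 115.2e9

pixel_rate = ROPS * CLOCK_HZ  # 12 Gpixels/s of flat-shaded fill

cases = [
    ("colour write only (RGBA8)", 4),
    ("blended: colour read + write", 8),
    ("blended + Z read + write", 16),
]
for label, bytes_per_pixel in cases:
    needed = pixel_rate * bytes_per_pixel
    print(f"{label}: {needed / 1e9:.0f} GB/s = {needed / BANDWIDTH_BYTES:.2f}x available")
```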

    Z-fillrate is an even "dumber" version where you're not writing color but only depth. Some architectures (especially Nvidia's) can accelerate this process by writing many more depth values per clock than they can write color. It's extremely useful for z-only passes in those algorithms that employ them.

    And well, you know what texture fillrate is. Number of pixels * Number of textures per pixel.
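    Here is that definition as a Python sketch. Texel demand is pixels times textures per pixel, optionally multiplied by bilinear samples per fetch to account for trilinear or anisotropic filtering. The workload (2560x1600, 60fps, 3x overdraw, 4 textures) and the 8-samples-per-fetch figure are hypothetical. The peak is HD4870's 40 texture units at 750MHz.

```python
def texel_rate(width, height, fps, overdraw, textures_per_pixel, bilinear_per_fetch=1):
    """Texels (bilinear samples) per second a workload asks for."""
    pixels_per_second = width * height * fps * overdraw
    return pixels_per_second * textures_per_pixel * bilinear_per_fetch

PEAK = 40 * 750e6  # HD4870: 40 texture units x 750MHz = 30 Gtexels/s

plain = texel_rate(2560, 1600, 60, 3, 4)                           # hypothetical workload
filtered = texel_rate(2560, 1600, 60, 3, 4, bilinear_per_fetch=8)  # hypothetical heavy filtering
print(f"{plain / 1e9:.1f} and {filtered / 1e9:.1f} Gtexels/s vs {PEAK / 1e9:.0f} peak")
```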
     
  16. Jawed

    Legend

    Joined:
    Oct 2, 2004
    Messages:
    11,716
    Likes Received:
    2,137
    Location:
    London
    Blimey, there's a tricky question, dunno.

    http://msdn.microsoft.com/en-gb/library/bb205120(VS.85).aspx
    http://msdn.microsoft.com/en-gb/library/bb204894(VS.85).aspx
    http://msdn.microsoft.com/en-gb/library/bb204892(VS.85).aspx

    It's just a bit of math :lol: I doubt the ALUs are particularly costly these days, though it's worth noting that full fp32 functionality didn't make it into D3D10, so back in 2005, say, it was a relatively costly bit of math.

    Compression should be separately handled and is essentially a function of the "memory hierarchy" part of the render back end, rather than the graphics-math part.

    Some of the blend modes are min or max - similar to Z testing, basically. Arguably they're similar enough in functionality that they would all move into pixel shading simultaneously.
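    To illustrate "just a bit of math", here is a minimal Python sketch of a blend stage written as ordinary per-channel arithmetic, the kind of thing a pixel shader would do after reading the destination. The op set (ADD, SUBTRACT, REV_SUBTRACT, MIN, MAX) is the usual one. Treating MIN/MAX as unweighted comparisons follows OpenGL's definition and is assumed here. Clamping for normalised formats is left out.

```python
from enum import Enum

class BlendOp(Enum):
    ADD = "add"                    # src*sf + dst*df
    SUBTRACT = "subtract"          # src*sf - dst*df
    REV_SUBTRACT = "rev_subtract"  # dst*df - src*sf
    MIN = "min"
    MAX = "max"

def blend_channel(src, dst, src_factor, dst_factor, op):
    """One channel of a fixed-function-style blend, done as plain arithmetic."""
    # MIN/MAX compare unweighted values (assumed, as in OpenGL), much like a Z test.
    if op is BlendOp.MIN:
        return min(src, dst)
    if op is BlendOp.MAX:
        return max(src, dst)
    s = src * src_factor
    d = dst * dst_factor
    if op is BlendOp.ADD:
        return s + d
    if op is BlendOp.SUBTRACT:
        return s - d
    return d - s  # REV_SUBTRACT

def blend_pixel(src_rgba, dst_rgba, src_factor, dst_factor, op):
    """Apply the same blend to every channel of an RGBA pixel (no clamping)."""
    return tuple(blend_channel(s, d, src_factor, dst_factor, op)
                 for s, d in zip(src_rgba, dst_rgba))
```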

    Remember AA resolve works fine in R6xx's pixel shaders and custom resolve functionality is something that deferred rendering engines can do.

    Apart from a question of routing (and bandwidth) there's also a question of latency. If pixel-shader colour blending is implemented it increases latency, which inflates the amount of storage space on die given over to holding colour data.

    Arguably handling that state is something that the register file and the out-of-order thread scheduling are perfectly adapted to do - why build a second one in the RBEs?

    Of course as far as Larrabee's concerned, this is all just stuff in L2 to be manipulated :razz:

    (The thought has occurred to me that once Larrabee style GPUs take over, GPU architecture just won't be at all interesting :sad: )

    Jawed
     
  17. rpg.314

    Veteran

    Joined:
    Jul 21, 2008
    Messages:
    4,298
    Likes Received:
    0
    Location:
    /
    Personally, I think that is inevitable.

    Just my 2 cents.
     
  18. trinibwoy

    trinibwoy Meh
    Legend

    Joined:
    Mar 17, 2004
    Messages:
    12,062
    Likes Received:
    3,119
    Location:
    New York
    Yeah no doubt. And it'll be even less interesting than CPU architectures as all the fancy logic for extracting ILP and achieving high single-threaded performance won't be in the picture. We aren't quite there yet though, there's still a lot of work to be done to deal with SIMD divergence.
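    Here is a minimal Python sketch of why SIMD divergence hurts. When lanes of one SIMD group disagree on a branch, the hardware runs both sides for the whole group and masks off the inactive lanes. A single stray lane can therefore double the cost. The 64-lane width matches RV770's wavefront size, and the cycle counts are hypothetical.

```python
def branch_cost(lane_conditions, then_cycles, else_cycles):
    """Cycles a SIMD group spends on an if/else, given each lane's branch outcome."""
    cost = 0
    if any(lane_conditions):      # at least one lane takes the "then" side
        cost += then_cycles
    if not all(lane_conditions):  # at least one lane takes the "else" side
        cost += else_cycles
    return cost

width = 64                                  # assumed SIMD group (wavefront) width
uniform = [True] * width                    # all lanes agree
divergent = [i == 0 for i in range(width)]  # one lane disagrees with the rest
print(branch_cost(uniform, 10, 10), branch_cost(divergent, 10, 10))
```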

    In terms of blending in the shaders, it's probably going to be the first thing to move. Nvidia already introduced global memory atomics with CUDA. And DX11 requires more general access to the framebuffer, so if you've gotta do all that anyway then why not? I've got no idea what AMD has in store for DX11 but I figure Nvidia is going to invest even more heavily in CUDA next generation than they have to date. How much of that investment serves to improve game performance will determine whether they find themselves in the same perf/mm hole they're in today.
     
  19. LordEC911

    Regular

    Joined:
    Nov 25, 2007
    Messages:
    877
    Likes Received:
    208
    Location:
    'Zona
    That was a very basic estimation, which more than one person came up with back in Dec, before we even had any hard info. Still, 100mm2 isn't too far off when a linear shrink of RV770 is right at ~140mm2.
    Remove the sideport and two clusters and it should be pretty close.
    Only another 1-1.5 months to wait until we have better info.
     
  20. Jawed

    Legend

    Joined:
    Oct 2, 2004
    Messages:
    11,716
    Likes Received:
    2,137
    Location:
    London
    Hmm, TSMC's documentation indicates that scaling from 55->45/40nm should be ~0.55x, so on that basis you'd be right. I was under the impression it was more like ~0.67x :???: Can't find the posting that led me astray...
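    Here is a quick Python check of the shrink factors being discussed. It assumes RV770 is ~256mm2 and that an ideal linear shrink is (40/55)^2. The 0.55x and 0.67x factors are the ones quoted in the thread, not measured data.

```python
# Die-area estimates for a 55nm -> 40nm port of an RV770-sized chip.
RV770_MM2 = 256  # assumed RV770 die area

def shrink(area_mm2, factor):
    """Scale a die area by an area-shrink factor."""
    return area_mm2 * factor

ideal = (40 / 55) ** 2  # ideal linear shrink in both dimensions, ~0.53
for label, factor in [("ideal linear", ideal), ("~0.55x", 0.55), ("~0.67x", 0.67)]:
    print(f"{label}: ~{shrink(RV770_MM2, factor):.0f} mm2")
```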

    Jawed
     

  • About Us

    Beyond3D has been around for over a decade and prides itself on being the best place on the web for in-depth, technically-driven discussion and analysis of 3D graphics hardware. If you love pixels and transistors, you've come to the right place!

    Beyond3D is proudly published by GPU Tools Ltd.