AMD: R8xx Speculation

Discussion in 'Architecture and Products' started by Shtal, Jul 19, 2008.


How soon will Nvidia respond with GT300 to the upcoming ATI RV870 lineup of GPUs?

Poll closed Oct 14, 2009.
  1. Within 1 or 2 weeks

    1 vote(s)
    0.6%
  2. Within a month

    5 vote(s)
    3.2%
  3. Within a couple of months

    28 vote(s)
    18.1%
  4. Very late this year

    52 vote(s)
    33.5%
  5. Not until next year

    69 vote(s)
    44.5%
  1. rpg.314

    Veteran

    Joined:
    Jul 21, 2008
    Messages:
    4,298
    Likes Received:
    0
    Location:
    /
    Perhaps that's why it may be called the Bull Shit Network. :grin: At least Fudo doesn't hide this and says outright that his site is FUD-zilla.
     
  2. JoshMST

    Regular

    Joined:
    Sep 2, 2002
    Messages:
    467
    Likes Received:
    25
    That is definitely not a given. While GF is making a lot of noise about what they might be able to do in that future timeframe, they certainly have not proven it so far. Plus, 28 nm will still be at least a year away, if not 1.5, for GPUs and other large ASICs. We might see a 32 nm part by this time next year. But GF has a lot to prove when it comes to bulk production. While I am personally expecting them to do well, professionally we have to have significant doubts as to whether they can deliver as they have promised.
     
  3. Tchock

    Regular

    Joined:
    Mar 4, 2008
    Messages:
    849
    Likes Received:
    2
    Location:
    PVG
    On second thought, why can't Evergreen BE the midrange part? Just because previous ATI midranges were small (and didn't sell that well in the channel compared to their larger nVidia counterparts)?


    If this takes a page from the G94 (GeForce 9600GT) playbook, aimed at delivering the previous refresh's second-tier single-GPU performance (Evergreen to the 4870/90 as the 9600GT was to the 8800GTS), would that 40mm2 be justified?

    4870+ perf in <4850 TDP too?

    G94 was 240mm². ATI might have been on a die-size spree previously, but that could end if they have more demand and buy in larger volumes relative to the previous generation. I don't see 180mm² hampering viability in any big way; even the 9600GT got butche- uh- cost-downed to a ridiculous level.
     
  4. Unknown Soldier

    Veteran

    Joined:
    Jul 28, 2002
    Messages:
    4,047
    Likes Received:
    1,670
    I don't remember Nvidia 'claiming' this; I do, however, remember some websites claiming that Nvidia would be first to market.

    US
     
  5. neliz

    neliz GIGABYTE Man
    Veteran

    Joined:
    Mar 30, 2005
    Messages:
    4,904
    Likes Received:
    23
    Location:
    In the know
    Indeed, I could only find this on bison.
    (This is from early May, after NV's conference call.)
     
  6. Ailuros

    Ailuros Epsilon plus three
    Legend Subscriber

    Joined:
    Feb 7, 2002
    Messages:
    9,511
    Likes Received:
    224
    Location:
    Chania
    Too much confidence from any side, until X or Y actually sits on a shelf, hurts my brain. Anyway, what's up with the awkward codename? I mean, "Evergreen" :lol: What's NV's X11 chip called internally, then? Edelweiss? :twisted:
     
  7. neliz

    neliz GIGABYTE Man
    Veteran

    Joined:
    Mar 30, 2005
    Messages:
    4,904
    Likes Received:
    23
    Location:
    In the know
    I'm going for RotFuß
     
  8. fellix

    Veteran

    Joined:
    Dec 4, 2004
    Messages:
    3,552
    Likes Received:
    514
    Location:
    Varna, Bulgaria
    I think it's a hidden reference to AMD's green logotype line. :grin:
     
  9. CJ

    CJ
    Regular

    Joined:
    Apr 28, 2004
    Messages:
    816
    Likes Received:
    40
    Location:
    MSI Europe HQ
    Evergreen is the entire DX11 family name, according to my source at Computex. He also had some juicy bits about GT300... supposedly pushed back to 2010, confirmed by a major AIC. Let me see if I can find out more.
     
  10. Tahir2

    Veteran

    Joined:
    Feb 7, 2002
    Messages:
    2,978
    Likes Received:
    86
    Location:
    Earth
    You know, everyone was so excited about the next DX11 part from AMD, and then AMD decided to throw a spanner in the works by showing us a relatively tiny die for an assumed high-end part.

    When will the drama end? :cry::wink:
     
  11. neliz

    neliz GIGABYTE Man
    Veteran

    Joined:
    Mar 30, 2005
    Messages:
    4,904
    Likes Received:
    23
    Location:
    In the know
    Sounds plausible given rjc's info on a new 40nm upstart. If GT300 required a respin (no boards @Computex), they could be delayed long enough not to make it in 2009.
     
  12. fellix

    Veteran

    Joined:
    Dec 4, 2004
    Messages:
    3,552
    Likes Received:
    514
    Location:
    Varna, Bulgaria
    Careful with the interrogation methods. ;)
     
  13. Razor1

    Veteran

    Joined:
    Jul 24, 2004
    Messages:
    4,232
    Likes Received:
    749
    Location:
    NY, NY

    Yeah, ok... :shock:

    Hmm, no back-to-school parts? Sure about that? God, you can't believe anything Charlie says... ya know, nV's 40 nm parts...
     
  14. neliz

    neliz GIGABYTE Man
    Veteran

    Joined:
    Mar 30, 2005
    Messages:
    4,904
    Likes Received:
    23
    Location:
    In the know
    Your sig is getting scary eh Raz?
     
  15. Razor1

    Veteran

    Joined:
    Jul 24, 2004
    Messages:
    4,232
    Likes Received:
    749
    Location:
    NY, NY
    well I'm starting to laugh already :lol:
     
  16. Vincent

    Newcomer

    Joined:
    May 28, 2007
    Messages:
    235
    Likes Received:
    0
    Location:
    London


    "2010" :twisted:
     
  17. Jawed

    Legend

    Joined:
    Oct 2, 2004
    Messages:
    11,716
    Likes Received:
    2,137
    Location:
    London
    Yes, this is like R600->RV670: the ALU/TU/RBE counts were unchanged, clocks got bumped by 4%, the bus got chopped in half and the GDDR3 clock was raised by 36%.

    Except if the bus got chopped in half and memory clock was raised, Evergreen would have ~45mm² to fill with stuff. That's a hell of a lot of stuff, since I estimate that RV740's clusters are around 52mm².

    Or, that's 45mm² of D3D11-specific additions :shock:

    Or, that's 45mm² of D3D11 stuff + architectural re-jigging.

    It's conceivable that the architecture needs a shake-up to handle the memory-intensive nature of D3D11.

    It seems to me that D3D11 is making buffers (resources) of indeterminate count and size a more finely-grained component of rendering. Previously rendering consisted of consuming some fixed-size buffers whilst writing to other fixed-size buffers.

    Geometry shading opened a can of worms making the output buffers variably sized. We've seen that ATI currently uses a pair of ring buffers to handle the ebb-and-flow of GS. Now D3D11 gives the developer access to their own arbitrarily sized buffers to be used pretty much whenever they feel like it (PS or CS seem the most likely places, and arguably CS is a distinct rendering pipeline all of its own) - though it seems there is still a hard limit on the population of these buffers bound to the rendering pipeline at any one time.
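    A toy model may make the append/consume mechanics concrete. This is an illustrative Python sketch, not HLSL and not how any GPU implements it: semantically, an append/consume buffer is just a flat array plus an atomic counter marking the head.

```python
import threading

class AppendConsumeBuffer:
    """Toy model of a D3D11-style append/consume buffer:
    a flat array plus an atomic counter marking the head."""
    def __init__(self, capacity):
        self.data = [None] * capacity
        self.count = 0                  # the hidden counter the buffer carries
        self.lock = threading.Lock()    # stands in for a hardware atomic

    def append(self, value):
        with self.lock:                 # atomic increment of the head
            idx = self.count
            if idx >= len(self.data):
                raise OverflowError("buffer full")  # hard limit, as in D3D11
            self.count = idx + 1
        self.data[idx] = value          # all threads write near the head

    def consume(self):
        with self.lock:                 # atomic decrement of the head
            if self.count == 0:
                raise IndexError("buffer empty")
            self.count -= 1
            idx = self.count
        return self.data[idx]

buf = AppendConsumeBuffer(capacity=8)
for v in (10, 20, 30):
    buf.append(v)
print(buf.consume())  # 30: consume pops from the head
```

    The point of the model is the one property the rest of this post leans on: every thread's append lands at or near a single moving head, rather than at scattered addresses.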


    So it seems to me that there are now multiple sets of paired ring-buffers along the rendering pipeline. TS output is variable, GS output is variable and PS output is now variable. PS output is extremely arduous because:
    • there can be multiple independent variably-sized buffers written by a single pixel shader
    • the in-flight count of pixels is higher than for any other kind of graphics primitive
    In R600 etc. the paired ring buffers rely upon latency-hiding to perform. I've not seen any analysis of GS performance that highlights the quality of latency-hiding - all we have are hints that R600's ridiculous bandwidth was a nod in the direction of making GS work well and that RV670 should show a significant shortfall due to its much lower bandwidth.
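    Taking the figures quoted a couple of posts up at face value (bus chopped in half, GDDR3 clock raised by 36%; these are the post's own numbers, not official specs), the shortfall is easy to put a number on:

```python
# Relative bandwidth = bus-width ratio x memory-clock ratio.
# Both figures are the ones quoted above, not official specs.
bus_ratio = 0.5      # 512-bit R600 bus chopped in half for RV670
clock_ratio = 1.36   # GDDR3 clock raised by 36%

rv670_vs_r600 = bus_ratio * clock_ratio
print(f"RV670 bandwidth relative to R600: {rv670_vs_r600:.0%}")  # 68%
```

    So, on those assumptions, RV670 has roughly two-thirds of R600's bandwidth to feed the same GS ring-buffer traffic.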

    So in D3D11 chips, is latency-hiding against ring buffers held in memory enough? Can a layer of cache provide any benefit here? Theoretically, while appending or consuming a variably sized buffer, caching works well, since all threads are focused on a single region of the buffer. In some ways this is the ideal scenario for caching and it's much easier than caching render back end tasks such as blending, where a stream of pixels arrives with "random" memory addresses (randomness ameliorated by screen-space tiling, though I don't know whether an entire tile can be held on-die in cache).

    Can caching for append/consume buffers be re-used for RBE tasks? One of the properties of append/consume is that it doesn't "tile" - because all threads are focused on writing to the head. The head will move "slowly" through tiles in memory space, i.e. it'll move slowly through MC channels. This seems like a useful property to me, as it means that the MCs can be easily configured/scheduled to pre-fetch (while consuming the tail) and it means that sizeable burst writes can be done, e.g. the MC performing a single write after a wodge of data has been added to the head. This doesn't sound too different from RBEs holding entire/portions of a screen space tile - though the timing is skewed in favour of append/consume, where the lifetime of a block is much more coherent (bursty).
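    The "head moves slowly through MC channels" property can be sketched numerically. Assuming addresses interleave across channels at some tile granularity (256 bytes and 4 channels here - illustrative figures, not a real memory-controller layout):

```python
# Sketch: why an append head is friendly to the memory controllers.
# Assumes addresses interleave across channels at a fixed tile granularity
# (256 bytes here - an illustrative figure, not a real MC layout).
TILE_BYTES = 256
NUM_CHANNELS = 4
ELEMENT_BYTES = 16  # one vec4 per appended element

def channel_of(byte_addr):
    return (byte_addr // TILE_BYTES) % NUM_CHANNELS

# 64 consecutive appends starting at the buffer head:
head = 0
channels = [channel_of(head + i * ELEMENT_BYTES) for i in range(64)]

# The head crosses a channel boundary only every TILE_BYTES/ELEMENT_BYTES
# elements, so writes arrive in long same-channel runs - easy to burst.
runs = 1 + sum(1 for a, b in zip(channels, channels[1:]) if a != b)
print(runs)  # 4 runs of 16 same-channel writes across 64 appends
```

    Contrast that with RBE blending, where each incoming pixel can hit a different channel; append traffic gives the MC long, predictable same-channel runs to coalesce.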

    L2 in R600 is pretty large, hundreds of KB at least (not massive though). It effectively supports pre-fetching of texels by virtue of both locality and the fact that many texel coordinates are known before pixel shading commences. Much the same applies to append/consume, whereas RBE is pretty much stuck with some degree of randomness. So append/consume seems like it blurs across the functionality of texture and RBE caching, with the strong locality of texels, but the requirement to write.

    Currently ATI effectively supports 128 vec4 registers per pixel (vertex, thread, etc.) before registers have to spill to memory. Is that the limit? It seems likely to me that ATI can't increase the register file without having to re-time instruction issue/execution. Currently the ALU and TEX pipes seem tightly bound to pairs of wavefronts in-flight, with an 8 cycle pipeline and effectively some multiple of that for register reads/reads-after-writes. So it seems to me pretty difficult to tweak the register file (e.g. double it) in order to substantially increase latency hiding or to support shaders with substantially higher register allocations.

    So, maybe register spill needs to become a first class citizen. Register spill appears to be a coarse-grain variety of append/consume. When a wavefront is created its registers are allocated, D3D specifies that 4096 vec4s per pixel are allowed, that's 64KB per pixel. This is, effectively, a contiguous block of data, e.g. 128 registers is 2KB per pixel, so 128KB per wavefront, though it could be made up of smaller blocks. If register spillage is required then blocks of a wavefront's register allocation can be sent to memory. With round-robin scheduling and with the scheduler able to see the progress of a wavefront's antecedents (e.g. texture filtering) it can control the scheduling of fetching-back of those register blocks that were dumped into memory. All of this is bursty and looks amenable to simple stream-through caching.
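    The sizes above are worth a quick sanity check, assuming 16-byte vec4 registers and 64-pixel wavefronts (the figures the post uses):

```python
VEC4_BYTES = 16           # 4 x 32-bit components per vec4 register
WAVEFRONT_PIXELS = 64     # ATI wavefront size

# D3D's per-pixel limit: 4096 vec4 registers
d3d_limit_per_pixel = 4096 * VEC4_BYTES
print(d3d_limit_per_pixel // 1024, "KB per pixel")         # 64 KB

# The 128-register case discussed above:
regs_per_pixel = 128 * VEC4_BYTES                          # 2 KB per pixel
regs_per_wavefront = regs_per_pixel * WAVEFRONT_PIXELS     # 128 KB
print(regs_per_pixel // 1024, "KB per pixel,",
      regs_per_wavefront // 1024, "KB per wavefront")
```

    128 KB per wavefront is why spilling in contiguous blocks, rather than register by register, looks attractive: the traffic is naturally bursty.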

    So can a single cache take on all these roles? Or are dedicated caches better? Can the high quantities of append/consume traffic opened up by the really quite long and twisty D3D11 rendering pipeline be supported purely by uncached latency-hiding? Is ATI's current latency-hiding at its limit? Can register spill performance penalties be ameliorated by schedulable stream-through caching?

    I'm not trying to suggest that all of the "missing 45mm²" is cache. I'm just wondering if the significant increase in the density of memory operations requires a massive re-wiring of all the major memory clients, perhaps with a new higher level of overview in scheduling and perhaps also in a new level of generality.

    Jawed
     
  18. Kaotik

    Kaotik Drunk Member
    Legend

    Joined:
    Apr 16, 2003
    Messages:
    10,245
    Likes Received:
    4,465
    Location:
    Finland
    Due to the lack of high-res Evergreen die pictures, here's a low-res one, next to a high-res RV770

    [image: Evergreen die (low res) beside RV770 die (high res)]

    Even though it's low-res, it's IMO clear that Evergreen has gone through major changes over the RV770-style layout: there are four clear "partitions" on the chip, while in RV7xx there's one big pile in the center
     
  19. CJ

    CJ
    Regular

    Joined:
    Apr 28, 2004
    Messages:
    816
    Likes Received:
    40
    Location:
    MSI Europe HQ
    You do realize that AMD flipped the wafer... we're only seeing the backside...
     
  20. kemosabe

    Veteran

    Joined:
    Jun 19, 2003
    Messages:
    1,001
    Likes Received:
    16
    Location:
    Montreal, Canada
    Nice ass...though I prefer them roundish.
     