Huddy says "R600"

Discussion in 'Pre-release GPU Speculation' started by Geo, May 25, 2006.

Thread Status:
Not open for further replies.
  1. Jawed

    Legend

    Joined:
    Oct 2, 2004
    Messages:
    11,708
    Likes Received:
    2,132
    Location:
    London
    Well I did go on, for example suggesting that R600 might be constructed with a number of independent arrays. That's what the 3:1 + 3:1 + 3:1 + 3:1 thing was about.

    If that were done, each group of 3 ALU arrays and 1 texture array (3 quads, 1 quad? - just like R580?) would have its own ROP. Pixels would be rasterised and despatched just like we see in R580. The ROPs would still be screen space tiled and the various buffers/caches for the ROPs would retain their current configuration (beefed-up for whatever D3D10 demands).

    Vertices and primitives would be despatched to arbitrary arrays, presumably based upon workload. There wouldn't be any meaning attached to the array used to process vertices and primitives (I can't think of a parallel to "screen space tiling" that might apply to them).

    ---

    I have my fingers crossed that R600 is a 32-ply beastie: 8 shader units, each consisting of 3:1 arrays, each with a ROP. So 32-1-3-2. If you include vertex fetch, then you might say it's 32-2-3-2. Hopefully it will have RV530-style fast ROPs.

    Prolly time to start a thread about render target formats under D3D10. I'm not sure what improvements are coming...

    ---

    If R600 is a D3D10 version of Xenos, with just a big, shared, block of ROPs (and no screen space tiling) then erm I dunno, no good ideas there...

    Jawed
     
  2. PeterAce

    Regular

    Joined:
    Sep 15, 2003
    Messages:
    490
    Likes Received:
    10
    Location:
    UK, Bedfordshire
    So you're thinking of a batch size of 48.

    My gut feeling (the current favorite B3D saying) is that it's 64, like Xenos.

    Would this mean a config more like 6 (16-way) arrays, with a batch taking 4 cycles to make 64? But it could also still be something like 12 (8-way) arrays.

    I'm not sure of the trade-offs between a few 'large arrays' or more 'smaller arrays'.

    Anyone care to take a stab?
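
A rough sketch of the arithmetic behind these batch sizes, assuming a batch is simply the array width multiplied by the number of cycles an instruction is issued over. The function name is hypothetical, and applying 4 cycles to a 3-quad array to arrive at 48 is an assumption for illustration, not a confirmed R600 detail.

```python
def batch_size(array_width, cycles):
    """Pixels per batch if one instruction is issued to an array for `cycles` clocks."""
    return array_width * cycles

# 3 quads of 4 pixels each, issued over an assumed 4 cycles
print(batch_size(3 * 4, 4))   # 48

# 16-way array over 4 cycles, as suggested in the post
print(batch_size(16, 4))      # 64

# An 8-way array would need 8 cycles to reach the same batch size
print(batch_size(8, 8))       # 64

# Both layouts provide the same total number of lanes
print(6 * 16, 12 * 8)         # 96 96
```

Under these assumptions the two layouts offer identical raw throughput, so the choice comes down to how many arrays have to be scheduled independently.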
     
  3. Jawed

    Legend

    Joined:
    Oct 2, 2004
    Messages:
    11,708
    Likes Received:
    2,132
    Location:
    London
    I like the multiple shader units configuration because not only does it directly support screen space tiling, it also makes for a nice even distribution of ring-bus clients.

    Jawed
     
  4. Dave Baumann

    Dave Baumann Gamerscore Wh...
    Moderator Legend

    Joined:
    Jan 29, 2002
    Messages:
    14,090
    Likes Received:
    694
    Location:
    O Canada!
    Larger batch/thread sizes vs more arbiters and sequencers (i.e. more control silicon).
     
  5. Mintmaster

    Veteran

    Joined:
    Mar 31, 2002
    Messages:
    3,897
    Likes Received:
    87
    That would be freaking enormous. I was crossing my fingers for 16 ROPs, 96 shader pipes and 32 texture units, and would be happy with even 24 texture units. If it really is 64 shader pipes and 16 texture units, that's a bit of a disappointment in comparison to Xenos.

    (BTW, I continue to use the term "shader pipes" simply for comparison to this generation's structure, i.e. NVidia has dual issue and ATI has their mini ALU. I haven't seen any evidence where NVidia's pipes perform closer to two ATI pipes than to one, especially when looking at R520 vs. G70 benchmarks and synthetic tests.)
     
  6. Geo

    Geo Mostly Harmless
    Legend

    Joined:
    Apr 22, 2002
    Messages:
    9,116
    Likes Received:
    215
    Location:
    Uffda-land
    Interesting. That's where I am. Nice to see someone else there.

    Think 80/24 makes any sense? Still > 3/1, but save a little silicon at 80nm.
     
  7. Ailuros

    Ailuros Epsilon plus three
    Legend Subscriber

    Joined:
    Feb 7, 2002
    Messages:
    9,511
    Likes Received:
    224
    Location:
    Chania
    Why wouldn't 16 TMUs at anything >=750MHz be enough? What's the real theoretical goal with next generation GPGPUs: >=20 GTexels/s of fillrate or >=400 GFLOPs of pure arithmetic horsepower?

    Besides, frankly there's nothing yet that can tell me whether 64 (replace the number with anything else you want) ALUs are not sufficient. Without knowing the characteristics of those units and some essential architectural tidbits, juggling numbers to determine what is and isn't sufficient is a tad off base IMHO.

    If those 64 ALUs were dual-issue (and yes, that's just a hypothetical example grabbed out of thin air), you'd reach 768 GFLOPs at a 750MHz frequency.
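
For reference, the 768 GFLOPs figure can be reproduced if each ALU is assumed to be a vec4 unit doing one MADD (2 flops) per lane per issue slot. That assumption is not stated in the post; it is just one set of numbers that arrives at the same total. The function names below are hypothetical.

```python
import math

def peak_gflops(alus, clock_mhz, lanes=4, flops_per_lane=2, issue_width=1):
    """Theoretical peak; lanes=4 and flops_per_lane=2 (vec4 MADD) are assumptions."""
    return alus * lanes * flops_per_lane * issue_width * clock_mhz / 1000.0

def texel_rate_gtexels(tmus, clock_mhz):
    """Peak bilinear fill rate, assuming one texel per TMU per clock."""
    return tmus * clock_mhz / 1000.0

print(peak_gflops(64, 750, issue_width=2))   # 768.0
print(peak_gflops(64, 750, issue_width=1))   # 384.0
print(texel_rate_gtexels(16, 750))           # 12.0

# TMUs needed at 750MHz to reach a 20 GTexels/s target
print(math.ceil(20 * 1000 / 750))            # 27
```

On these numbers, 16 TMUs at 750MHz would fall short of 20 GTexels/s, while single-issue ALUs would land just under the 400 GFLOPs mark.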
     
  8. Geo

    Geo Mostly Harmless
    Legend

    Joined:
    Apr 22, 2002
    Messages:
    9,116
    Likes Received:
    215
    Location:
    Uffda-land
    Are they going to make 750MHz at 80nm? I'm not so sure.

    I'm going mainly on the fact that Sireric seemed to think more TMUs would be lovely so long as there was more BW to feed them (seems there will be with GDDR4), and the ratio didn't decrease.

    Edit: I'm beginning to wonder if there is a general clocks issue building in the GPU world. The data points are building up that way, and not just ATI but NV as well. But then I'm widely known to worry too much. :)
     
  9. Mintmaster

    Veteran

    Joined:
    Mar 31, 2002
    Messages:
    3,897
    Likes Received:
    87
    Hehe, well I want to do some stuff with spherical harmonics and 3D textures. :wink:

    I figure l=2 is a minimum, and trilinear filtering of 3D textures brings me up to nearly 40 texture unit cycles per pixel. Lots of overdraw too.

    On the math side I think it's mostly for non-graphics applications that I'd like that much power.
     
  10. Mintmaster

    Veteran

    Joined:
    Mar 31, 2002
    Messages:
    3,897
    Likes Received:
    87
    Makes some sense, and I thought about it, but given the rumours I doubt it. I hope they aren't right, because staying with 16 texture units is dangerous for ATI, IMO. Especially in the value sector, where reduced graphics settings could be less math heavy.
     
  11. Ailuros

    Ailuros Epsilon plus three
    Legend Subscriber

    Joined:
    Feb 7, 2002
    Messages:
    9,511
    Likes Received:
    224
    Location:
    Chania
    How about some deferred texturing with single cycle trilinear TMUs then? :D (j/k)

    I'm not sooo sure we'll see even 32 TMUs in any of the upcoming D3D10 GPUs, but as I said it's merely a gut feeling and not based on any reliable background info.
     
  12. rwolf

    rwolf Rock Star
    Regular

    Joined:
    Oct 25, 2002
    Messages:
    968
    Likes Received:
    54
    Location:
    Canada
    Keep in mind that ALUs will be more important as there will be more math based on the addition of geometry shaders, physics, and predicated rendering.

    Additional transistors will be used for virtual memory and management of graphics objects.
     
    #112 rwolf, May 30, 2006
    Last edited by a moderator: May 30, 2006
  13. Tim Murray

    Tim Murray the Windom Earle of mobile SOCs
    Veteran

    Joined:
    May 25, 2003
    Messages:
    3,278
    Likes Received:
    66
    Location:
    Mountain View, CA
    R600 texture units != Xenos texture units.
     
  14. Mintmaster

    Veteran

    Joined:
    Mar 31, 2002
    Messages:
    3,897
    Likes Received:
    87
    Heh, it already is deferred. The overdraw is from alpha blending.

    Basically I want to start seeing neighbourhood transfer of radiance. Even if it's pretty local, I think that's the missing link in realtime lighting that we need for realism. There are a lot of approximations in the method I'm thinking of, but I think it could work.

    Well I was comparing to R5xx texture units, but even if changed from them, how different could they be? Maybe FP16/I32 filtering, more intelligent aniso walking, and better efficiency. Doubt 16 units is enough to effectively counter even a simple G71 x 1.5 w/ DX10.
     
  15. Kombatant

    Regular

    Joined:
    May 29, 2003
    Messages:
    639
    Likes Received:
    19
    Location:
    Milton Keynes, UK
    I don't see a 3:1 ratio myself.
     
  16. Jawed

    Legend

    Joined:
    Oct 2, 2004
    Messages:
    11,708
    Likes Received:
    2,132
    Location:
    London
    Erm, you're practically agreeing with me, 32 shader units equals:
    • 32 bilinear TMUs
    • 32 vertex fetch units (suppose we might as well start calling em VFUs)
    • 96 ALUs
    • 32 ROPs (hopefully RV530 fast)
    So the only material difference between you and me is ROP count.

    If you have an 8x32-bit GDDR3 interface, then perhaps it makes sense to use 8 quad ROPs, one per ring stop and one ring stop per 32-bit channel.

    R580 has two 32-bit channels per ring stop, hence only 4 ring stops and 4 quad ROPs.

    I'm not going to push too hard for 8 ring stops.
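
A small sketch of the counts in this post, assuming the 3:1 ALU-to-texture ratio and the ring-stop layouts described above. The helper names are hypothetical and nothing here reflects a confirmed R600 configuration.

```python
def shader_unit_totals(units, alus_per_unit=3, tmus_per_unit=1):
    """Totals for `units` shader units at a 3:1 ALU:TMU ratio (assumed)."""
    return {"alus": units * alus_per_unit, "tmus": units * tmus_per_unit}

def ring_stops(channels, channels_per_stop):
    """Number of ring stops given how many memory channels share each stop."""
    return channels // channels_per_stop

print(shader_unit_totals(32))   # {'alus': 96, 'tmus': 32}

channels = 8                    # 8 x 32-bit channels
print(channels * 32)            # 256 (total bus width in bits)
print(ring_stops(channels, 1))  # 8, one channel per stop as speculated
print(ring_stops(channels, 2))  # 4, two channels per stop as on R580
```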

    Jawed
     
  17. Urian

    Regular

    Joined:
    Aug 23, 2003
    Messages:
    622
    Likes Received:
    55
    I believe that they are going with 32 TMUs, 32 ROPs and 32 Z units at the beginning, with 2 ALUs per pipe instead of 3. If they are going to launch it at the end of this year, and Vista with DX10 isn't going to be ready until 2007, I don't see the possibility of a DirectX 10 GPU. I see more of a DX9 GPU with 10 Vertex Shader 3.0 units (more enhanced than the R5x0 architecture) and 64 pixel shaders clocked at 700MHz, with all the ALUs being MADD capable.

    Of course this is just speculation.
     
  18. Dave Baumann

    Dave Baumann Gamerscore Wh...
    Moderator Legend

    Joined:
    Jan 29, 2002
    Messages:
    14,090
    Likes Received:
    694
    Location:
    O Canada!
    Given Eric's comments in the R580 interview, if core speeds scale at a similar ratio as memory, why would we expect any more of the elements that consume bandwidth? Take a read of the interview again, carefully, and bear in mind what items are likely to be on his mind when he's replying (given that R580 is a historic item to him at that point).

    Also, note that just because GDDR4 is coming doesn't mean this will translate into an immediate and massive leap in bandwidth. GDDR3 started at the high point of GDDR2's end, i.e. GDDR2's high point was 500MHz, but 500MHz GDDR3 was far more prevalent; it wouldn't surprise me to see GDDR4 coming in at the 900-1000MHz range initially.
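
To put those clocks in perspective, here is a minimal bandwidth sketch. It assumes a 256-bit bus (matching the 8x32-bit interface speculated earlier in the thread, not a known R600 spec) and two transfers per quoted memory clock; the function name is hypothetical.

```python
def bandwidth_gb_s(bus_bits, mem_clock_mhz, transfers_per_clock=2):
    """Peak memory bandwidth in GB/s for a double-data-rate interface."""
    return bus_bits / 8 * mem_clock_mhz * transfers_per_clock / 1000.0

for clock_mhz in (500, 900, 1000):
    print(clock_mhz, bandwidth_gb_s(256, clock_mhz))
# 500 32.0
# 900 57.6
# 1000 64.0
```

On these assumptions, GDDR4 at 900-1000MHz would give roughly 58-64 GB/s on a 256-bit bus.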
     
  19. chavvdarrr

    Veteran

    Joined:
    Feb 25, 2003
    Messages:
    1,165
    Likes Received:
    34
    Location:
    Sofia, BG
    So, your guess is 16 ROPs/TMUs with 64 ALUs?
    Not exactly what most of us expected.
     
  20. _xxx_

    Banned

    Joined:
    Aug 3, 2004
    Messages:
    5,008
    Likes Received:
    86
    Location:
    Stuttgart, Germany
    Second that.
     