Huddy says "R600"

Discussion in 'Pre-release GPU Speculation' started by Geo, May 25, 2006.

Thread Status:
Not open for further replies.
  1. silent_guy

    Veteran Subscriber

    Joined:
    Mar 7, 2006
    Messages:
    3,754
    Likes Received:
    1,382
    The burst length is increased from a minimum of 4 (as in GDDR3) to 8. This way, they can partition the internal organization into more parallel blocks and run the internal clock at half the speed required for BL4. If the internal core speed was previously the critical path in GDDR3, you have suddenly doubled the available cycle time. The critical path probably now becomes the IO pins.
    Since your minimum transaction packet doubles in size when going from BL4 to BL8, you'll have to work hard to keep the RAMs busy at high efficiency. That requires (more) complex scheduling and larger caches.
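The trade-off silent_guy describes can be put in numbers. Below is a minimal Python sketch. It assumes the core's prefetch depth equals the minimum burst length and uses a hypothetical 2000 MT/s per-pin data rate purely for illustration; the function names are made up for this example. At a fixed pin rate, moving from BL4 to BL8 halves the core clock, which doubles the time the core has per access.

```python
# Sketch: DRAM core clock vs. pin data rate for a given prefetch depth.
# Assumption: the core fetches `prefetch` bits per data pin per core cycle,
# and prefetch equals the minimum burst length (4 for GDDR3, 8 for GDDR4).


def core_clock_mhz(pin_rate_mts: float, prefetch: int) -> float:
    """Core (array) clock needed to sustain the given per-pin data rate."""
    return pin_rate_mts / prefetch


def core_cycle_ns(pin_rate_mts: float, prefetch: int) -> float:
    """Core cycle time in nanoseconds."""
    return 1000.0 / core_clock_mhz(pin_rate_mts, prefetch)


if __name__ == "__main__":
    PIN_RATE_MTS = 2000.0  # hypothetical per-pin data rate, illustration only
    for label, prefetch in (("BL4 (GDDR3)", 4), ("BL8 (GDDR4)", 8)):
        print(f"{label}: core clock {core_clock_mhz(PIN_RATE_MTS, prefetch):.0f} MHz, "
              f"cycle time {core_cycle_ns(PIN_RATE_MTS, prefetch):.1f} ns")
```

With the assumed 2000 MT/s, this prints a 500 MHz core (2.0 ns cycle) for BL4 and a 250 MHz core (4.0 ns cycle) for BL8.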
     
    #641 silent_guy, Oct 8, 2006
    Last edited by a moderator: Oct 8, 2006
  2. nAo

    nAo Nutella Nutellae
    Veteran

    Joined:
    Feb 6, 2002
    Messages:
    4,400
    Likes Received:
    440
    Location:
    San Francisco
    Thanks, now it makes sense!
     
  3. silent_guy

    Veteran Subscriber

    Joined:
    Mar 7, 2006
    Messages:
    3,754
    Likes Received:
    1,382
    BTW, the R580+ article that Uttar pointed to doesn't make a lot of sense:

    If you really bend your head around it, you could maybe discover that it's trying to say that the burst length increased from 4 to 8, but it's far more likely that the writer didn't understand what he was talking about. How do you 'double the number of bits on an IO pin'? Only by going from double to quad data rate, which is definitely not the case for GDDR4.

    The explanation of the latency increase is also very questionable. There's no reason why GDDR4 should have more random access patterns, and I don't see how changes in addressing (assuming there are any at all) impact latency.
    It's the BL4 -> BL8 that's doing all the evil.

    Edit: small correction: addressing latency will go up by 1 cycle, but, again, that impact is minor compared to the BL increase.
     
    #643 silent_guy, Oct 8, 2006
    Last edited by a moderator: Oct 8, 2006
  4. nAo

    nAo Nutella Nutellae
    Veteran

    Joined:
    Feb 6, 2002
    Messages:
    4,400
    Likes Received:
    440
    Location:
    San Francisco
    Basically we can get from RGBA8 to FP64 for free!! j/k ;)
     
  5. silent_guy

    Veteran Subscriber

    Joined:
    Mar 7, 2006
    Messages:
    3,754
    Likes Received:
    1,382
    When you think about it, it must be quite scary (for a chip architect): your minimal memory transaction is 32 pins * 2 (DDR) * 8 (BL) = 512 bits or 64 bytes. If you organize your memory controller as 64-bit controllers, your minimal transaction is 128 bytes. (I think that ATI doesn't do the latter.)

    Say you need to change only 1 byte in such a 128-byte block: that gives you a BW efficiency of less than 1%! So you'd better find ways to group transactions as much as possible... which increases cache size and latency... which increases internal buffering requirements even more.

    It gets worse when you need to switch from read to write and back, or from one row to another: there are a lot of different timing constraints that must be met, and they all reduce performance.

    I'm happy I don't have to deal with that in my job. :grin:
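To make the efficiency argument concrete, here is a hypothetical Python sketch of write coalescing: small writes that fall into the same aligned block are merged into one transaction instead of each paying for a full block. It uses the 128-byte minimum transaction from the post (a later reply points out the real figure is half that) and deliberately ignores byte masks, read/write turnaround and row switches, so it only models the bus-efficiency part of the problem.

```python
# Sketch: coalescing small writes into fixed-size memory transactions.
# Assumptions: every transaction moves exactly BLOCK bytes; byte masks,
# read/write turnaround and row switches are ignored.
from collections import defaultdict

BLOCK = 128  # bytes per minimum transaction, the figure used in the post


def blocks_touched(addr: int, nbytes: int) -> set:
    """Aligned block indices covered by a write of nbytes starting at addr."""
    return set(range(addr // BLOCK, (addr + nbytes - 1) // BLOCK + 1))


def uncoalesced_efficiency(writes) -> float:
    """Every (address, nbytes) write is sent as its own transaction(s)."""
    useful = sum(n for _, n in writes)
    moved = sum(len(blocks_touched(a, n)) for a, n in writes) * BLOCK
    return useful / moved


def coalesced_efficiency(writes) -> float:
    """Writes that land in the same aligned block share one transaction."""
    dirty = defaultdict(set)  # block index -> byte addresses written in it
    for addr, nbytes in writes:
        for byte in range(addr, addr + nbytes):
            dirty[byte // BLOCK].add(byte)
    useful = sum(len(b) for b in dirty.values())
    moved = len(dirty) * BLOCK
    return useful / moved


if __name__ == "__main__":
    print(uncoalesced_efficiency([(0x1000, 1)]))       # one lone 1-byte write
    packed = [(0x1000 + 4 * i, 4) for i in range(32)]  # 32 adjacent 4-byte writes
    print(uncoalesced_efficiency(packed), coalesced_efficiency(packed))
```

A lone 1-byte write comes out at 1/128 (under 1%); 32 adjacent 4-byte writes reach about 3% when sent one by one and 100% once merged into a single block. The price, as the post says, is holding writes back long enough to find partners, which means more buffering and more latency.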
     
    LeStoffer likes this.
  6. nAo

    nAo Nutella Nutellae
    Veteran

    Joined:
    Feb 6, 2002
    Messages:
    4,400
    Likes Received:
    440
    Location:
    San Francisco
    Well... it gives them a lot of good reasons to give us more-samples-per-pixel-anti-aliasing(tm)
     
  7. Xmas

    Xmas Porous
    Veteran Subscriber

    Joined:
    Feb 6, 2002
    Messages:
    3,344
    Likes Received:
    176
    Location:
    On the path to wisdom
    Burst length is not given in clocks but bits AFAIK, so it's only half that.
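Xmas's correction is easy to check. This Python sketch (the helper is hypothetical, not taken from the datasheet) computes the bytes moved by one minimum burst, first reading BL as DDR clocks, as the earlier post did, and then as data beats, which is how BL is actually counted.

```python
def min_burst_bytes(bus_bits: int, burst_length: int, bl_in_clocks: bool = False) -> int:
    """Bytes moved by one minimum burst on a bus `bus_bits` wide.

    burst_length is the number of data beats (transfers). If it is
    (mis)read as DDR clocks, every clock carries two beats.
    """
    beats = burst_length * 2 if bl_in_clocks else burst_length
    return bus_bits * beats // 8


if __name__ == "__main__":
    for bus_bits in (32, 64):  # one 32-bit device, and a 64-bit controller
        wrong = min_burst_bytes(bus_bits, 8, bl_in_clocks=True)
        right = min_burst_bytes(bus_bits, 8)
        print(f"{bus_bits}-bit, BL8: {wrong} B if BL counted clocks, {right} B as beats; "
              f"1-byte update efficiency {1 / right:.1%}")
```

With BL counted in beats, a 32-bit device moves 32 bytes per BL8 burst and a 64-bit controller 64 bytes, so a 1-byte update is about 1.6% efficient on the latter rather than under 1%.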
     
  8. silent_guy

    Veteran Subscriber

    Joined:
    Mar 7, 2006
    Messages:
    3,754
    Likes Received:
    1,382
    Yes, you're right. I should have looked that up first... BTW, the datasheet can be found here.
     
  9. jpr27

    Regular

    Joined:
    Mar 7, 2004
    Messages:
    287
    Likes Received:
    0
    So in the end, at this point in time, with GDDR3 and GDDR4 in the coming generations, is it really worth the extra cost for GDDR4 now? I know that both card manufacturers keep things close to the chest, but hmmmm. I know ATI put a lot of effort into its memory controller in the R580 (if memory serves me correctly). I'm just wondering if ATI might have some new systems for dealing with memory BW and latencies, to make the most of the burst length increase to 8 that comes with GDDR4.

    Another thought, or question actually: does anyone see XDR memory coming into play at some point? I know right now it's not cost-effective, but just a thought. We all hear you can never have enough BW, and CPUs are often considered the bottleneck. Now that dual and quad cores are coming, maybe we will see XDR in the R600 or G80 or close relatives?
     
  10. aaronspink

    Veteran

    Joined:
    Jun 20, 2003
    Messages:
    2,641
    Likes Received:
    64
    GDDR3 is really GDDR2; they just tweaked some of the signalling. GDDR4 ~= DDR3. In both GDDR4 and DDR3 the memory devices operate in a prefetch-of-8 mode internally, while GDDR2/3 and DDR2 operate in a prefetch-of-4 mode. So a GDDR2/3 part that bins to, say, 450 MHz will run at a pin rate of 1800 MT/s, while a GDDR4 device at the same bin frequency would run at 3600 MT/s.

    Aaron spink
    speaking for myself inc.
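Aaron's numbers follow from pin rate = core (bin) frequency x prefetch depth. A minimal Python sketch, using the prefetch values he gives and assuming double-data-rate I/O (two transfers per I/O clock) for the I/O clock figure; the helper names are illustrative.

```python
# Prefetch depth per memory type, as stated in the post above.
PREFETCH = {"GDDR2/3": 4, "DDR2": 4, "GDDR4": 8, "DDR3": 8}


def pin_rate_mts(core_mhz: float, prefetch: int) -> float:
    """Per-pin data rate in MT/s for a core binned at core_mhz."""
    return core_mhz * prefetch


def io_clock_mhz(pin_rate: float) -> float:
    """I/O clock, assuming double-data-rate signalling (2 transfers/clock)."""
    return pin_rate / 2


if __name__ == "__main__":
    for kind in ("GDDR2/3", "GDDR4"):
        rate = pin_rate_mts(450, PREFETCH[kind])
        print(f"{kind} @ 450 MHz core: {rate:.0f} MT/s, I/O clock {io_clock_mhz(rate):.0f} MHz")
```

At a 450 MHz core this gives 1800 MT/s (900 MHz I/O clock) for GDDR2/3 and 3600 MT/s (1800 MHz I/O clock) for GDDR4, consistent with silent_guy's point that the critical path moves to the I/O pins.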
     
    #650 aaronspink, Oct 9, 2006
    Last edited by a moderator: Oct 10, 2006
  11. trumphsiao

    Regular

    Joined:
    Jan 31, 2006
    Messages:
    285
    Likes Received:
    11
    R600 has 8 MC channels (A/B/C/D/E/F/G/H), compared to R580, which has merely 4 (A/B/C/D).

    Maybe someone knows what I've adumbrated. I won't spell it out...
     
    #651 trumphsiao, Oct 9, 2006
    Last edited by a moderator: Oct 9, 2006
  12. trumphsiao

    Regular

    Joined:
    Jan 31, 2006
    Messages:
    285
    Likes Received:
    11
    Nope, R600 would be an ASIC with a memory controller capable of either 32-bit x 8 or 64-bit x 8 channels.
     
  13. Dave Baumann

    Dave Baumann Gamerscore Wh...
    Moderator Legend

    Joined:
    Jan 29, 2002
    Messages:
    14,090
    Likes Received:
    694
    Location:
    O Canada!
    R580 has 8 memory channels (4 primary ring stops with two channels per stop).
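Channel counts map to bus width by simple multiplication. A Python sketch assuming all channels are the same width: the R580 row uses Dave's 4 ring stops x 2 channels with 32-bit channels (8 x 32 matches R580's 256-bit bus), while the two R600 rows are only trumphsiao's two alternatives, not confirmed specifications.

```python
def bus_width_bits(channels: int, channel_bits: int) -> int:
    """Total external memory bus width for equally sized channels."""
    return channels * channel_bits


# (channels, bits per channel); the R600 entries are speculation from the thread
CONFIGS = {
    "R580: 4 ring stops x 2 channels x 32-bit": (4 * 2, 32),
    "R600 option A: 8 x 32-bit (speculative)": (8, 32),
    "R600 option B: 8 x 64-bit (speculative)": (8, 64),
}

if __name__ == "__main__":
    for name, (channels, width) in CONFIGS.items():
        print(f"{name}: {bus_width_bits(channels, width)}-bit bus")
```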
     
  14. Geo

    Geo Mostly Harmless
    Legend

    Joined:
    Apr 22, 2002
    Messages:
    9,116
    Likes Received:
    215
    Location:
    Uffda-land
    512-bit & GDDR4? That's a snot-load of BW there... what are they planning to do with all that?

    I could imagine R600 being physically big enough for 512-bit, but I'd be worried about R680 at 65nm being big enough to support all the pins it would require. I suspect that's why NV went 384-bit: less because of the limitations of G80 than because of the limitations of the 65nm refresh.
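How much is a snot-load? Peak bandwidth is the bus width in bytes times the per-pin transfer rate. The Python sketch below assumes 2000 MT/s per pin for every width; that rate is a placeholder picked for round numbers, not a claim about what any of these chips ship with.

```python
def peak_bandwidth_gbs(bus_bits: int, pin_rate_mts: float) -> float:
    """Theoretical peak bandwidth in GB/s (10^9 bytes/s).

    bytes per transfer = bus_bits / 8; MT/s * bytes = MB/s; / 1000 = GB/s.
    """
    return bus_bits / 8 * pin_rate_mts / 1000


if __name__ == "__main__":
    ASSUMED_RATE_MTS = 2000.0  # placeholder per-pin rate, not a real product spec
    for bits in (256, 384, 512):
        print(f"{bits}-bit: {peak_bandwidth_gbs(bits, ASSUMED_RATE_MTS):.0f} GB/s")
```

At that assumed rate, 256-bit gives 64 GB/s, 384-bit 96 GB/s and 512-bit 128 GB/s.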
     
  15. Jawed

    Legend

    Joined:
    Oct 2, 2004
    Messages:
    11,716
    Likes Received:
    2,137
    Location:
    London
    GDDR4 requires fewer pins.

    Jawed
     
  16. Jawed

    Legend

    Joined:
    Oct 2, 2004
    Messages:
    11,716
    Likes Received:
    2,137
    Location:
    London
    It puzzled me why it wasn't 8 ring stops. I surmised that it boiled down to there being 4 PS pipelines (screen-space tiling).

    Makes me wonder whether R600 could go as far as 8 ring stops, and therefore 8 pipelines, 32-1-3-1... Each ring stop would have a 32-bit channel to two 512Mbit chips, for a total of 1GB.

    But ATI makes noises about going higher than 3:1 ALU:TEX, so it seems doubtful it would be 32-1-3-1. Unless that's being saved for the refresh: 32-1-4-1...

    Jawed
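Jawed's 1GB figure checks out. A small Python sketch of the arithmetic for his hypothetical layout (8 ring stops, one 32-bit channel per stop, two 512 Mbit chips per channel), treating chip densities as binary megabits; the function names are illustrative.

```python
def total_capacity_mib(ring_stops: int, chips_per_stop: int, chip_mbit: int) -> int:
    """Total frame-buffer size in MiB (chip density in Mbit, 8 bits per byte)."""
    return ring_stops * chips_per_stop * chip_mbit // 8


def total_bus_bits(ring_stops: int, channel_bits: int) -> int:
    """External bus width with one channel per ring stop."""
    return ring_stops * channel_bits


if __name__ == "__main__":
    # Jawed's hypothetical R600: 8 stops, one 32-bit channel each, two 512 Mbit chips per channel
    print(f"{total_capacity_mib(8, 2, 512)} MiB over a {total_bus_bits(8, 32)}-bit bus")
```

8 x 2 x 512 Mbit = 8192 Mbit = 1024 MiB, on a 256-bit external bus.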
     
  17. Ailuros

    Ailuros Epsilon plus three
    Legend Subscriber

    Joined:
    Feb 7, 2002
    Messages:
    9,511
    Likes Received:
    224
    Location:
    Chania
    Is the first number the number of "ROPs", hypothetically?
     
  18. Razor1

    Veteran

    Joined:
    Jul 24, 2004
    Messages:
    4,232
    Likes Received:
    749
    Location:
    NY, NY
    Hmm, if ATi goes 512-bit, are they expecting that there will be no other bottlenecks to achieving peak performance on the R600? I would think the R600 will need three times the shader and fillrate performance of the R580 before that happens, if it really is 512-bit. That's a lot of complexity and cost to add for something that might be bottlenecked in other areas.
     
  19. Geo

    Geo Mostly Harmless
    Legend

    Joined:
    Apr 22, 2002
    Messages:
    9,116
    Likes Received:
    215
    Location:
    Uffda-land
    Well, that's the thing, isn't it? You need to figure out what they have in mind to do with it. Though it's also possible, much like the MC itself in R520, that to *some degree* it is meant to be forward-looking. Even so, I'd expect they have something figured out (either in existing features or in new features) that gets some significant degree of benefit from it above what GDDR4 + 256-bit brings.

    If 512-bit is true at all, that is. I'm not discounting it nor accepting it just yet. I would feel better about it if I understood where it would bring that benefit.
     
  20. Farhan

    Newcomer

    Joined:
    May 19, 2005
    Messages:
    152
    Likes Received:
    13
    Location:
    in the shade
    Wouldn't 8 stops just mean a higher latency because the max number of hops is 4 instead of 2 in the current one? I guess if the chip is a lot bigger then they could make it 8 stops.
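Farhan's hop counts hold if traffic can travel either way around the ring, so a request never goes more than halfway round. A Python sketch under that assumption (uniform cost per hop, no contention; not a model of ATI's actual ring implementation):

```python
def ring_hops(src: int, dst: int, stops: int) -> int:
    """Hops between two stops when traffic may go either way round the ring."""
    d = abs(src - dst) % stops
    return min(d, stops - d)


def max_hops(stops: int) -> int:
    """Worst-case distance from any stop (the ring is symmetric, so use stop 0)."""
    return max(ring_hops(0, j, stops) for j in range(stops))


def mean_hops(stops: int) -> float:
    """Average hops from one stop to each of the other stops."""
    return sum(ring_hops(0, j, stops) for j in range(1, stops)) / (stops - 1)


if __name__ == "__main__":
    for n in (4, 8):
        print(f"{n} stops: worst case {max_hops(n)} hops, mean {mean_hops(n):.2f}")
```

4 stops gives a worst case of 2 hops (mean about 1.33) and 8 stops a worst case of 4 (mean about 2.29), so the average distance grows too, not just the worst case.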
     

  • About Us

    Beyond3D has been around for over a decade and prides itself on being the best place on the web for in-depth, technically-driven discussion and analysis of 3D graphics hardware. If you love pixels and transistors, you've come to the right place!

    Beyond3D is proudly published by GPU Tools Ltd.