RXXX Series Roadmap from AnandTech

Discussion in 'Architecture and Products' started by lopri, Aug 16, 2005.

  1. Jawed

    Legend

    Joined:
    Oct 2, 2004
    Messages:
    11,716
    Likes Received:
    2,137
    Location:
    London
    Damn good thinking XMas, so that means:

    ROPS - ? - fragment pipelines per ROP - TMU pipelines per ROP
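
A minimal sketch of that reading, applied to the four-number tuples quoted in this thread (16-1-1-1, 16-1-3-1 and 4-1-3-2). The field names and the `parse_config` / `totals` helpers are hypothetical, and the second field is left uninterpreted because the post marks it with "?". This only illustrates the guess; it is not confirmed hardware data.

```python
from typing import NamedTuple


class Config(NamedTuple):
    """One roadmap tuple under the 'ROPs - ? - fragment/ROP - TMU/ROP' guess."""
    rops: int
    unknown: int        # second field: meaning not established in the thread
    frag_per_rop: int   # fragment pipelines per ROP (assumed reading)
    tmu_per_rop: int    # TMU pipelines per ROP (assumed reading)


def parse_config(text: str) -> Config:
    """Parse a string such as '16-1-3-1' into a Config."""
    parts = [int(p) for p in text.split("-")]
    if len(parts) != 4:
        raise ValueError(f"expected four numbers, got {text!r}")
    return Config(*parts)


def totals(cfg: Config) -> dict:
    """Multiply the per-ROP figures out into whole-chip counts."""
    return {
        "rops": cfg.rops,
        "fragment_pipelines": cfg.rops * cfg.frag_per_rop,
        "tmus": cfg.rops * cfg.tmu_per_rop,
    }


if __name__ == "__main__":
    for text in ("16-1-1-1", "16-1-3-1", "4-1-3-2"):
        print(text, totals(parse_config(text)))
```

Under this reading 16-1-3-1 works out to 16 ROPs with 48 fragment pipelines, and 4-1-3-2 works out to 12 fragment pipelines, which lines up with the "48 ALUs" and "12 pipe" figures mentioned elsewhere in the thread.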

    Jawed
     
  2. Sunday

    Newcomer

    Joined:
    Feb 6, 2002
    Messages:
    194
    Likes Received:
    6
    Location:
    GMT+1
    I find it somewhat nonsensical to have a special CrossFire card in the R(V)5xx series! I mean, why would you buy a non-CF card? Maybe right now you don’t want CF (‘cos of the lack of mobos, or the lack of extra $ for the CF-capable model), but some day you’ll wish to have a CF setup, and it would be very convenient to be able to use your existing card. A second DVI output shouldn’t be a problem with an adequate adapter… In my opinion each R(V)5xx card should be a CF card; that is the only way to popularize the CrossFire idea…
     
  3. phenix

    Regular

    Joined:
    Feb 22, 2003
    Messages:
    620
    Likes Received:
    1
    Location:
    Cambridge, MA
    If the 128-bit bus is dictated by the size of the mainstream chips, are we to assume that all future mainstream chips, even RV930, will have a 128-bit-wide bus? Is it possible to increase the amount of data transferred per pin in future memory technologies, e.g. GDDR4, GDDR5 etc., to increase the effective width of the bus? BTW where the heck is QDR?
     
  4. kemosabe

    Veteran

    Joined:
    Jun 19, 2003
    Messages:
    1,001
    Likes Received:
    16
    Location:
    Montreal, Canada
    So R580 = 16 ROPS and 48 ALUs?
     
  5. kemosabe

    Veteran

    Joined:
    Jun 19, 2003
    Messages:
    1,001
    Likes Received:
    16
    Location:
    Montreal, Canada
    You forget that there are OEMs out there that buy the majority of these chips, and most of their systems don't ship with dual-GPU configurations.
     
  6. chavvdarrr

    Veteran

    Joined:
    Feb 25, 2003
    Messages:
    1,165
    Likes Received:
    34
    Location:
    Sofia, BG
    back to 128-bit bus
    R300 was 256-bit @0.13
    How big is the 6600 compared to R300? And RV530? I really doubt the "chip is too small" explanation.
     
  7. demalion

    Veteran

    Joined:
    Feb 7, 2002
    Messages:
    2,024
    Likes Received:
    1
    Location:
    CT
    Yas

    There was talk of a clock-speed-based increase in "pipelines" with the R420 launch, hints at "double pumped", comparisons to Intel's NetBurst, and agreements concerning design tools to achieve significantly higher-speed operation.

    Viewing the numbers listed, the base number seems to be the first, and the mysterious number seems to be the 3rd one. Focusing on that, and assuming accuracy in the numbers, I see two sets of numbers that seem especially important: The "12 pipe" and "4-1-3-2" Wavey is making some sort of hint about, and the 16-1-1-1 and 16-1-3-1 between the R520 and R580.

    One thing that makes sense is "ROPs" and then ALUs per ROP, but I don't think that fits...that seems too drastic a change in transistor count, it would seem to me, for the R520 to R580 change. That doesn't rule it out, but it doesn't seem to fit a sane refresh path.

    What does seem to fit is having a design intended to achieve that type of throughput without adding silicon, which fits with some of the indicators listed at the beginning (if they're not simply fiction). That is, by having ALU processing multi-"pumped" per clock.

    • This would maintain the "locked" pixel/ROP/ALU pipeline relationship (silicon-wise) that would seem to explain the "R3xx legacy"
    • This would correspond to some speculation I've had concerning how some of their mobile-technology solutions could be of benefit to desktop parts in terms of performance (there seems to be varying clock usage in mobile parts already, geared toward minimum power instead of maximum performance)
    • It might offer an alternative to performance scaling, depending on the profile for leakage, power, heat, etc., to execute units capable of this type of scaling on a given process

    I can't evaluate how feasible this is to be done right now, but this seems to be the type of thing that makes sense and is planned by both IHVs, with ATI already having made announcements last year that seem to directly relate to it, and there being evidence of it for nVidia for separating clocking by increasing degrees going forward.

    Using this guess does seem to indicate that the R520 would be an "underperformer" without high base clock speeds, but might indicate a similarity in R580 and R520 that might directly relate to the issues reported in relation to the "R520" delay, depending on how this might be implemented.

    ...

    There are problems with this guess, and some remaining mysteries. What does the "2" at the end mean, and why does only the RV350 have it? DDR2? Is it the latest generation of high-clocked DDR1 for everything else? Also, why the apparently huge jump from R520 to R580? This guess does perhaps explain how it might be achievable in a refresh, but not why such a large jump in performance would be attempted. Along with this, it is significant that there is no "2" in this column between "1" and "3"...both together seem to strongly indicate that this guess is wrong, unless there is some implementation detail to explain it.

    Also, there doesn't seem to be a listing for vertex processing in the numbers. The 2nd could be "TMUs per pipe", which would fit as well for the idea of R3xx lineage, but the last remains a mystery...why would the middle range have a larger number than any other?

    Finally, why would the R420 have 16 pipes and the next generation have the same count? The R580 would certainly(!) address this if the 3rd number relates to ALU throughput somehow, but the R520 would mainly seem a fairly "dissatisfactory" stepping stone in relation. I could guess that the R420 might have already implemented something like this (making it a jump from 8 double-pumped to 16), but this wouldn't seem to fit the ROP/pixel processing relationship guesses.

    ...

    Hmm, well, the numbers could just be wrong or incomplete, but this guess doesn't seem to hold together accurately with what is known. I hope it might touch on some relevant things, though.
     
  8. phenix

    Regular

    Joined:
    Feb 22, 2003
    Messages:
    620
    Likes Received:
    1
    Location:
    Cambridge, MA

    From B3D 3D tables:

    R300: 218 mm2 (107M at 150nm process)
    NV43(Geforce6600): 150mm2 (143M at 110nm process)
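
A quick sketch of the arithmetic on those two table entries, using only the figures quoted above. The helper names are made up for illustration.

```python
# Die area (mm^2) and transistor count (millions), as quoted from the B3D tables above.
CHIPS = {
    "R300": {"area_mm2": 218.0, "transistors_m": 107.0},
    "NV43": {"area_mm2": 150.0, "transistors_m": 143.0},
}


def density_m_per_mm2(name: str) -> float:
    """Millions of transistors per square millimetre for a listed chip."""
    chip = CHIPS[name]
    return chip["transistors_m"] / chip["area_mm2"]


def area_ratio(small: str, large: str) -> float:
    """Die area of `small` as a fraction of the die area of `large`."""
    return CHIPS[small]["area_mm2"] / CHIPS[large]["area_mm2"]


if __name__ == "__main__":
    for name in CHIPS:
        print(f"{name}: {density_m_per_mm2(name):.2f} M transistors/mm^2")
    print(f"NV43 area / R300 area = {area_ratio('NV43', 'R300'):.2f}")
```

On these numbers NV43 has roughly 69% of R300's die area while packing about twice the transistor density.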


    R300 is probably the smallest chip with a 256-bit bus. There is no die size info for the NV35, so I don't know if it was a bit smaller than R300.
     
    #48 phenix, Aug 16, 2005
    Last edited by a moderator: Aug 16, 2005
  9. Joe DeFuria

    Legend

    Joined:
    Feb 6, 2002
    Messages:
    5,994
    Likes Received:
    71
    Don't forget that ATI is also revamping the shader core to some extent: at a minimum a jump from SM 2.0 and FP24 to SM 3.0 and FP32. There may be pretty significant efficiency gains in general vs. the R3xx/R4xx core. In other words, given the same number of "pipelines"...clock for clock, R5xx may be significantly faster wrt shading than R3xx/R4xx.

    We'll just have to wait and see.
     
  10. Jawed

    Legend

    Joined:
    Oct 2, 2004
    Messages:
    11,716
    Likes Received:
    2,137
    Location:
    London
    Seems unlikely to me, but we're still recovering from the 12-pipeline RV530, so, erm...

    Jawed
     
  11. Rys

    Rys Graphics @ AMD
    Moderator Veteran Alpha

    Joined:
    Oct 9, 2003
    Messages:
    4,182
    Likes Received:
    1,579
    Location:
    Beyond3D HQ
    Should be me I think, and gladly. A slap with the thick end of a 5800 Ultra should do it :lol:
     
  12. LeStoffer

    Veteran

    Joined:
    Feb 6, 2002
    Messages:
    1,262
    Likes Received:
    22
    Location:
    Land of the 25% VAT
    It might well be wrong to assume so. For starters it takes almost twice the silicon budget to move up from FP24 to FP32, and then they still have to use silicon for non-trivial SM 3.0 features like dynamic branching. Which, I might add, ATI in the past promised to be more useful than nVidia's first attempts at it. On top of this I'm pretty certain the R4xx has very high efficiency after the tweaks to the already awesome R300.
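
One rough way to see where an "almost twice" figure could come from is sketched below. It rests on two assumptions that are not stated in this thread: that FP24 uses a 16-bit stored mantissa (1 sign, 7 exponent, 16 mantissa bits), and that multiplier area grows with the square of the significand width. That is only a first-order model; adders and register storage scale closer to linearly, so treat the result as a ballpark for the multiplier alone.

```python
# Stored mantissa widths. FP32 is the IEEE 754 single-precision layout;
# the FP24 layout is an ASSUMPTION (1 sign / 7 exponent / 16 mantissa bits).
FP24_MANTISSA_BITS = 16
FP32_MANTISSA_BITS = 23


def significand_bits(mantissa_bits: int) -> int:
    """Stored mantissa plus the implicit leading one."""
    return mantissa_bits + 1


def multiplier_area_ratio(small_mantissa: int, large_mantissa: int) -> float:
    """Area ratio under the assumed quadratic (array-multiplier) scaling."""
    small = significand_bits(small_mantissa)
    large = significand_bits(large_mantissa)
    return (large / small) ** 2


if __name__ == "__main__":
    ratio = multiplier_area_ratio(FP24_MANTISSA_BITS, FP32_MANTISSA_BITS)
    print(f"FP32 / FP24 multiplier area (quadratic model): {ratio:.2f}x")
```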
     
  13. Arun

    Arun Unknown.
    Legend

    Joined:
    Aug 28, 2002
    Messages:
    5,023
    Likes Received:
    302
    Location:
    UK
    Here's a slightly modified table including some other numbers:
    Now, obviously, some parts could be more bandwidth limited than others. But you don't triple the number of pipelines yet only increase relative bandwidth by 29%.
    One could argue that with 29% more relative bandwidth, you could double ROP or texturing performance, but the benefits would be relatively weak in each case. You could argue that the RV515 has too much bandwidth, but that seems unlikely for a low-end part.

    Another important point is that apparently, R520 was designed in a timeframe where ATI had the ALU superiority, so they wouldn't have focused as much on it; with the R580 however, they might have realized they weren't going to have that advantage with the R520, and decided they had to fix it. This would also have affected the RV530.

    That would imply that: the first number is four times the number of "pipelines" (4xQuads); the second number is the number of dedicated texture addressing units per "pipeline"; the third number is the Vec4 ALU throughput per "pipeline"; the last number is the texture filtering throughput per "pipeline".

    Why am I differentiating between the addressing and the filtering? Simple question here: how many texture filtering operations are run in a single cycle nowadays, with trilinear and AF, if the IHV doesn't "bypass" the cost? Well, simply put, not that many. If correct, this would give an interesting market position for the RV530: great image quality for the low/mid-end. Of course, it could also be the ROP throughput, but that seems to be a bit too bandwidth limited to me...

    Uttar
     
  14. Jawed

    Legend

    Joined:
    Oct 2, 2004
    Messages:
    11,716
    Likes Received:
    2,137
    Location:
    London
    In Xenos the texture address calculation ALU is in the TMU pipeline - so there's a one-to-one correspondence between filtering TMUs and texture address calculation ALUs.

    I don't see how you're calculating per-clock relative bandwidth. All your numbers there seem completely screwy to me.

    Jawed
     
  15. Arun

    Arun Unknown.
    Legend

    Joined:
    Aug 28, 2002
    Messages:
    5,023
    Likes Received:
    302
    Location:
    UK
    This isn't Xenos. The GF6/GF7 architectures handle addressing very differently too, and their texture caches can store filtered texels. I'm not saying my speculation is correct, but disregarding it for such reasons is a bit ridiculous.

    Memory Frequency*(Bus Width/256)/Core Frequency.
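
As a sketch, the formula above translates directly into the function below. The clock and bus figures in the example are invented purely for illustration and do not describe any real part; the function and variable names are also hypothetical.

```python
def relative_bandwidth(mem_mhz: float, bus_bits: int, core_mhz: float) -> float:
    """Memory Frequency * (Bus Width / 256) / Core Frequency.

    The result is bandwidth per core clock, normalised so that a 256-bit bus
    with memory and core at the same clock gives 1.0.
    """
    return mem_mhz * (bus_bits / 256) / core_mhz


def percent_increase(old: float, new: float) -> float:
    """Relative change from `old` to `new`, in percent."""
    return (new / old - 1.0) * 100.0


if __name__ == "__main__":
    # HYPOTHETICAL parts, made up for this example only.
    part_a = relative_bandwidth(mem_mhz=500, bus_bits=128, core_mhz=500)
    part_b = relative_bandwidth(mem_mhz=600, bus_bits=256, core_mhz=600)
    print(f"part A: {part_a:.2f}, part B: {part_b:.2f}")
    print(f"increase: {percent_increase(part_a, part_b):.0f}%")
```

Note that the metric ignores pipeline count entirely: it only says how much memory bandwidth is available per core clock.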


    Uttar
     
    #55 Arun, Aug 16, 2005
    Last edited by a moderator: Aug 16, 2005
  16. Joe DeFuria

    Legend

    Joined:
    Feb 6, 2002
    Messages:
    5,994
    Likes Received:
    71
    Which is why I'm not assuming anything. ;)

    I'm just re-emphasizing (like you have) that there is quite a bit of difference between R5xx and R3/4xx in terms of shader capability and precision...there's going to be lots of new transistors in there to cover that. So "only" being marginally faster than the current high-end R4xx would not be all that surprising considering the new capabilities. There is a chance, though, that since the shader design had to be revamped more than "trivially", that additional efficiency gains could have been incorporated as well.
     
  17. Dave Baumann

    Dave Baumann Gamerscore Wh...
    Moderator Legend

    Joined:
    Jan 29, 2002
    Messages:
    14,090
    Likes Received:
    694
    Location:
    O Canada!
    WRT the texture address processing - their past does one thing and their future does more or less the same; it's likely their present would do the same as well.

    Uttar, you are reading too much into the "designed at NV30 time", it could mean a multitude of things, for instance it could mean they have paid particular attention to FP32 register performance....
     
  18. Jawed

    Legend

    Joined:
    Oct 2, 2004
    Messages:
    11,716
    Likes Received:
    2,137
    Location:
    London
    Hmm, well you're not taking account of fragment shader pipeline count, which is, frankly, pointless.

    If you take Bandwidth/Single-texture rate (or Bandwidth/fragment rate) as the basis of your argument, I think it'll be more convincing.
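
A minimal sketch of that alternative metric, assuming that single-texture rate is core clock times pipeline count and that the memory is double data rate (two transfers per clock). Both assumptions, the helper name and the example figures are illustrative only; the example values are invented and describe no real part.

```python
def bytes_per_fragment(
    mem_mhz: float,
    bus_bits: int,
    core_mhz: float,
    pipelines: int,
    transfers_per_clock: int = 2,  # ASSUMPTION: DDR-style memory
) -> float:
    """Memory bandwidth divided by single-texture (fragment) rate.

    Bandwidth is in MB/s and fragment rate in Mfragments/s, so the
    quotient is bytes of bandwidth available per fragment.
    """
    bandwidth_mb_s = mem_mhz * transfers_per_clock * bus_bits / 8
    fragment_rate_m_s = core_mhz * pipelines
    return bandwidth_mb_s / fragment_rate_m_s


if __name__ == "__main__":
    # HYPOTHETICAL configurations sharing the same clocks and bus width.
    for pipes in (4, 12):
        value = bytes_per_fragment(500, 128, 500, pipes)
        print(f"{pipes} pipelines: {value:.2f} bytes per fragment")
```

Unlike a per-clock bandwidth figure, this one drops as the pipeline count rises, which is the point being argued here.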

    Jawed
     
  19. incurable

    Regular

    Joined:
    Apr 20, 2002
    Messages:
    547
    Likes Received:
    5
    Location:
    Germany
    Is this an established fact or just conjecture and speculation?

    (I've missed quite a few topics on this in the past few months, especially the really long threads, so a link for me to read up on this would be great. Thanks!)
     
  20. tEd

    tEd Casual Member
    Veteran

    Joined:
    Feb 6, 2002
    Messages:
    2,105
    Likes Received:
    70
    Location:
    switzerland
    that's where the hybrid vertex textures might fit in
     