ATI Radeon X800 XT Platinum Edition / PRO Review

Discussion in 'Beyond3D Articles' started by Dave Baumann, May 4, 2004.

  1. Anonymous

    Veteran

    Joined:
    May 12, 1978
    Messages:
    3,263
    Likes Received:
    0
    Re: R420 can not output 16 pixels / clock

    Quite frankly, why not? Maybe I'm exceptionally thick tonight, but are you suggesting a software / system / other non-GPU related bottleneck holding back the efficiency of the X800s? What are you referring to when talking about efficiency?

    Efficiency as I see it here is the percentage of its peak theoretical fillrate a card achieves, and this is where the X800s fall way behind both their predecessor and their primary competition:

    Code:
    6800 Ultra    6096.7 11749.8  5313.0  2999.3  2014.5  1522.5 
    (peak)        6400.0 12800.0  6400.0  3200.0  2133.3  1600.0
    
    efficiency     95.2%   91.8%   83.0%   93.7%   94.4%   95.2%
    
    9800 XT       2840.9  2814.3  2747.4  1468.2   989.7   754.6 
    (peak)        3296.0  3296.0  3296.0  1648.0  1098.7   824.0
    
    efficiency     86.2%   85.4%   83.4%   89.1%   90.1%   91.6%
    
    X800 XT PE    5884.0  7859.3  4411.7  2467.4  1749.9  1352.6 
    (peak)        8320.0  8320.0  8320.0  4160.0  2773.3  2080.0
    
    efficiency     70.7%   94.5%   53.0%   59.3%   63.1%   65.0%
    
    X800 PRO      3182.7  5367.7  3046.3  1902.6  1122.8   884.5 
    (peak)        5700.0  5700.0  5700.0  2850.0  1900.0  1425.0
    
    efficiency     55.8%   94.1%   53.4%   66.7%   59.1%   62.1%
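
    The peak numbers above are just pixel pipelines × core clock. As a sanity check, here's a small Python sketch (assuming the cards' standard specs: 16 pipes @ 400 MHz for the 6800 Ultra, 8 @ 412 for the 9800 XT, 16 @ 520 for the X800 XT PE, 12 @ 475 for the X800 PRO) that reproduces the first-column efficiency figures:

```python
# Efficiency = measured fillrate / theoretical peak,
# where peak (MPix/s) = pixel pipelines * core clock (MHz).
# Measured values are the first column of the table above.
cards = {
    "6800 Ultra": (16, 400.0, 6096.7),  # (pipes, core MHz, measured MPix/s)
    "9800 XT":    (8,  412.0, 2840.9),
    "X800 XT PE": (16, 520.0, 5884.0),
    "X800 PRO":   (12, 475.0, 3182.7),
}

for name, (pipes, clock, measured) in cards.items():
    peak = pipes * clock
    print(f"{name:10s}  peak {peak:7.1f} MPix/s  efficiency {measured / peak:5.1%}")
```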

    Yes, clock-for-clock comparisons would be appreciated, though I'd rather go with Guest's suggestion of downclocking the XT rather than o/cing the Ultra.

    Hmmm, I have no idea and a search didn't turn up anything either.

    However, given how similar R3x0 and R420 are, and given that the reviews I've read so far seem to indicate similar performance profiles under OpenGL, I don't think this (the test possibly being GL-based) could account for the differences in efficiency shown above.

    cu

    incurable
    ________________________________________________
    If it ain't in the reference design, it won't be produced. Period.
     
  2. Anonymous

    Veteran

    Joined:
    May 12, 1978
    Messages:
    3,263
    Likes Received:
    0
    Re: R420 can not output 16 pixels / clock

    Ok, I'm going to take that back after reading p23 of the review.

    Apparently, R420 currently sucks at OpenGL, so that could be a possible explanation.

    cu

    incurable
    ____________
    2 or now show
     
  3. Dave Baumann

    Dave Baumann Gamerscore Wh...
    Moderator Legend

    Joined:
    Jan 29, 2002
    Messages:
    14,090
    Likes Received:
    694
    Location:
    O Canada!
    Sorry, something to note with the MDolenc Fillrate tester: IIRC it uses entire screens of the same colour, cycling through the colours. NV40 has colour buffer compression on all the time, whereas ATI doesn't (only with AA). This particular test is, more or less, the best case scenario for NV40 as it's making the most of the non-AA colour compression. The same certainly isn't seen with 3DMark's fillrate tests because, quite apart from the blending issues, they use lots of blended layers with different colours, so the colours are much more random and hence the compression can't operate as effectively there.
     
  4. Anonymous

    Veteran

    Joined:
    May 12, 1978
    Messages:
    3,263
    Likes Received:
    0
    Thanks for the explanation, Dave! (and btw, great review, just finished reading it! :))

    So I guess the R360's higher fillrate efficiency in this test can be explained by its higher bandwidth per pipe per clock ratio?

    cu

    incurable
    ________________________________________________
    I wish they wished to, but even if they would, they couldn't.
     
  5. Joe DeFuria

    Legend

    Joined:
    Feb 6, 2002
    Messages:
    5,994
    Likes Received:
    71
    Re: R420 can not output 16 pixels / clock

    I'm saying that, theoretically, the NV40 could be bandwidth-limited with its current specs and efficiency. And theoretically, the R420 could have the exact same efficiency, but since it has the same absolute bandwidth, higher clocks aren't going to raise its absolute fill rate significantly higher than NV40's.

    No, that's not how I see it at all. That's more along the lines of "balance," as I stated earlier. We're (at least I'm) talking about bandwidth efficiency: that is, how many pixels are rendered for a given amount of bandwidth.
     
  6. Joe DeFuria

    Legend

    Joined:
    Feb 6, 2002
    Messages:
    5,994
    Likes Received:
    71
    That would seem to explain it....thanks.

    On that note...it would be interesting to see the fill-rate numbers with AA enabled....
     
  7. WaltC

    Veteran

    Joined:
    Jul 22, 2002
    Messages:
    2,710
    Likes Received:
    8
    Location:
    BelleVue Sanatorium, Billary, NY. Patient privile
    Well, once again ATi has given me no reason to look at nVidia, and nVidia gives me no reason to forsake ATi at this stage. The PE looks like the ticket for me.

    Did you see the lame thing nVidia did for Tech Report just in time for the TR R420/nV40 comparison? Heh...;) I thought it was pretty funny--they shipped TR an "Extreme" version of the 6800U clocked 50MHz higher than the standard 6800U (to 450MHz), but when asked to specify when it would be available and from whom, nVidia opted out of committing itself other than to say, "Some vendors (unknown at this time) will be offering the 'Extreme' product in the near future (date presently unknown.)" (Paraphrased.) Seriously, tech sites should stop being such pushovers for the IHVs--if the company can't tell you who is going to sell it or when, or what it will cost, the web sites should say, "Sorry, but no thanks, ring us back when you have some availability and pricing info you can share." IMO, of course. This is especially true when, as TR reports, the "6800 Ultra Extreme" is noticeably hot & noisy compared with the versions nVidia has *actually announced publicly*--the 6800 and 6800U.

    As to ps3.0, it sure looks to me like R4x0 incorporates all of it--except--the 3.0 features practically guaranteed to slow them way down. I.e., the R4x0 seems to incorporate the good 3.0 stuff while eschewing the slow stuff, which doesn't at all seem like a negative to me, unless one considers the deliberate support of questionable, performance-robbing features a positive. Why commit transistors in silicon, and raise issues with respect to yields, simply to support features you have demonstrated to yourself will negatively impact your per-clock performance while providing merely speculative programming advantages? As well, if the support of such features demonstrates a performance deficit universally regardless of hardware, then it is unlikely that those "advantages" will be supported by *competent* developers to begin with, I would certainly think. But I can see them supporting the ps3.0 features that make sense for them in terms of performance if not IQ or both, and R4x0 seems to have that aspect covered well.

    So I think far too much is being made of this, especially considering how the majority of developers have yet to demonstrate an ability to competently support ps2.0 at this time...;) I cannot see any practical negatives to what ATi has done here, with the exception of people who wish to purchase products which support *all* of ps3.0 functionality, regardless of whether that support is ever manifested in any of the 3d games they play during the lifetime of the product they buy. While there will certainly be some of those people, I see zero potential downside to ATi's approach during the lifetime of the x800 in terms of actual 3d gameplay.

    I am very impressed with the power and heat profiles of the PE versus the 6800U (and the doubtful 6800UE, as revealed by Tech Report, which nVidia was unable to define in terms of AIB partners making it or availability dates). Like others, I am surprised that what R4x0 demonstrates is not being perceived as a major advance in the field: it is actually beginning to show us, in silicon, the theoretical advantages of .13-micron and smaller processes in general, such as more transistors, greater functionality, and GPUs that are clocked higher yet consume less power and dissipate less heat than earlier designs built on larger processes. I think it is a major advance, personally, on the order of Intel and AMD being able to field .09-micron CPUs with more transistors and higher clocks than their .13 antecedents that nevertheless run cooler and consume less power (which both of those companies have yet to do in production products). I think that's a pretty darn respectable barrier for ATi to have crashed, myself...;)

    If yields are where they should be for R4x0, I think they'll walk away from nVidia yet again, as system OEMs like Dell are going to see everything they want in R4x0-based products, and little that they like about nV40-based reference designs, the power and heat-dissipation requirements respectively making all of the difference in their comparison of these products. Unless nVidia can show some marked progress on the process front similar to ATi's, or unless nVidia's nV40 yields prove much better than ATi's R4x0 yields, I think the game is already up...
     
  8. Joe DeFuria

    Legend

    Joined:
    Feb 6, 2002
    Messages:
    5,994
    Likes Received:
    71
    Again, I wouldn't say R360 is more "efficient". I'd say it's more "balanced." If you want to judge efficiency, downclock the R420 core such that its pixel fill rate to bandwidth ratio is equal to that of the R360.
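
    Concretely, that normalization can be sketched in a few lines of Python. The memory specs below are assumptions based on the cards' standard configurations (256-bit bus, DDR at 365 MHz on the 9800 XT, 560 MHz on the X800 XT PE); this just estimates the R420 core clock at which its peak-fillrate-to-bandwidth ratio would match the R360's:

```python
# Estimate the R420 (X800 XT PE) core clock at which its peak
# fillrate : bandwidth ratio equals the R360's (9800 XT).
# Assumed memory specs: 256-bit bus, DDR effective rate = 2x clock.
def bandwidth_gbs(mem_mhz, bus_bits=256):
    return mem_mhz * 2 * (bus_bits / 8) / 1000  # DDR * bytes/transfer -> GB/s

r360_ratio = (8 * 412) / bandwidth_gbs(365)        # MPix/s per GB/s
matched_clock = r360_ratio * bandwidth_gbs(560) / 16  # solve 16 * clock / BW = ratio
print(f"R420 clock matching R360's fill/bandwidth ratio: ~{matched_clock:.0f} MHz")
```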
     
  9. Bouncing Zabaglione Bros.

    Legend

    Joined:
    Jun 24, 2003
    Messages:
    6,363
    Likes Received:
    83
    I'd be really surprised if NV40 has 100MHz extra to play with - Nvidia wouldn't launch with 25 percent more clock speed going to waste.
     
  10. Tic_Tac

    Newcomer

    Joined:
    Apr 10, 2004
    Messages:
    8
    Likes Received:
    0
     
  11. Scarlet

    Newcomer

    Joined:
    Mar 31, 2004
    Messages:
    54
    Likes Received:
    0
    Re: R420 can not output 16 pixels / clock

    I think this is the second time in this thread alone that I've seen this basic comment, and it is completely valueless. The assumption or inference is that there is more headroom for NV40 than R420. Total baloney.

    You can't draw any useful conclusions at all about either product on the basis of the clocks they happen to be running at today. Both companies can take a product and use process tweaks to move the part to a faster clock rate if they so desire. Or even down-shift to a finer geometry (which will have completely new clock rates).

    If anything you should pay attention to the die size (given that both are in the same geometry). The larger the die size, the more difficult it will be to reach the higher clock speeds. The difference in the die sizes between the NV40 and R420 is probably the reason one is clocked faster than the other. You can bet both companies took that into account and tuned their architecture/design appropriately. It is fallacious to assume from the results you can see in benchmarks etc. that one architecture is inherently more efficient than the other. Clock rate and architecture/design have to be considered as a gestalt.
     
  12. Anonymous

    Veteran

    Joined:
    May 12, 1978
    Messages:
    3,263
    Likes Received:
    0
    Oh, I agree with you 100%, Bouncing. I do. It may not, or at least not in its current format or layout. But tomorrow, who knows.
     
  13. AlphaWolf

    AlphaWolf Specious Misanthrope
    Legend

    Joined:
    May 28, 2003
    Messages:
    9,470
    Likes Received:
    1,686
    Location:
    Treading Water
    I am sure both Nvidia and ATI will have faster parts out in the fall.
     
  14. Dave Baumann

    Dave Baumann Gamerscore Wh...
    Moderator Legend

    Joined:
    Jan 29, 2002
    Messages:
    14,090
    Likes Received:
    694
    Location:
    O Canada!
  15. Anonymous

    Veteran

    Joined:
    May 12, 1978
    Messages:
    3,263
    Likes Received:
    0
    new tech

    3Dc, temporal AA and double the performance of R360, what's not tempting enough? :?
     
  16. nelg

    Veteran

    Joined:
    Jan 26, 2003
    Messages:
    1,557
    Likes Received:
    42
    Location:
    Toronto
    Any word on OverDrive? :?:
     
  17. Anonymous

    Veteran

    Joined:
    May 12, 1978
    Messages:
    3,263
    Likes Received:
    0
    Why weren't more sampling patterns (or, alternatively, randomly but only slightly modified patterns) with smaller differences between them used for temporal AA?
     
  18. Anonymous

    Veteran

    Joined:
    May 12, 1978
    Messages:
    3,263
    Likes Received:
    0
    :( These reviews are depressing. I haven't even played a DirectX 9 game yet and all these new cards already make mine look like crap. Anyone want to buy a low-end 5900XT from me?
     
  19. Ostsol

    Veteran

    Joined:
    Nov 19, 2002
    Messages:
    1,765
    Likes Received:
    0
    Location:
    Edmonton, Alberta, Canada
    The speed is mostly from having double the pixel pipes, two more vertex processors, and a higher clock speed. I actually compared the 9700 Pro's vertex shader performance to the X800 XT's and found that it was right in line with the increase in clock speed and vertex processors.

    Temporal AA can be achieved on any R3xx video card. After all, programmable AA patterns have been something ATI's touted for quite a while.

    3Dc is the only technology you've mentioned that's new... That's not quite enough for me to be impressed, though.

    A new anisotropic filtering implementation would have piqued my interest. FP32 without speed compromises would have brought a grin to my face. SM 3.0 would have had me drooling and cursing my empty wallet. Really, though, the only big selling point I can see right now is speed -- and that's not quite enough. It wasn't enough to get me to upgrade from my 9700 Pro to a 9800 XT, and while this speed boost is much more substantial, that's not enough encouragement for me.

    As I said before, though, I'm still satisfied with my 9700 Pro's performance. It handles every game I play quite well -- perhaps not at extreme resolutions (which my monitor can't handle anyway) and not with maxed out AA and AF, but I am still satisfied. After all, the only good reason to upgrade is when one is no longer satisfied with one's system's current performance.
     
  20. ram

    ram
    Newcomer

    Joined:
    Feb 6, 2002
    Messages:
    218
    Likes Received:
    0
    Location:
    Switzerland
    But that still doesn't explain why the throughput doesn't decrease when enabling z-writes.

    For writing 8400 MPix/s (R420's supposed peak), you only need ~33 GB/s of memory bandwidth for 32-bit RGBA, or just ~17 GB/s for 16-bit. The X800 XT has more than that. While it might not reach the peak, the result should be higher if it really is designed to write 16 pix/clock. For the 5900 MPix/s the R420 reaches, you would only need ~23 GB/s of raw bandwidth for 32-bit RGBA. The fillrate test as you describe it is also a best-case scenario for the memory controller, as it can reach full efficiency using bursts. Reaching only 67% efficiency in this case is too low IMO.
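
    The bandwidth figures quoted above are a simple bytes-per-pixel × pixel-rate calculation; a quick sketch (assuming plain colour writes only, with no compression, z traffic, or other overhead) reproduces them:

```python
# Raw bandwidth needed just to write a given pixel rate:
# MPix/s * bytes per pixel -> GB/s (no compression or z traffic assumed).
def write_bw_gbs(mpix_per_s, bytes_per_pixel):
    return mpix_per_s * bytes_per_pixel / 1000.0

print(write_bw_gbs(8400, 4))  # R420 supposed peak, 32-bit RGBA: 33.6 GB/s
print(write_bw_gbs(8400, 2))  # same peak at 16-bit: 16.8 GB/s
print(write_bw_gbs(5900, 4))  # measured rate, 32-bit RGBA: 23.6 GB/s
```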
     