AMD: R8xx Speculation

Discussion in 'Architecture and Products' started by Shtal, Jul 19, 2008.


How soon will Nvidia respond with GT300 to upcoming ATI-RV870 lineup GPUs

Poll closed Oct 14, 2009.
  1. Within 1 or 2 weeks

    1 vote(s)
    0.6%
  2. Within a month

    5 vote(s)
    3.2%
  3. Within couple months

    28 vote(s)
    18.1%
  4. Very late this year

    52 vote(s)
    33.5%
  5. Not until next year

    69 vote(s)
    44.5%
  1. OpenGL guy

    Veteran

    Joined:
    Feb 6, 2002
    Messages:
    2,357
    Likes Received:
    28
    You can't take one example and try to extrapolate from that. For example, are you certain the HD4890 is not CPU-limited at the chosen settings? 1680x1050 is not a very high resolution.
     
  2. w0mbat

    Newcomer

    Joined:
    Nov 18, 2006
    Messages:
    234
    Likes Received:
    5
  3. Mintmaster

    Veteran

    Joined:
    Mar 31, 2002
    Messages:
    3,897
    Likes Received:
    87
    Filtering is a lot cheaper than you think. It's fixed function, fixed data flow arithmetic with one operand being low precision.

    Maybe because 95%+ of textures are still 8-bit? Even when you need more range there's a host of solutions out there that don't need FP16 textures.

    Why do you say that? Just because of the ratio compared to RV770? NVidia has always emphasized perf/mm2, and they doubled 8-bit bilinear throughput from G80 to G90, despite G80 already having higher TEX:ALU than RV730.

    It's not squandered by any means. Go look at digit-life's shader tests, and then consider real-world conditions with AF enabled.
     
  4. Jawed

    Legend

    Joined:
    Oct 2, 2004
    Messages:
    10,875
    Likes Received:
    767
    Location:
    London
    Ha, well I've even less idea what you're referring to now.

    It's only an extra bit in address computations that are already 13-bit. I can't see anything significant there.

    Jawed
     
  5. no-X

    Veteran

    Joined:
    May 28, 2005
    Messages:
    2,336
    Likes Received:
    297
    These results are based on 8 games, not one. The performance delta between HD4890 and GTX285 in other CB tests is only 4% higher when going from 1680 to 2560, so I'm quite sure that CPU limitation affects this result by less than 4%.
     
  6. Jawed

    Legend

    Joined:
    Oct 2, 2004
    Messages:
    10,875
    Likes Received:
    767
    Location:
    London
    All the evidence is in HD4770, which has the same number of TUs as HD4670 at the same clock, and radically more performance. HD4770's mixture of TUs, ALUs, RBEs and bandwidth is "perfect" in many ways.

    HD4770 appears to be bandwidth constrained, but one test I looked at showed no signs of bandwidth limitation, i.e. when bandwidth was raised by more than core clock, scaling was limited by core clock.

    As I hinted earlier, the fp16 throughput of HD4670 may be the justification for 32 TUs, as HD4670 with only 16 TUs (i.e. 4 clusters, 4:1 ALU:TEX), and therefore half the fp16 rate, might well have been just too miserable. I don't know.

    Jawed
     
  7. CarstenS

    Veteran Subscriber

    Joined:
    May 31, 2002
    Messages:
    4,933
    Likes Received:
    2,263
    Location:
    Germany
  8. OpenGL guy

    Veteran

    Joined:
    Feb 6, 2002
    Messages:
    2,357
    Likes Received:
    28
    I don't read German but here's what I see:
    HD4890 is 77% faster in Anno 1404, 100% faster in CoD5, 84% faster in Crysis Warhead, 61% faster in F.E.A.R., 25% faster in Half-life 2 (likely CPU-limited), 38% faster in Lost Planet: Colonies, 41% faster in Oblivion, and 82% faster in World in Conflict, so what's the problem? You can't expect everything to scale equally.
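For reference, the eight per-game gains listed above can be averaged directly. The sketch below uses only the percentages quoted in this post; the helper name `mean_gain_pct` is illustrative. It gives a mean gain of 63.5% over all eight games, 69% with Half-Life 2 left out, and about 58% with CoD5 left out.

```python
from statistics import mean

# Per-game gains (percent) exactly as quoted in the post above.
GAINS_PCT = {
    "Anno 1404": 77,
    "CoD5": 100,
    "Crysis Warhead": 84,
    "F.E.A.R.": 61,
    "Half-Life 2": 25,
    "Lost Planet: Colonies": 38,
    "Oblivion": 41,
    "World in Conflict": 82,
}


def mean_gain_pct(skip=()):
    """Arithmetic mean of the quoted gains, optionally skipping some games."""
    return mean(g for name, g in GAINS_PCT.items() if name not in skip)


overall = mean_gain_pct()
without_hl2 = mean_gain_pct(skip=("Half-Life 2",))
without_cod5 = mean_gain_pct(skip=("CoD5",))
```

An arithmetic mean of percentage gains is only one way to summarise such results; a geometric mean of the speedup ratios would come out slightly lower.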
     
  9. Davros

    Legend

    Joined:
    Jun 7, 2004
    Messages:
    15,822
    Likes Received:
    3,004
    eh I thought all textures these days were 32bit ?
     
  10. no-X

    Veteran

    Joined:
    May 28, 2005
    Messages:
    2,336
    Likes Received:
    297
    OpenGL guy: There is no problem, I appreciate your response. I'm trying to point out that HD4890, despite almost 3 times the ALU power, is only 1.6 times faster on average. Maybe slightly more if we skip HL2 as a CPU-limited game, or maybe slightly less if we skip COD5, which could be affected by an R6xx-related driver bug (at least, in the R600 launch interview, Eric Demers mentioned that any game which runs slower on R600 than on R580, even with MSAA enabled, is more likely affected by a driver issue than by a hardware limit).

    Anyway, if a GPU with 3 times the ALU throughput performs about 1.6 times faster in real-world situations, it has to be caused by something. The first reason could be the number of ROPs, which is the same for both R600 and RV770. (I chose the non-MSAA results for comparison to avoid the impact of the broken resolve hardware.) The second reason could be the different TMUs, which are more capable on R600. And I'd like to know which feature has more impact in these games: the better FP16 performance or the additional point sampling units.

    I believe that more capable TMUs and/or a higher number of ROPs could boost R7xx performance significantly. Because of that, I think R8xx will bring at least one of these changes.

    Davros: 8-bit per component? :)
     
  11. Jawed

    Legend

    Joined:
    Oct 2, 2004
    Messages:
    10,875
    Likes Received:
    767
    Location:
    London
  12. Mintmaster

    Veteran

    Joined:
    Mar 31, 2002
    Messages:
    3,897
    Likes Received:
    87
    HD4770 has 60% more BW and twice the RBEs. To conclude that the ALUs are responsible for that performance jump is nonsense.

    If you want to see the impact of RBE and BW, look at the G92 vs G94. An 8% clock advantage (but equal BW) nearly wipes out 75% more ALUs and TUs in the 8800GT.

    If you really wanted to know how much different parts of the GPU affect game performance, I can crunch some numbers for you, but I need some help. In this TR review, list some scores at different resolutions from the 8800GT, 8800 GTS 512, and 9600GT in a table and I can do a regression (like I did before with BW) to figure out per-frame time (CPU + vertex), cycles per pixel limited by RBE or rasterizing (i.e. independent of ALU/TU count), and multiprocessor limited cycles per pixel.

    It won't tell you the impact of the TU vs ALU, but it will tell you how much frame time an 8800GT class GPU will spend limited by the RBE and give you much better perspective on RV730 vs RV740.
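A minimal sketch of the kind of regression described above, assuming a simple model: frame time = fixed per-frame time + pixels × (core-clock-limited cycles per pixel ÷ core clock) + pixels × (cluster-limited cycles per pixel ÷ (shader clock × cluster count)). The timings here are synthetic, generated from made-up coefficients purely to show that the fit recovers them; they are not review data. The card clocks and cluster counts are assumed nominal specs, and the model ignores bandwidth differences.

```python
# Frame-time regression sketch. All timings are SYNTHETIC (built from the
# made-up coefficients in TRUE); card specs are assumed nominal values.

CARDS = {  # name: (core clock Hz, shader clock Hz, shader clusters)
    "8800 GT": (600e6, 1500e6, 7),
    "8800 GTS 512": (650e6, 1625e6, 8),
    "9600 GT": (650e6, 1625e6, 4),
}
RESOLUTIONS = [(1280, 1024), (1680, 1050), (1920, 1200)]


def features(card, res):
    """Regressors: constant, ms per core cycle/pixel, ms per cluster cycle/pixel."""
    core_hz, shader_hz, clusters = CARDS[card]
    pixels = res[0] * res[1]
    return [1.0, 1e3 * pixels / core_hz, 1e3 * pixels / (shader_hz * clusters)]


def solve(m, v):
    """Solve m x = v by Gaussian elimination with partial pivoting."""
    n = len(v)
    a = [row[:] + [v[i]] for i, row in enumerate(m)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(col + 1, n):
            f = a[r][col] / a[col][col]
            for c in range(col, n + 1):
                a[r][c] -= f * a[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (a[i][n] - sum(a[i][j] * x[j] for j in range(i + 1, n))) / a[i][i]
    return x


def fit(rows, times):
    """Ordinary least squares via the normal equations (A^T A) x = A^T y."""
    n = len(rows[0])
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(n)] for i in range(n)]
    aty = [sum(r[i] * t for r, t in zip(rows, times)) for i in range(n)]
    return solve(ata, aty)


# Made-up coefficients: fixed ms per frame, core cycles/pixel, cluster cycles/pixel.
TRUE = [5.0, 3.0, 40.0]
rows = [features(card, res) for card in CARDS for res in RESOLUTIONS]
times_ms = [sum(k * x for k, x in zip(TRUE, row)) for row in rows]

fixed_ms, core_cpp, cluster_cpp = fit(rows, times_ms)
```

With real scores, `times_ms` would be 1000 / fps for each card and resolution. The fit only separates the two per-pixel terms because the cards differ in both clock and cluster count.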
     
  13. Mintmaster

    Veteran

    Joined:
    Mar 31, 2002
    Messages:
    3,897
    Likes Received:
    87
    Most textures are 8-bit per channel. Sometimes that means 32 bits per pixel, but often they're compressed to far less. The per-channel width after decompression is what's relevant to the arithmetic logic that everyone is talking about.
     
  14. aaronspink

    Veteran

    Joined:
    Jun 20, 2003
    Messages:
    2,641
    Likes Received:
    64
    cause like 99.9999999% of all textures are 8b per channel textures. There really isn't a need for anything higher than true color.

    8bpc is the standard 24b/32b per pixel image format used by just about everybody.

    16bpc and higher (48b/64b per pixel) is mainly used as an intermediate during HDR rendering.
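The per-pixel sizes quoted above follow from channels × bits per channel. The short sketch below works them out, and adds the standard DXT1 and DXT5 block sizes (64 and 128 bits per 4×4 block) as examples of the compressed formats mentioned earlier in the thread. The function names are illustrative.

```python
def bits_per_pixel(bits_per_channel, channels):
    """Uncompressed storage per pixel."""
    return bits_per_channel * channels


def block_bits_per_pixel(block_bits, block_w=4, block_h=4):
    """Storage per pixel for a block-compressed format."""
    return block_bits / (block_w * block_h)


rgb8 = bits_per_pixel(8, 3)      # 24-bit "true color"
rgba8 = bits_per_pixel(8, 4)     # 32 bits per pixel
rgb16 = bits_per_pixel(16, 3)    # 48 bits per pixel
rgba16 = bits_per_pixel(16, 4)   # 64 bits per pixel, e.g. FP16 RGBA

dxt1 = block_bits_per_pixel(64)   # DXT1: 64 bits per 4x4 block
dxt5 = block_bits_per_pixel(128)  # DXT5: 128 bits per 4x4 block
```

Both DXT formats decompress to 8 bits per channel, so the filtering hardware sees the same operand width as for an uncompressed 32-bit texture.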
     
  15. Jawed

    Legend

    Joined:
    Oct 2, 2004
    Messages:
    10,875
    Likes Received:
    767
    Location:
    London
    HD4770 v HD4670

    http://www.xbitlabs.com/articles/video/display/call-of-juarez-2_6.html#sect0

    http://www.computerbase.de/artikel/..._grafikkarten/18/#abschnitt_performancerating

    HD4770 is 60-70% faster than HD4670, courtesy of 60% more bandwidth, 100% more fillrate and 100% more GFLOPs and despite identical texturing rate.
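The ratios in the paragraph above can be reproduced from the two cards' headline specs. This is a rough sketch: the clock, unit-count and memory figures are assumed nominal reference specs, not numbers taken from the linked reviews, and the field names are illustrative.

```python
# Assumed nominal reference specs (not taken from the linked reviews).
HD4670 = {"core_mhz": 750, "sps": 320, "tus": 32, "rops": 8,
          "bus_bits": 128, "mem_mtps": 2000}
HD4770 = {"core_mhz": 750, "sps": 640, "tus": 32, "rops": 16,
          "bus_bits": 128, "mem_mtps": 3200}


def theoretical_rates(card):
    """Peak rates derived from unit counts and clocks."""
    return {
        "bandwidth_gbs": card["bus_bits"] / 8 * card["mem_mtps"] / 1000,
        "pixel_fill_mpix": card["rops"] * card["core_mhz"],
        "texel_rate_mtex": card["tus"] * card["core_mhz"],
        # One multiply-add (2 FLOPs) per stream processor per clock.
        "gflops": card["sps"] * 2 * card["core_mhz"] / 1000,
    }


old = theoretical_rates(HD4670)
new = theoretical_rates(HD4770)
ratios = {key: new[key] / old[key] for key in old}
```

Under these assumptions `ratios` comes out at 1.6 for bandwidth, 2.0 for pixel fillrate and GFLOPs, and 1.0 for texel rate, matching the figures quoted in the post.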

    Now the problem with reviews is they tend to push a budget graphics card into rendering options that no user would actually choose, i.e. maxed in-game graphics settings, optimisations turned off, with control panel settings such as transparency AA.

    Arguably the case is still open - the 2:1 ALU:TEX is a nod to the "weak" configuration budget gamers will use (less AF). But I'd like some decent evidence that it isn't simply unbalanced.

    Jawed
     
  16. Jawed

    Legend

    Joined:
    Oct 2, 2004
    Messages:
    10,875
    Likes Received:
    767
    Location:
    London
    When did I do that? :shock:

    Jawed
     
  17. Mintmaster

    Veteran

    Joined:
    Mar 31, 2002
    Messages:
    3,897
    Likes Received:
    87
    When you said that HD4770's performance is all the evidence we need that RV730's TUs were squandered.

    I think that if RV740 had only 320 SPs but the same number of TUs, then it would only lose 5-10% of performance or so with AA/AF enabled.
     
  18. mczak

    Veteran

    Joined:
    Oct 24, 2002
    Messages:
    3,018
    Likes Received:
    114
    I didn't want to imply it's expensive. But the fact is, if you want full-rate FP16 filtering (and I didn't bring this up...) it's going to cost you some die space. To counter this, Jawed suggested increasing the ALU:TEX ratio again, but it looks to me like there's some point where it doesn't really make sense to have dedicated filtering units if the ratio of ALU:TEX is high enough.
     
  19. Mintmaster

    Veteran

    Joined:
    Mar 31, 2002
    Messages:
    3,897
    Likes Received:
    87
    I don't think ALU:TEX is going to matter because doing proper filtering (including aniso and filtering weights) in a shader is too complicated and slow. The decision of whether to have full rate FP16 filtering will depend entirely on the workload. I don't think it will ever be worth it, because there's barely any need for high speed FP16 filtering since most applications look just as good with some 32bpp hackery like logluv or RGBexp.

    The only really good application I've seen of high precision filtering is VSM/ESM. It's only going to be a small percentage of total texture accesses, though, and needs 32-bit filtering if not more.
     
  20. CarstenS

    Veteran Subscriber

    Joined:
    May 31, 2002
    Messages:
    4,933
    Likes Received:
    2,263
    Location:
    Germany
    How expensive would texture addressing be on its own?
     