So is R580 supposed to be 48 pipes or 48 ALUs?

Discussion in 'Pre-release GPU Speculation' started by Bill, Nov 19, 2005.

Thread Status:
Not open for further replies.
  1. neliz

    neliz GIGABYTE Man
    Veteran

    Joined:
    Mar 30, 2005
    Messages:
    4,904
    Likes Received:
    23
    Location:
    In the know
    The wafers are processed in Taipei and the chips are packaged in Kaohsiung; that alone would take three days for transport and logistics.
     
  2. Jawed

    Legend

    Joined:
    Oct 2, 2004
    Messages:
    11,716
    Likes Received:
    2,137
    Location:
    London
    The initial access to a texture is going to be slow, but the cache is there to support locality, so memory accesses will tail off, though they will presumably come in bursts as locality is exhausted.

    Jawed
     
  3. Jawed

    Legend

    Joined:
    Oct 2, 2004
    Messages:
    11,716
    Likes Received:
    2,137
    Location:
    London
    Production entails assembling an entire board?

    Production was delayed until:
    • enough R580s were batched-up ready to roll production
    • R520 board production had ceased or been partially run down; presumably R520 board production was the priority for Christmas availability and, erm, for getting boards to those unlucky sods who ordered before Christmas but won't get them till after
    :?:

    Jawed
     
  4. AndrewM

    Newcomer

    Joined:
    May 28, 2003
    Messages:
    219
    Likes Received:
    2
    Location:
    Brisbane, QLD, Australia
    When drawing shadows using stencil volumes (UE3, D3 etc.), you aren't using any textures at all. All of the pixel shader ALUs and texture units are sitting there idle. Same deal when rendering a shadow map, although the texture units will be used when you actually apply the shadow map. Some of the fancy filtering used in newer-generation engines uses a lot of both shadow map samples AND ALUs.

    Some units will remain idle while certain things are being processed. Not everything can run in parallel. This is one of the reasons behind Xenos's unified shaders. E.g. when rendering shadow volumes, you don't need to do any fragment processing (all you need is simple stencil writes etc.), so if you have separate VS and PS you have a bunch of units sitting idle. I'm sure you can see where I'm going with this without explaining it further (there are plenty of other posts about this): texture units sit idle while performing certain ops.
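As a rough illustration of the idle-unit argument above, here is a minimal Python sketch comparing an idealised split design with an idealised unified one. The unit counts and workload figures are hypothetical and do not describe any real chip; scheduling overhead, texture units and bandwidth are all ignored.

```python
def cycles_separate(vs_work, ps_work, vs_units, ps_units):
    """Cycles for a pass when vertex and pixel units are fixed, separate pools."""
    # Each pool can only process its own kind of work, so the pass
    # finishes when the slower of the two pools finishes.
    return max(vs_work / vs_units, ps_work / ps_units)


def cycles_unified(vs_work, ps_work, total_units):
    """Cycles for a pass when every unit can take either kind of work."""
    # Idealised: work is spread perfectly over all units.
    return (vs_work + ps_work) / total_units


def utilization(vs_work, ps_work, cycles, total_units):
    """Fraction of available unit-cycles that did useful work."""
    return (vs_work + ps_work) / (cycles * total_units)


if __name__ == "__main__":
    # Hypothetical shadow-volume pass: lots of vertex work, no fragment shading.
    vs_work, ps_work = 800, 0
    vs_units, ps_units = 8, 16  # assumed split, for illustration only

    split = cycles_separate(vs_work, ps_work, vs_units, ps_units)
    unified = cycles_unified(vs_work, ps_work, vs_units + ps_units)

    print("separate:", split, "cycles, utilization",
          utilization(vs_work, ps_work, split, vs_units + ps_units))
    print("unified: ", unified, "cycles, utilization",
          utilization(vs_work, ps_work, unified, vs_units + ps_units))
```

In this toy case the pixel pool contributes nothing during the pass, which is the situation the post describes; a real unified design would not reach perfect utilization either.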
     
  5. RoOoBo

    Regular

    Joined:
    Jun 12, 2002
    Messages:
    308
    Likes Received:
    31
    The difference between the relative differences may just be telling you that nothing related to AF is the main source of difference between those two cards. As a wild guess, and taking into account that the RV530 has double the depth/stencil rate of the RV515 (and the X1600XT more bandwidth than the X1300 PRO), I would say it is likely that the main source of difference between those two cards is precisely the double-rate z/stencil.

    As a test you could try disabling stencil shadows and benchmarking again. Or maybe try Quake4, which seems to use fewer depth/stencil passes (at least in the frames I analyzed from trdemo4, I think).

    The frames I analyzed from Doom3 (first frames from trdemo2?) and Quake4 (first frames from trdemo4?) show that at 8x AF, using an angle-dependent algorithm similar to those of ATI and NVidia, the TU bilinear rate (and I mean the ideal rate) is already quite saturated (maybe 70% to 90%), and going beyond 2 ALUs per TU shouldn't bring any benefit. And that is with the simulator limited to one SIMD and one scalar instruction per cycle, when ATI and NVidia GPUs can potentially execute much more than that with the proper optimizations (though the small fragment programs used in Doom3 may limit such optimization), increasing the pressure on the TU. In any case, that may also change with the timedemo used.
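The "no benefit beyond 2 ALUs per TU" reasoning can be sketched with a very simple bottleneck model. This is not the simulator mentioned in the post; the instruction counts and the average number of bilinear samples per texture fetch are illustrative assumptions, and latency hiding, caches and co-issue are all ignored.

```python
def cycles_per_fragment(alu_instr, tex_instr, alus_per_tu, bilinear_per_fetch):
    """Toy estimate of cycles per fragment for one TU and its attached ALUs.

    alu_instr          -- ALU instructions in the fragment program (assumed)
    tex_instr          -- texture instructions in the fragment program (assumed)
    alus_per_tu        -- number of ALUs sharing one texture unit
    bilinear_per_fetch -- average bilinear samples per texture fetch
                          (1.0 without AF, higher with AF; assumed value)
    """
    alu_cycles = alu_instr / alus_per_tu          # ALU work is split across the ALUs
    tex_cycles = tex_instr * bilinear_per_fetch   # one bilinear sample per TU cycle
    # Whichever side takes longer bounds the throughput.
    return max(alu_cycles, tex_cycles)


if __name__ == "__main__":
    # Hypothetical program: 14 ALU and 7 texture instructions (a 2:1 ratio).
    for ratio in (1, 2, 3, 4):
        no_af = cycles_per_fragment(14, 7, ratio, 1.0)
        with_af = cycles_per_fragment(14, 7, ratio, 2.0)
        print(ratio, "ALUs per TU:", no_af, "cycles without AF,",
              with_af, "cycles with AF")
```

With a 2:1 instruction mix, the model stops improving once there are 2 ALUs per TU, and any extra bilinear work from AF moves the texture limit in even earlier. Real hardware overlaps and schedules work in ways this sketch does not capture.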

    I was planning to get an X1600 to play with a bit as soon as the AGP version starts to show up where I live ... it looks like quite an interesting architecture (and way cheaper than the R580 when it's released), and there may be something hiding there that we don't know about yet.
     
  6. Dave Baumann

    Dave Baumann Gamerscore Wh...
    Moderator Legend

    Joined:
    Jan 29, 2002
    Messages:
    14,090
    Likes Received:
    694
    Location:
    O Canada!
    RoOoBo, I'm not saying compare the performance differences, but merely the relative performance difference between them across the two modes (i.e. the X1600 drops less than 2% of its performance advantage going from 1x to 16x AF). I considered Q4, but Q4 relies more heavily on vertex performance, and that is the single largest difference between the X1300 and X1600.

    Note that the demo used is quite sprawling, nearly an entire level.
     
  7. RoOoBo

    Regular

    Joined:
    Jun 12, 2002
    Messages:
    308
    Likes Received:
    31
    It's true that the Quake4 vertex workload is quite a bit higher than in Doom3. So maybe it's better to try Doom3 with stencil shadows disabled.

    In any case, looking back at the numbers (the left column is the X1600 and the right the X1300), the overhead from going to 16x AF is a bit larger on the X1600 than on the X1300, but not by much (5.53% versus 4.69%). And we don't yet know how much of the benefit comes from having more fragment shader units, more z/stencil ROPs, more vertex shader units, more bandwidth or a higher clock (?). If stencil dominates the timedemo (or accounts for at least half of the frame rendering time), and/or the number of additional AF samples isn't that high in the shading-dominated regions, as the X1300 numbers seem to suggest, it still wouldn't mean that the 3 additional fragment shader pipelines are being properly utilized. That's why I think testing without shadows would be interesting, as it at least removes an important factor from the equation.
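For readers following the percentages, here is one plausible way such figures are derived from frame rates. The benchmark numbers themselves are not in this part of the thread, so the fps values below are made-up placeholders and the function names are only for illustration.

```python
def af_overhead_pct(fps_no_af, fps_af):
    """Percentage of frame rate lost when enabling AF on a single card."""
    return (fps_no_af - fps_af) / fps_no_af * 100.0


def advantage(fps_fast, fps_slow):
    """How many times faster one card is than the other at a given setting."""
    return fps_fast / fps_slow


if __name__ == "__main__":
    # Hypothetical frame rates, NOT the numbers from the benchmark discussed.
    x1600_1x, x1600_16x = 60.0, 56.7
    x1300_1x, x1300_16x = 30.0, 28.6

    print("X1600 AF overhead %:", af_overhead_pct(x1600_1x, x1600_16x))
    print("X1300 AF overhead %:", af_overhead_pct(x1300_1x, x1300_16x))

    # Change in the faster card's lead between the two AF modes.
    lead_1x = advantage(x1600_1x, x1300_1x)
    lead_16x = advantage(x1600_16x, x1300_16x)
    print("lead at 1x:", lead_1x, "lead at 16x:", lead_16x)
    print("lead lost %:", (lead_1x - lead_16x) / lead_1x * 100.0)
```

The first function gives the per-card overhead figure, while the last line gives the "share of the performance advantage lost" figure; the two measure different things, which is part of what the posts are disagreeing about.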

    I don't have the numbers right now for the ALU-to-texture instruction ratio in the raw Doom3 fragment programs (not optimized or replaced, and not an updated engine version), but I remember it was something like 6 to 8 texture instructions and maybe double that number of ALU instructions.

    And of course, to know whether there is something hidden in the RV530, it would be better to use synthetic benchmarks ...

    Those results for Doom3 show that high AF doesn't represent a performance problem for the X1600, at least compared with the X1300.

    Edit: spelling, spelling, ...
     
    #447 RoOoBo, Dec 30, 2005
    Last edited by a moderator: Dec 30, 2005
  8. Jawed

    Legend

    Joined:
    Oct 2, 2004
    Messages:
    11,716
    Likes Received:
    2,137
    Location:
    London
    Would it be helpful to downclock the X1600XT memory to X1300Pro bandwidth?

    Jawed
     
  9. RoOoBo

    Regular

    Joined:
    Jun 12, 2002
    Messages:
    308
    Likes Received:
    31
    The fewer additional factors you have to consider for a fair comparison, the better ...

    And it may be interesting to test the memory controller as the X1300 seems to use the old one.
     
  10. pc999

    Veteran

    Joined:
    Mar 13, 2004
    Messages:
    3,628
    Likes Received:
    31
    Location:
    Portugal
    Dave, will you post an article on the X1600 before the X1900 (supposing you will do an X1600 one)? Thanks.
     
  11. no-X

    Veteran

    Joined:
    May 28, 2005
    Messages:
    2,455
    Likes Received:
    471
    I think the RV515 wouldn't benefit from the ring bus, and that's why ATI didn't redesign it. The RV515 has only 4 pixel shader processors, so the memory bus is probably more idle (compared to the RV530) and isn't limiting the X1300's performance. The X1300 isn't fast enough to use FSAA, so there is no reason to implement a better memory controller.
     
    #451 no-X, Dec 30, 2005
    Last edited by a moderator: Dec 30, 2005
  12. KimB

    Legend

    Joined:
    May 28, 2002
    Messages:
    12,928
    Likes Received:
    230
    Location:
    Seattle, WA
    Oh, come on. Any video card is fast enough to use multisampling AA, because the performance hit can be so low. This is more of a transistor count issue: ATI didn't think it was important to have FSAA in a value part, and so cut it for cost.
     
  13. Ailuros

    Ailuros Epsilon plus three
    Legend Subscriber

    Joined:
    Feb 7, 2002
    Messages:
    9,511
    Likes Received:
    224
    Location:
    Chania
    A tri-TMU, a specialized unit for high-degree AF? How many bilinear-only games have you come across lately, or better, how often do you use bilinear in games while gaming?

    In my mind it would kill the need for the obnoxious texture filtering optimizations that have been turning my stomach for the past few years, and that is by no means a waste of anything.
     
  14. Ailuros

    Ailuros Epsilon plus three
    Legend Subscriber

    Joined:
    Feb 7, 2002
    Messages:
    9,511
    Likes Received:
    224
    Location:
    Chania
    Basically true, as long as you limit it to 2xMSAA for budget GPUs.
     
  15. KimB

    Legend

    Joined:
    May 28, 2002
    Messages:
    12,928
    Likes Received:
    230
    Location:
    Seattle, WA
    Doesn't matter, because in every game you play, there will be many pixels that require no anisotropy.

    Well, now that ATI has finally implemented angle-independent anisotropic filtering, I'm seriously hoping that nVidia will follow suit.
     
  16. KimB

    Legend

    Joined:
    May 28, 2002
    Messages:
    12,928
    Likes Received:
    230
    Location:
    Seattle, WA
    Not really. ROPs aren't a significant limitation.
     
  17. no-X

    Veteran

    Joined:
    May 28, 2005
    Messages:
    2,455
    Likes Received:
    471
    It depends. If you mean 2x MSAA, the ring bus wouldn't be helpful for the RV515, because the difference in performance drop between the old MC and the ring bus isn't significant in this case. If you mean 4x/6x MSAA, it's different, but the theoretical performance hit would be too big even with the ring bus.
     
  18. KimB

    Legend

    Joined:
    May 28, 2002
    Messages:
    12,928
    Likes Received:
    230
    Location:
    Seattle, WA
    Why? Pretty much only in cases where you are rendering lots of simple pixels (shadow maps/volumes) should this ever be the case.
     
  19. Ailuros

    Ailuros Epsilon plus three
    Legend Subscriber

    Joined:
    Feb 7, 2002
    Messages:
    9,511
    Likes Received:
    224
    Location:
    Chania
    On the other hand, is there any consensus showing that the majority of pixels require only bilinear? Better question: what would you do with today's available bandwidth given, say, a 20 GTexels/s maximum theoretical bilinear fillrate?
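To put the 20 GTexels/s figure in context, here is a back-of-the-envelope Python sketch of the memory bandwidth that bilinear rate could demand. It assumes 4 texel reads per bilinear sample; the cache hit rate is a hypothetical parameter, and the bytes-per-texel values are the standard ones for uncompressed 32-bit RGBA (4 bytes) and DXT1 (0.5 bytes).

```python
TEXELS_PER_BILINEAR = 4  # a bilinear sample blends a 2x2 texel footprint


def texture_bandwidth_gb_s(gsamples_per_s, bytes_per_texel, cache_hit_rate):
    """Rough memory bandwidth (GB/s) needed to sustain a bilinear sample rate.

    gsamples_per_s  -- bilinear samples per second, in billions
    bytes_per_texel -- storage cost of one texel in memory
    cache_hit_rate  -- fraction of texel reads served by the texture cache
                       (hypothetical; real values depend on the workload)
    """
    texel_reads = gsamples_per_s * TEXELS_PER_BILINEAR
    return texel_reads * bytes_per_texel * (1.0 - cache_hit_rate)


if __name__ == "__main__":
    rate = 20.0  # the 20 GTexels/s figure from the post
    for label, bpt in (("RGBA8", 4.0), ("DXT1", 0.5)):
        for hit in (0.0, 0.9):
            print(label, "hit rate", hit, "->",
                  texture_bandwidth_gb_s(rate, bpt, hit), "GB/s")
```

The uncached, uncompressed case is a worst-case bound rather than a realistic load, but it shows why that fillrate only makes sense with good cache locality and texture compression.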

    I'm not so sure it'll happen all that soon.
     
  20. Ailuros

    Ailuros Epsilon plus three
    Legend Subscriber

    Joined:
    Feb 7, 2002
    Messages:
    9,511
    Likes Received:
    224
    Location:
    Chania
    I wasn't targeting ROPs at all. If there's any form of MSAA close to "free" even on high-end GPUs today, then it's 2x and not 4x MSAA. It'll get a lot worse on lower-end GPUs due to a pile of other shortcomings, bandwidth among them.
     