Performance hit for 64bit and 128bit rendering ?

Discussion in 'General 3D Technology' started by BRiT, Sep 26, 2002.

  1. BRiT

    BRiT (╯°□°)╯
    Moderator Legend Alpha Subscriber

    Joined:
    Feb 7, 2002
    Messages:
    12,492
    Likes Received:
    8,695
    Location:
    Cleveland
    Does anyone know the performance hit the ATI-9700 takes for rendering in 64bit or 128bit color, compared with 32bit modes? Or is this something that's impossible to see or test without DX9 or a newer version of OpenGL? Is there no option in the control panel to "force" 64bit or 128bit rendering?

    Not that this matters yet, with such games still a year or two away; I'm just curious for now.

    --|BRiT|
     
  2. Kristof

    Regular Alpha

    Joined:
    Jan 30, 2002
    Messages:
    733
    Likes Received:
    1
    Location:
    Abbots Langley
    Well, you need to explain what exactly you mean by 64 bit and 128 bit rendering.

    First of all, there is the internal accuracy of the floats. There are 4 components (RGBA), and these can be 16bit to 32bit floats. I believe the Radeon uses a fixed 24-bit format, so the internal accuracy always sits at 4x24=96 bits. This is not something you can turn on or off, and even if you could there would be no difference, since the whole internal structure is based on these bit counts.
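    A rough Python sketch of what those bit counts mean for precision: it rounds a value to a chosen number of explicit mantissa bits. The widths used (10 bits for FP16, 16 for the R300's 24-bit format, 23 for FP32) are the commonly cited layouts and are an assumption here rather than something stated in this thread; exponent range and denormals are ignored.

```python
import math

# Commonly cited explicit mantissa widths (assumption, not from the thread):
# FP16 = s10e5, R300 FP24 = s16e7, FP32 = s23e8.
MANTISSA_BITS = {"fp16": 10, "fp24": 16, "fp32": 23}

def round_mantissa(x: float, mant_bits: int) -> float:
    """Round x to the nearest value with `mant_bits` explicit mantissa bits.

    Exponent range, denormals and overflow are ignored; ties round to even
    (Python's round()), matching the IEEE default rounding mode.
    """
    if x == 0.0 or not math.isfinite(x):
        return x
    m, e = math.frexp(x)               # x = m * 2**e with 0.5 <= |m| < 1
    scale = 2.0 ** (mant_bits + 1)     # +1 accounts for the implicit leading bit
    return math.ldexp(round(m * scale) / scale, e)

def bits_per_pixel(component_bits: int, components: int = 4) -> int:
    """Width of one RGBA value, e.g. 4 x 24 = 96 bits."""
    return component_bits * components

if __name__ == "__main__":
    for name, bits in MANTISSA_BITS.items():
        err = abs(round_mantissa(1 / 3, bits) - 1 / 3)
        print(f"{name}: 1/3 stored with error {err:.3e}")
    print("R300 internal RGBA width:", bits_per_pixel(24), "bits")
```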

    If, on the other hand, you are talking about framebuffer (external memory) bit depth, then there is nothing you can force. The framebuffer only supports 16 and 32 bit formats. If you want to go to 64 or 128 bit you end up in the Multiple Render Target (MRT) case, which you cannot "force" on an old game since these games are designed to render to only one buffer. This is also only available in DX9 (probably also in OGL2.0, and maybe 1.4 using a new extension)... The only thing you could maybe force is the 10-10-10-2 external format (rather than 8-8-8-8), which I believe is what Matrox allows you to do.
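    To illustrate the 8-8-8-8 versus 10-10-10-2 trade-off: both fit in the same 32 bits, so the framebuffer size is unchanged, but the colour channels get 1023 steps instead of 255 while alpha drops to 2 bits. The sketch below packs normalized values into either layout; the channel order (R in the low bits) is a hypothetical choice, and real formats define their own.

```python
# Hypothetical channel layout: R in the lowest bits, A in the highest.
# Real APIs define their own orderings, so treat this as illustrative only.
FORMATS = {
    "8-8-8-8": (8, 8, 8, 8),
    "10-10-10-2": (10, 10, 10, 2),
}

def pack_unorm(rgba, bits):
    """Quantize normalized [0, 1] floats and pack them into one integer."""
    word, shift = 0, 0
    for value, n in zip(rgba, bits):
        max_code = (1 << n) - 1
        clamped = min(max(value, 0.0), 1.0)
        word |= round(clamped * max_code) << shift   # nearest quantization step
        shift += n
    return word

def unpack_unorm(word, bits):
    """Inverse of pack_unorm: recover the quantized [0, 1] values."""
    out, shift = [], 0
    for n in bits:
        max_code = (1 << n) - 1
        out.append(((word >> shift) & max_code) / max_code)
        shift += n
    return tuple(out)

if __name__ == "__main__":
    color = (0.3, 0.5, 0.7, 1.0)
    for name, bits in FORMATS.items():
        word = pack_unorm(color, bits)
        back = unpack_unorm(word, bits)
        err = max(abs(a - b) for a, b in zip(color[:3], back[:3]))
        print(f"{name}: 0x{word:08X}, worst RGB error {err:.5f}")
```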

    Does this make sense or did I misread your actual question ?

    On the other hand, you do have an RGBA-32-32-32-32f format, but you cannot display that AFAIK; it's only to be used for render to texture and then read back in to do some funky effects. Again, this would be difficult or impossible to force...
     
  3. Dave Baumann

    Dave Baumann Gamerscore Wh...
    Moderator Legend

    Joined:
    Jan 29, 2002
    Messages:
    14,079
    Likes Received:
    648
    Location:
    O Canada!
    The ideal would be nothing.

    Because DX9 chips can handle so many internal passes, any game that actually uses the high precision should also be coded to utilise the internal passes, so there will be no performance lost, since ultimately it will still be outputting to a 32bit buffer.

    However, if the application requires more passes than the hardware can handle internally (e.g. for R300, > 16 textures, > 1024 vertex instructions, > 160 pixel shader instructions etc.), then the intermediate results should be stored in offscreen floating point buffers, and that is what will cause the performance drop, because of the extra bandwidth needed.
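    A back-of-the-envelope sketch of that extra traffic. It assumes each pass boundary writes the intermediate buffer once and reads it back once; the 160-instruction limit is the R300 figure quoted above, while the shader length and resolution in the usage note are made-up examples.

```python
import math

def passes_needed(shader_instructions: int, per_pass_limit: int) -> int:
    """How many passes a long shader must be split into."""
    return math.ceil(shader_instructions / per_pass_limit)

def spill_bytes_per_frame(width: int, height: int, bytes_per_pixel: int, passes: int) -> int:
    """Extra memory traffic from intermediate buffers between passes.

    Assumes each pass boundary writes the intermediate result once and reads
    it back once in the next pass; overdraw and blending are ignored.
    """
    boundaries = max(passes - 1, 0)
    return boundaries * 2 * width * height * bytes_per_pixel

if __name__ == "__main__":
    n = passes_needed(400, 160)            # hypothetical 400-instruction shader
    for bits in (32, 128):
        print(f"{bits}bit buffer:", spill_bytes_per_frame(1024, 768, bits // 8, n), "bytes/frame")
```

    For example, a hypothetical 400-instruction shader would need 3 passes; at 1024x768 with a 128bit (16 bytes per pixel) intermediate buffer that is 50,331,648 bytes of extra traffic per frame, four times what a 32bit buffer would cost.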
     
  4. LeStoffer

    Veteran

    Joined:
    Feb 6, 2002
    Messages:
    1,253
    Likes Received:
    13
    Location:
    Land of the 25% VAT
    I presume that your question is whether the pixel pipelines take a dive in performance when rendering in 64/128 bit instead of 32 bit?

    Let's take the R9700. Its 8 pixel pipelines can each apply 3 instructions per clock (one texture look-up, one texture address operation, and one colour operation). Notice that there is no mention of colour precision! (It's not 1 colour operation per clock in 32 bit but 1 operation per 2 clocks in 128 bit, etc.) The compute power to handle the higher colour precision is already in place (and one pixel is still just one pixel to the pixel pipeline).

    Okay, let's say that you are not going to fetch textures that are different (higher res/bit depth) than if you were using 32 bit, and you're not using the framebuffer during rendering. Then the only difference should be that you get higher internal precision without a performance dive. (Remember the KyroII and 32 bit internal rendering :wink: )
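    An idealized model of the point above, as a Python sketch: if each of the 8 pipelines can issue one texture look-up, one texture address op and one colour op per clock, the busiest of those three slots sets the clock count, and precision never enters the calculation. Perfect co-issue is an assumption; real shaders and compilers won't always achieve it.

```python
from dataclasses import dataclass

@dataclass
class ShaderMix:
    """Per-pixel instruction counts, split by the three issue slots described above."""
    tex_lookups: int
    tex_address_ops: int
    color_ops: int

def clocks_per_pixel(mix: ShaderMix) -> int:
    """Idealized co-issue: one op of each kind per pipe per clock, so the busiest
    slot sets the clock count. Colour precision is deliberately not a parameter."""
    return max(mix.tex_lookups, mix.tex_address_ops, mix.color_ops, 1)

def pixels_per_clock(mix: ShaderMix, pipes: int = 8) -> float:
    """Whole-chip pixel rate in this idealized model (8 pipes as on the R9700)."""
    return pipes / clocks_per_pixel(mix)

if __name__ == "__main__":
    balanced = ShaderMix(tex_lookups=4, tex_address_ops=4, color_ops=4)
    print(clocks_per_pixel(balanced), "clocks per pixel,",
          pixels_per_clock(balanced), "pixels per clock")
```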
     
  5. Simon Templar

    Newcomer

    Joined:
    Jun 11, 2002
    Messages:
    34
    Likes Received:
    0
    What do you make of this?

    http://www.beyond3d.com/forum/viewtopic.php?topic=2562&forum=9

    Specifically this sentence:
    "Where the NV30 differs from the 9700 is that instead of a single 128bit colour call, it can perform two 64bit operations in the same time, providing what Adam called a 'sweet spot' between performance and colour"

    I could have posted it in that thread but I think it may apply to this.

    I could be totally wrong, but could you have a pipeline in hardware that packs multiple pixels into a single pipeline instruction?

    If so, you could theoretically have a 128bit, 4-pipeline card producing 8 64-bit pixels? Have I gone off the deep end?

    If it scales this far, does it work for 32 bit pixels too? Something like 16 pixels on a 4 pipeline card?

    SIMP? Single Instruction Multiple Pixel? This approach sure would maximize hardware efficiency but is it even possible?
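    What Simon describes is close to what is usually called SIMD within a register (SWAR): splitting one wide datapath into narrower lanes. The sketch below shows the idea with integers, adding two 16-bit lanes packed into a 32-bit word with a single add. It only illustrates the packing trick; it says nothing about how NV30 actually implements its 64bit mode, which is not known from this thread.

```python
LANE_LOW_BITS = 0x7FFF7FFF    # every bit except the top bit of each 16-bit lane
LANE_HIGH_BITS = 0x80008000   # the top bit of each 16-bit lane
WORD_MASK = 0xFFFFFFFF

def pack2(lo: int, hi: int) -> int:
    """Pack two 16-bit values into one 32-bit word."""
    return ((hi & 0xFFFF) << 16) | (lo & 0xFFFF)

def unpack2(word: int) -> tuple:
    """Split a 32-bit word back into its (low, high) 16-bit lanes."""
    return word & 0xFFFF, (word >> 16) & 0xFFFF

def swar_add16(a: int, b: int) -> int:
    """Add both 16-bit lanes with one 32-bit add; each lane wraps independently.

    Clearing each lane's top bit first means no carry can cross into the
    neighbouring lane; the top bits are then restored with XOR.
    """
    low = (a & LANE_LOW_BITS) + (b & LANE_LOW_BITS)
    return (low ^ ((a ^ b) & LANE_HIGH_BITS)) & WORD_MASK

if __name__ == "__main__":
    print(unpack2(swar_add16(pack2(1, 2), pack2(3, 4))))
```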
     
  6. sireric

    Regular

    Joined:
    Jul 26, 2002
    Messages:
    348
    Likes Received:
    22
    Location:
    Santa Clara, CA
    A general answer would be that 64b or 128b writes would take 2x or 4x more bandwidth than 32b writes. And for an application that's bandwidth limited, you could expect a comparable performance hit.

    However, the question is, when would an application write 64b or 128b pixels? Generally, that will be for multi-pass algorithms. Given that a multi-pass algorithm will have substantial shader operations per pass, the VPU will generally not be bandwidth limited but actually shader limited. In those cases, the 64b or 128b writes will have no performance impact. You could imagine that with something like 3x4=12 instructions (assuming a balanced scalar, rgb vector and texture blend), you could completely hide the 128b writes.

    Later
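    sireric's argument can be put as a simple bottleneck model: a batch of pixels takes as long as the slower of the shader work and the memory writes, assuming the two overlap perfectly. The figure of 64 bytes per clock used below is a hypothetical round number, not a quoted R300 specification.

```python
def clocks_per_batch(pipes: int, shader_clocks: float,
                     bytes_per_pixel: int, mem_bytes_per_clock: float) -> float:
    """Clocks for every pipe to finish one pixel.

    Assumes shading and memory writes overlap perfectly, so whichever is
    slower sets the pace. Only colour writes are counted (no Z, no textures).
    """
    write_clocks = pipes * bytes_per_pixel / mem_bytes_per_clock
    return max(shader_clocks, write_clocks)

if __name__ == "__main__":
    MEM = 64  # hypothetical bytes per clock, not an R300 spec figure
    for shader_clocks in (1, 4):          # 4 clocks ~ 12 instructions at 3 per clock
        for bits in (32, 64, 128):
            t = clocks_per_batch(8, shader_clocks, bits // 8, MEM)
            print(f"{shader_clocks}-clock shader, {bits}b writes: {t} clocks")
```

    With 8 pipes and a 1-clock shader, 32b writes need 0.5 clocks per batch, so the shader is the limit, but 128b writes need 2 clocks and halve throughput. With a 12-instruction shader at 3 instructions per clock (4 clocks), the same 128b writes are hidden completely.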
     
  7. antlers

    Regular

    Joined:
    Aug 14, 2002
    Messages:
    457
    Likes Received:
    0
    I think the interesting question is:

    Will it be able to do twice the instructions per cycle at 64 bit?

    Or will it be limited to half the instructions per cycle at 128 bit?

    What distinguishes this from a glass half-full or half-empty proposition is that we have something to compare it to in the 9700. If it doubles the 9700's shader performance (say clock for clock) when running at 64 bit, nVidia will certainly have something to market with (even if the impact on existing or near future games is negligible).
     
  8. LeStoffer

    Veteran

    Joined:
    Feb 6, 2002
    Messages:
    1,253
    Likes Received:
    13
    Location:
    Land of the 25% VAT
    Hmmm yes, but then I have to ask how the NV30 compares to the R9700's ability to do 3 instructions per clock (texture look-up, texture address operation and colour operation)? It's an unfair question of course :wink: but if they use extra transistors to handle 2 colour ops in one clock [in 64 bit], then they might have cut a corner elsewhere....

    What if the NV30 can do 2 colour operations (in 64 bit) but cannot do both a texture look-up and address operation per clock? Dunno....
     
  9. psurge

    Regular

    Joined:
    Feb 6, 2002
    Messages:
    939
    Likes Received:
    35
    Location:
    LA, California
    LeStoffer, CineFX does not make a distinction between texture address and color operations.

    Now honestly I'm not entirely sure what a texture address instruction is, but I'm guessing it's the LOD calculations and texture coordinate interpolation? Or...?
     
  10. fresh

    Newcomer

    Joined:
    Mar 5, 2002
    Messages:
    141
    Likes Received:
    0
    What about reading from a 128bit image? That would still cost performance, no matter how much you try to avoid multiple passes. Especially if you want to do bi/trilinear filtering.
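    A quick count of the raw texel data behind each filtered sample, assuming the usual 4 texels for bilinear and 8 for trilinear; texture caches will cut the real traffic, since neighbouring pixels share texels.

```python
TEXELS_PER_SAMPLE = {"point": 1, "bilinear": 4, "trilinear": 8}

def fetch_bytes(filter_mode: str, bytes_per_texel: int) -> int:
    """Raw texel bytes behind one filtered sample, before any cache reuse."""
    return TEXELS_PER_SAMPLE[filter_mode] * bytes_per_texel

if __name__ == "__main__":
    for bits in (32, 64, 128):
        print(f"{bits}bit texels:",
              {mode: fetch_bytes(mode, bits // 8) for mode in TEXELS_PER_SAMPLE})
```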
     
  11. antlers

    Regular

    Joined:
    Aug 14, 2002
    Messages:
    457
    Likes Received:
    0
    Are you talking about for things like cube mapping? Or do you think source art is ever going to be 128 bits?
     
  12. DemoCoder

    Veteran

    Joined:
    Feb 9, 2002
    Messages:
    4,733
    Likes Received:
    81
    Location:
    California
    High dynamic range light maps will be floating point as will rendertargets from previous passes.
     
  13. pcchen

    pcchen Moderator
    Moderator Veteran Subscriber

    Joined:
    Feb 6, 2002
    Messages:
    2,749
    Likes Received:
    127
    Location:
    Taiwan
    I am not sure, but I remember that DX9 does not allow filtering for FP reads. I don't know if this has changed. Of course, with PS 2.0 you can still do it manually.
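    Manual filtering in PS 2.0 means taking four point samples and blending them in the shader. The same arithmetic is sketched here in Python rather than shader code; it assumes clamp-to-edge addressing and texel centres at (i + 0.5) / size, and a shader version would typically need the texel size passed in as a constant.

```python
import math

def fetch(tex, x, y):
    """Point-sampled fetch with clamp-to-edge addressing (tex is indexed tex[y][x])."""
    h, w = len(tex), len(tex[0])
    return tex[min(max(y, 0), h - 1)][min(max(x, 0), w - 1)]

def bilinear(tex, u, v):
    """Bilinear filter at normalized (u, v); texel i has its centre at (i + 0.5) / size."""
    h, w = len(tex), len(tex[0])
    x, y = u * w - 0.5, v * h - 0.5          # position in texel space
    x0, y0 = math.floor(x), math.floor(y)
    fx, fy = x - x0, y - y0                  # blend weights
    t00, t10 = fetch(tex, x0, y0), fetch(tex, x0 + 1, y0)          # four point samples
    t01, t11 = fetch(tex, x0, y0 + 1), fetch(tex, x0 + 1, y0 + 1)
    top = t00 + (t10 - t00) * fx
    bottom = t01 + (t11 - t01) * fx
    return top + (bottom - top) * fy

if __name__ == "__main__":
    hdr = [[0.0, 1000.0], [20.0, 30.0]]      # values well outside [0, 1]
    print(bilinear(hdr, 0.5, 0.5))
```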
     
  14. Xmas

    Xmas Porous
    Veteran Subscriber

    Joined:
    Feb 6, 2002
    Messages:
    3,298
    Likes Received:
    137
    Location:
    On the path to wisdom
    It's a hardware limitation of both R300 and NV30. And I doubt this will change soon, considering the number of transistors required, and the connection to the cache would have to be four times as wide. Also, you often want more than linear filtering for high precision data.
     
  15. alexsok

    Regular

    Joined:
    Jul 12, 2002
    Messages:
    807
    Likes Received:
    2
    Location:
    Toronto, Canada
    Are you sure about that?

    I recall reading Humus's post where he said that R300 allows that... not sure though and I could be very much mistaken here...
     
  16. Xmas

    Xmas Porous
    Veteran Subscriber

    Joined:
    Feb 6, 2002
    Messages:
    3,298
    Likes Received:
    137
    Location:
    On the path to wisdom
    Yes, I am sure. The only differences between them, IIRC, are that R300 supports mipmaps and cube maps with FP. But no filtering.
     
  17. alexsok

    Regular

    Joined:
    Jul 12, 2002
    Messages:
    807
    Likes Received:
    2
    Location:
    Toronto, Canada
    O.k then, thx a lot for the clarification! :)

    One more thing, the following is taken from NV30 OpenGL specs (take note of the things I bolded):

    Think it's a good idea?
     
  18. LeStoffer

    Veteran

    Joined:
    Feb 6, 2002
    Messages:
    1,253
    Likes Received:
    13
    Location:
    Land of the 25% VAT
    I think Mr. Baumann has to help us out here. If his NDA is a limitation, then at least give us a hint! :wink:

    Edit: Or maybe sireric could clear this one up?
     
  19. Basic

    Regular

    Joined:
    Feb 8, 2002
    Messages:
    846
    Likes Received:
    13
    Location:
    Linköping, Sweden
    I believe the 64bit and 128bit formats will be used mainly for pixel-aligned buffers or unfiltered textures. They would be useful for HDR textures, but doing the filtering by hand is just too messy.

    Summed area tables are an efficient way to flush all the benefits of floating point textures down the toilet. Cancellation galore! They will also give poor texture cache efficiency, and generating the textures dynamically will be rather inefficient. But at least NV30 will have the DDX/DDY instructions to make PS filtering possible (though inefficiently).

    R300 has MIPMAPing, which is good. But I don't see any direct way to know which MIPMAP level is currently used, so finding the offset to nearby texels to do bi-/trilinear filtering by hand is difficult.
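    A small demonstration of the cancellation problem, assuming the table is stored as 32-bit floats: one very bright texel (2^24, the kind of range an HDR light map might have) in a corner of an otherwise all-1.0 image makes a single-texel box sum come back as 2.0 instead of 1.0, while the same table in 64-bit floats gets it right. The image is contrived for the example, and the query itself runs in double precision, so all of the error comes from the stored table.

```python
import struct

def to_f32(x: float) -> float:
    """Round a Python float (an IEEE double) to the nearest 32-bit float."""
    return struct.unpack("f", struct.pack("f", x))[0]

def build_sat(img, rnd):
    """Summed-area table: sat[y][x] = sum of img[j][i] for all j <= y, i <= x.

    `rnd` is applied after every addition to emulate the storage precision
    (to_f32 for a 32-bit float table, identity for 64-bit).
    """
    h, w = len(img), len(img[0])
    sat = [[0.0] * w for _ in range(h)]
    for y in range(h):
        row = 0.0
        for x in range(w):
            row = rnd(row + img[y][x])                # running sum along this row
            above = sat[y - 1][x] if y > 0 else 0.0
            sat[y][x] = rnd(above + row)              # plus everything above it
    return sat

def box_sum(sat, x0, y0, x1, y1):
    """Sum over the inclusive rectangle [x0, x1] x [y0, y1] from four lookups."""
    def at(y, x):
        return sat[y][x] if x >= 0 and y >= 0 else 0.0
    return at(y1, x1) - at(y0 - 1, x1) - at(y1, x0 - 1) + at(y0 - 1, x0 - 1)

if __name__ == "__main__":
    img = [[1.0] * 4 for _ in range(4)]
    img[0][0] = 2.0 ** 24                 # one very bright texel in the corner
    sat32 = build_sat(img, to_f32)
    sat64 = build_sat(img, lambda v: v)
    # The true sum of the single texel at (1, 1) is 1.0.
    print("fp32 table:", box_sum(sat32, 1, 1, 1, 1))
    print("fp64 table:", box_sum(sat64, 1, 1, 1, 1))
```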
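    For reference, the level of detail is normally derived from how fast the texture coordinates change across the screen, which is what DDX/DDY-style derivatives provide. The sketch below uses the OpenGL-style scale factor (the longer footprint axis, measured in base-level texels) and then works out the texel offset a hand-written filter would need at that level; it ignores anisotropy and LOD bias.

```python
import math

def mip_lod(du_dx, dv_dx, du_dy, dv_dy, width, height):
    """Level of detail from screen-space derivatives of normalized (u, v).

    Isotropic estimate: take the longer of the two pixel-footprint axes,
    measured in texels of the base level, and return log2 of it.
    """
    rho_x = math.hypot(du_dx * width, dv_dx * height)
    rho_y = math.hypot(du_dy * width, dv_dy * height)
    rho = max(rho_x, rho_y)
    return math.log2(rho) if rho > 0 else float("-inf")

def texel_offset(lod, width):
    """Normalized u-distance to the neighbouring texel at mip level floor(lod)."""
    level = max(0, math.floor(lod)) if math.isfinite(lod) else 0
    return 1.0 / max(1, width >> level)

if __name__ == "__main__":
    # Texture coordinates advance 2 texels per pixel on a 256x256 texture.
    lod = mip_lod(2 / 256, 0.0, 0.0, 2 / 256, 256, 256)
    print("lod:", lod, "texel offset:", texel_offset(lod, 256))
```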
     
  20. DemoCoder

    Veteran

    Joined:
    Feb 9, 2002
    Messages:
    4,733
    Likes Received:
    81
    Location:
    California
    Think what's a good idea??

    Anyway, I'm pretty sure there is a typo in the above quote, since that would rule out every FP rendertarget. I'm pretty sure at least 2D FP texture targets are allowed.

    In any case, fixed-function filtering of floating point textures doesn't really make sense, since in most cases, these aren't going to be image data, and linear filtering is just going to be wrong.
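    One example of the problem, chosen here for illustration: a float texture storing unit-length normals. Linearly filtering halfway between two perpendicular normals returns a vector of length about 0.707 rather than 1, so the filtered value is not a valid normal without renormalizing.

```python
import math

def lerp(a, b, t):
    """Component-wise linear interpolation, i.e. what a linear filter computes."""
    return tuple(x + (y - x) * t for x, y in zip(a, b))

def length(v):
    """Euclidean length of a vector."""
    return math.sqrt(sum(c * c for c in v))

if __name__ == "__main__":
    n0 = (1.0, 0.0, 0.0)          # two unit normals stored in adjacent texels
    n1 = (0.0, 1.0, 0.0)
    mid = lerp(n0, n1, 0.5)       # (0.5, 0.5, 0.0)
    print(mid, "length", length(mid))
```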
     

  • About Us

    Beyond3D has been around for over a decade and prides itself on being the best place on the web for in-depth, technically-driven discussion and analysis of 3D graphics hardware. If you love pixels and transistors, you've come to the right place!

    Beyond3D is proudly published by GPU Tools Ltd.