very informative interview on nVIDIA

Discussion in 'Architecture and Products' started by 991060, May 13, 2004.

  1. Luminescent

    Veteran

    Joined:
    Aug 4, 2002
    Messages:
    1,036
    Likes Received:
    0
    Location:
    Miami, Fl
    Hopefully it is option a, Simon.
     
  2. DemoCoder

    Veteran

    Joined:
    Feb 9, 2002
    Messages:
    4,733
    Likes Received:
    81
    Location:
    California
    It's b. You can't hide the branch latency.
     
  3. Luminescent

    Veteran

    Joined:
    Aug 4, 2002
    Messages:
    1,036
    Likes Received:
    0
    Location:
    Miami, Fl
  4. sonix666

    Regular

    Joined:
    Mar 31, 2003
    Messages:
    595
    Likes Received:
    3
    Depends on how branching is implemented. CPUs have branch prediction algorithms, which in most situations hide latency quite well. But I doubt that GPUs are already that advanced.
     
  5. DemoCoder

    Veteran

    Joined:
    Feb 9, 2002
    Messages:
    4,733
    Likes Received:
    81
    Location:
    California
    I felt the same way when I first found out, but it just means you have to tread carefully when using them. It's probably a lot easier than dealing with vertex texture fetch latency!
     
  6. Luminescent

    Veteran

    Joined:
    Aug 4, 2002
    Messages:
    1,036
    Likes Received:
    0
    Location:
    Miami, Fl
    Any word on the possibility of branch prediction in NV40's pixel pipes, as sonix mentioned? I remember an Anandtech GeForce FX review that mentioned Nvidia's branch prediction ability in relation to ATI's. Here's what I'm referring to:

     
  7. psurge

    Regular

    Joined:
    Feb 6, 2002
    Messages:
    939
    Likes Received:
    35
    Location:
    LA, California
    Any idea why the "endif" also takes 2 cycles? (That's just bizarre.) Also, was the "if add else add endif" shader running on the VS or the PS?
     
  8. ERP

    ERP Moderator
    Moderator Veteran

    Joined:
    Feb 11, 2002
    Messages:
    3,669
    Likes Received:
    49
    Location:
    Redmond, WA
    Yes this seems to imply a solution that is not at all like a conventional CPU.

    I had considered a few solutions that might show behaviour like this. They mostly boil down to the body of the if not being in contiguous memory with the surrounding code, i.e. it effectively requires 2 jumps for the if, not one.
     
  9. psurge

    Regular

    Joined:
    Feb 6, 2002
    Messages:
    939
    Likes Received:
    35
    Location:
    LA, California
    ERP, well the if would be a single "not taken" branch plus an unconditional branch to get past the else body, but...

    Edit: ignore above I see what you mean now.

    Speculation:

    The 2 cycle branch penalty would appear to indicate that a branch (unconditional or not) cannot co-issue with any other instructions (1 cycle) and that the execution units are stalled for 1 cycle after the new IP value is known (for next instruction fetch).

    Maybe the endif penalty comes from a pipeline flush, i.e. making sure that any in-flight instruction has written its results back to the register file?
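
    The speculation above can be sketched as a toy timeline. This is only an illustration: the one-cycle issue slot and the one-cycle fetch stall are the assumptions from the post, and the names `branch_cost` and `timeline` are hypothetical, not anything documented for NV40.

```python
# Hypothetical model of the speculated 2-cycle branch penalty.
# The numbers are assumptions from the discussion, not measured hardware data.
ISSUE_CYCLES = 1   # the branch occupies an issue slot alone (no co-issue)
FETCH_STALL = 1    # execution units idle while the next instruction is fetched


def branch_cost(issue_cycles=ISSUE_CYCLES, fetch_stall=FETCH_STALL):
    """Cycles charged to a single branch instruction in this model."""
    return issue_cycles + fetch_stall


def timeline(instructions, fetch_stall=FETCH_STALL):
    """Return (cycle, event) pairs for a list of instruction names.

    An instruction named 'branch' costs one issue cycle plus stall bubbles;
    every other instruction is assumed to cost exactly one cycle.
    """
    events = []
    cycle = 0
    for op in instructions:
        events.append((cycle, op))
        cycle += 1
        if op == "branch":
            for _ in range(fetch_stall):
                events.append((cycle, "stall"))
                cycle += 1
    return events


if __name__ == "__main__":
    for cycle, event in timeline(["mul", "branch", "add"]):
        print(cycle, event)
```

    In this model the branch accounts for two of the four cycles, which is where the "2 cycle" figure would come from if the speculation holds.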
     
  10. 991060

    Regular

    Joined:
    Jul 29, 2003
    Messages:
    640
    Likes Received:
    2
    Location:
    Beijing
    It was run in the pixel shader with 61.11 and DX9.0c beta2 runtime.
     
  11. lwells

    Newcomer

    Joined:
    Nov 13, 2002
    Messages:
    62
    Likes Received:
    0
    For the slow and less technically inclined amongst us, would someone explain what the big revelation is?

    LW.
     
  12. KimB

    Legend

    Joined:
    May 28, 2002
    Messages:
    12,903
    Likes Received:
    218
    Location:
    Seattle, WA
    While it does depend upon how the branch is implemented, branch prediction is not an absolute necessity in hiding the branch latency.

    That is, if you have some calculations before the branch that do not affect the branch, it is conceivable that the hardware could execute those to take care of branch latency. I don't know if this is possible on the NV4x, though.
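
    A rough way to picture this idea, assuming (hypothetically) that each independent instruction issued ahead of the branch fills exactly one cycle of its latency. The function name and the one-instruction-per-cycle rule are illustrative assumptions, not a description of how NV4x actually schedules work.

```python
def exposed_stall(branch_latency, independent_ops):
    """Cycles of branch latency left visible after filling with independent work.

    Assumes each independent instruction hides exactly one cycle of latency
    (an illustrative simplification, not documented hardware behaviour).
    """
    if branch_latency < 0 or independent_ops < 0:
        raise ValueError("latency and instruction count must be non-negative")
    return max(0, branch_latency - independent_ops)


if __name__ == "__main__":
    for ops in range(4):
        print(ops, "independent ops ->", exposed_stall(2, ops), "stall cycles")
```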

    As a side note, it may be that the 2-cycle latency is the minimum latency of branching on the NV40, and that there is actually more latency there that can be hidden when there are more than a couple of instructions. I think that may be the best explanation of the performance of the "if add else add endif" program, as one would expect that to take about 5 clocks (if (2) add (1) endif (2) or else (2) add (1) endif (2)). A longer program may get closer to the 2-cycle latency (this seems to imply 4-cycle latency).
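
    The expected 5-clock figure is just a sum over the instructions on each path. A minimal sketch of that arithmetic, using the per-instruction costs quoted in the post as assumptions (the `COST` table and `path_cycles` are hypothetical names for illustration):

```python
# Assumed costs in clocks, taken from the figures quoted in the post.
# They are a model for the arithmetic, not measurements.
COST = {"if": 2, "else": 2, "endif": 2, "add": 1}

# The two instruction sequences counted in the post, one per path.
PATHS = {
    "condition true": ["if", "add", "endif"],
    "condition false": ["else", "add", "endif"],
}


def path_cycles(ops, cost=COST):
    """Sum the assumed cost of every instruction on one path."""
    return sum(cost[op] for op in ops)


if __name__ == "__main__":
    for name, ops in PATHS.items():
        print(name, path_cycles(ops))
```

    Under these assumptions both paths come to 2 + 1 + 2 = 5 clocks.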
     
  13. KimB

    Legend

    Joined:
    May 28, 2002
    Messages:
    12,903
    Likes Received:
    218
    Location:
    Seattle, WA
    If all branches, whether conditional or not, are handled by a simple goto statement, then endif could well force a branch. That is, the "endif" signifies returning to the rest of the program.
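
    To make the goto idea concrete, here is a small sketch that lowers an if/else/endif into goto-style jumps and counts how many jumps each path executes. The layout (both bodies stored out of line, with "endif" jumping back to a join point) is a hypothetical choice made for illustration; nothing here is confirmed NV40 behaviour.

```python
def lower(then_body, else_body):
    """Lower 'if c: then_body else: else_body' into a goto-only program.

    Returns (code, labels). Both bodies sit out of line, so each one ends
    with a goto back to the join point; that goto plays the role of 'endif'.
    """
    code = [("branch", "THEN", "ELSE")]        # 'if': jump to one of two bodies
    labels = {"JOIN": len(code)}
    code.append(("halt",))                      # rest of the program starts here
    labels["THEN"] = len(code)
    code += [("op", name) for name in then_body]
    code.append(("goto", "JOIN"))               # 'endif' on the then path
    labels["ELSE"] = len(code)
    code += [("op", name) for name in else_body]
    code.append(("goto", "JOIN"))               # 'endif' on the else path
    return code, labels


def run(code, labels, condition):
    """Execute the program; return (ops executed, number of jumps taken)."""
    pc, jumps, trace = 0, 0, []
    while True:
        instr = code[pc]
        kind = instr[0]
        if kind == "halt":
            return trace, jumps
        if kind == "branch":
            pc = labels[instr[1] if condition else instr[2]]
            jumps += 1
        elif kind == "goto":
            pc = labels[instr[1]]
            jumps += 1
        else:                                   # ordinary ALU op
            trace.append(instr[1])
            pc += 1


if __name__ == "__main__":
    code, labels = lower(["add_a"], ["add_b"])
    print(run(code, labels, True))
    print(run(code, labels, False))
```

    With this layout every path executes two jumps, one for the "if" and one for the "endif", which would be consistent with both instructions carrying a branch penalty.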
     