Does Volta no longer need masking for threads in a warp?

Discussion in 'Architecture and Products' started by william gaatjes, Dec 29, 2017.

  1. william gaatjes

    Joined:
    May 10, 2017
    Messages:
    2
    Likes Received:
    0

    https://www.anandtech.com/show/12170/nvidia-titan-v-preview-titanomachy/2


    Hi all, I am curious about how this works. I posted this same question on the AnandTech forum, but there have been no replies.
    Does the text below mean that on Volta, when all 32 threads of a warp run in lockstep and some of them take different IF/ELSE branches, masking is no longer needed?

    [IMG]

    https://en.wikipedia.org/wiki/Single_instruction,_multiple_threads
     
  2. 3dilettante

    Legend Alpha

    Joined:
    Sep 15, 2003
    Messages:
    8,118
    Likes Received:
    2,860
    Location:
    Well within 3d
    The hardware is still SIMD. The difference with Volta is that the scheduler may interleave instructions from the IF and ELSE sides, rather than having to execute all the way down one path before going back to the other. Each instruction is still masked as appropriate when it executes.
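
    As a rough illustration (a toy model, not how the hardware is actually built), the sketch below runs a two-instruction IF body and a two-instruction ELSE body over one 32-lane warp. The bodies, the even/odd branch condition, and the `execute` helper are all made up for the example. Every issued instruction applies to the whole warp under its path's mask, so the serialized order and a mixed order give the same result.

```python
WARP_SIZE = 32

# Hypothetical two-instruction bodies for each side of the branch.
IF_PATH = [lambda v: v + 1, lambda v: v * 2]
ELSE_PATH = [lambda v: v - 1, lambda v: v * 3]

def execute(values, order):
    """Run the branch on one warp. `order` lists which path issues next."""
    cond = [v % 2 == 0 for v in values]            # per-lane branch condition (assumed)
    masks = {"if": cond, "else": [not c for c in cond]}
    bodies = {"if": IF_PATH, "else": ELSE_PATH}
    pc = {"if": 0, "else": 0}                      # one program counter per path
    out = list(values)
    for path in order:
        op = bodies[path][pc[path]]
        pc[path] += 1
        # One instruction is issued for the whole warp; masked-off lanes keep their value.
        out = [op(v) if active else v for v, active in zip(out, masks[path])]
    return out

values = list(range(WARP_SIZE))
serial = execute(values, ["if", "if", "else", "else"])       # one side fully, then the other
interleaved = execute(values, ["if", "else", "else", "if"])  # mixed issue order
assert serial == interleaved
```

    The two orders agree because the masks are disjoint: no lane is ever written by both paths.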
     
  3. william gaatjes

    Thank you for replying. :)
    I am a bit confused now: how does the scheduler do that?
    It is all SIMD/SIMT, so I get the impression that the scheduler can do this when the same instruction is present in both the IF block and the ELSE block.
    Then it could merge the identical instructions so that as many threads in the warp as possible run together.
    Or am I seeing this all wrong?
     
  4. 3dilettante

    A portion of the work is already done in standard SIMT branching. When lanes diverge at a conditional check, the hardware determines the valid mask for the path it chooses to execute immediately, and it stores the mask and instruction pointer for the other path so it can get back to it later.
    Prior to Volta, the stored information for the other path was left unused until the active path finished, but the way it can be used is the same as for the active path: fetch the instruction and apply the mask.
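
    A minimal sketch of that bookkeeping, assuming a 32-bit lane mask and a simple (IP, mask) record per path. `PathEntry` and `diverge` are hypothetical names for illustration, and running the taken side first is an arbitrary choice here, not a statement about what the hardware picks.

```python
from dataclasses import dataclass

@dataclass
class PathEntry:
    ip: int    # instruction pointer where this path resumes
    mask: int  # bit i set means lane i is active on this path

def diverge(active_mask, cond_bits, if_ip, else_ip):
    """Split the active lanes at a branch.

    Returns (current, deferred): the path executed right away and the
    stored path to come back to, or None if the lanes did not diverge.
    """
    taken = active_mask & cond_bits
    not_taken = active_mask & ~cond_bits
    if not_taken == 0:                       # every active lane takes the IF side
        return PathEntry(if_ip, taken), None
    if taken == 0:                           # every active lane takes the ELSE side
        return PathEntry(else_ip, not_taken), None
    # Real divergence: run one side now, keep IP + mask for the other.
    return PathEntry(if_ip, taken), PathEntry(else_ip, not_taken)

current, deferred = diverge(0xFFFFFFFF, 0x0000FFFF, if_ip=10, else_ip=20)
assert current == PathEntry(10, 0x0000FFFF)
assert deferred == PathEntry(20, 0xFFFF0000)
```

    In this model, the deferred entry holds everything needed to issue from the other path at any time: fetch at `deferred.ip` and apply `deferred.mask`.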

    Tracking only one IP and mask set at a time is simpler, but in some ways interleaving divergent paths is an easy case for the scheduler: it is known that there is no interaction or dependence between the paths, since from each path's perspective the other path is masked off.

    The instruction that reaches the execution stage is different every clock, even without branching. If the hardware already handles a different instruction every clock, there's not much difference if that different instruction happens to come from the IF or ELSE path.

    The scheduler issues one instruction at a time. Per Nvidia's additional diagrams and information, the paths are treated as being separate until the different paths rejoin or execution hits explicit sync points in the code.
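
    To make the "one instruction at a time, paths separate until they rejoin" idea concrete, here is a toy issue loop. The round-robin policy, the dictionary layout, and the `issue_trace` name are assumptions for the example; Nvidia does not document the real selection policy in this discussion.

```python
from collections import deque

def issue_trace(paths, reconverge_ip):
    """Return the (ip, mask) of each instruction issued, one per step.

    Each path is a dict with its next `ip`, an exclusive `end`, and a lane
    `mask`. Paths take turns (assumed round-robin) until each reaches its
    end, then the lanes continue together from `reconverge_ip`.
    """
    ready = deque(dict(p) for p in paths)   # copy so the caller's data is untouched
    trace = []
    merged_mask = 0
    while ready:
        path = ready.popleft()
        trace.append((path["ip"], path["mask"]))  # single issue, masked to this path
        path["ip"] += 1
        if path["ip"] < path["end"]:
            ready.append(path)                    # still has instructions left
        else:
            merged_mask |= path["mask"]           # lanes wait at the rejoin point
    trace.append((reconverge_ip, merged_mask))    # all lanes active again
    return trace

paths = [
    {"ip": 10, "end": 12, "mask": 0x0000FFFF},  # IF side, two instructions
    {"ip": 20, "end": 23, "mask": 0xFFFF0000},  # ELSE side, three instructions
]
trace = issue_trace(paths, reconverge_ip=30)
assert [ip for ip, _ in trace] == [10, 20, 11, 21, 22, 30]
```

    Each step still issues exactly one instruction with one mask; only the choice of which path it comes from changes from step to step.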
     