AMD NAVI SFU (Special Function Units)?

Discussion in 'Architecture and Products' started by BRiT, Jun 4, 2019.

  1. BRiT

    BRiT (╯°□°)╯
    Moderator Legend Alpha Subscriber

    Joined:
    Feb 7, 2002
    Messages:
    12,501
    Likes Received:
    8,707
    Location:
    Cleveland
    Is there any information on the special function units of Navi? I'm curious which functions are implemented. Is it merely a return of the SFUs from the old VLIW5 days? That seems a bit silly if so, considering those were hardly used and were eventually replaced with general macros. Any chance of the new SFUs being useful for ray tracing, such as intersection testing or BVH building? Or would that be too much to hope for?

    Yes, lots of questions with no answers. Hopefully they will be answered in a week when AMD has their E3 Press Conference.
     
  2. DmitryKo

    Regular

    Joined:
    Feb 26, 2002
    Messages:
    690
    Likes Received:
    563
    Location:
    55°38′33″ N, 37°28′37″ E
    I'm betting on acceleration of Direct3D metacommands and DirectML operators.

    https://devblogs.microsoft.com/directx/directml-at-gdc-2019/
    https://developer.nvidia.com/using-ai-slide
    https://www.highperformancegraphics.org/wp-content/uploads/2018/Hot3D/HPG2018_DirectML.pdf

    About three dozen operators are defined in DirectML.DLL:

    Convolution, GEMM (General Matrix Multiply), Reduce, AveragePooling, LPPooling, MAXPooling, ROIPooling, Slice, Cast, Split, Join, Padding, ValueScale2D, Upsample2D, Gather, SpaceToDepth, DepthToSpace, Tile, TopK, BatchNormalization, MVN (Mean Variance Normalization), LResponseNormalization, LPNormalization, RNN (Recurrent Neural Network), LSTM (Long Short Term Memory), GRU (Gated Recurrent Unit)

    but GPU vendors currently implement only 4-5 of these metacommands in hardware.

    I'd think RNN (recurrent neural network), LSTM (long short-term memory), and GRU (gated recurrent unit) would be promising candidates for a dedicated hardware implementation of internal state memory.
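    To illustrate the point, here's a toy one-dimensional GRU step in plain Python. The scalar simplification and the weights wz/wr/wh are mine for illustration, not anything DirectML-specific; what matters is that the hidden state h is carried from step to step, which is exactly the internal state a dedicated unit would want to keep on-chip.

    ```python
    # Toy GRU cell step (scalar, 1-dimensional) -- a sketch of the recurrence,
    # not the DirectML operator. Weights are illustrative values of my choosing.
    import math

    def sigmoid(x: float) -> float:
        return 1.0 / (1.0 + math.exp(-x))

    def gru_step(x: float, h: float, wz: float, wr: float, wh: float) -> float:
        """One GRU time step: gates decide how much of h survives."""
        z = sigmoid(wz * (x + h))             # update gate
        r = sigmoid(wr * (x + h))             # reset gate
        h_cand = math.tanh(wh * (x + r * h))  # candidate state
        return (1.0 - z) * h + z * h_cand     # blend old state with candidate

    # The hidden state h is the recurrent "internal state": it must persist
    # across every time step of the sequence.
    h = 0.0
    for x in [0.5, -0.2, 0.9]:
        h = gru_step(x, h, wz=0.8, wr=0.6, wh=1.2)
    print(round(h, 4))
    ```

    A GPU running this per-element has to shuttle h through registers or memory on every step; keeping it resident in dedicated state memory is what would make a hardware RNN/LSTM/GRU path attractive.
    
    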
     
    #2 DmitryKo, Jul 6, 2019
    Last edited: Jul 15, 2019
  3. DmitryKo

    Regular

    Joined:
    Feb 26, 2002
    Messages:
    690
    Likes Received:
    563
    Location:
    55°38′33″ N, 37°28′37″ E
    Never mind, it's an alternate ALU running at a 1/4 issue rate, and it's present in the GCN architecture as well.

    SIMD4/SIMD8 and SIMD16/SIMD32 refer to floating-point operations processed in the same thread.

    RDNA_Graphics_Architecture_06102019.pdf


    slide 11
    Architectural advances of programmable graphics
    5th era
    Navi RDNA
    Unified Shader with scalable vector
    (SIMT ILP capable)
    4x Scalar/Vector32/SFU8
    Wave32 or Wave64

    slide 13
    Execution units
    GCN

    4x SIMD16 Vector Unit
    4x SIMD4 Special Function Unit
    1x Shared Scalar Decoder & Issue Unit
    1x Shared Vector Decoder & Issue Unit
    256 KB VGPR
    RDNA
    2x SIMD32
    2x SIMD8 Special Function Unit
    2x Scalar Decoder & Issue Unit
    2x Vector Decoder & Issue Unit
    256 KB VGPR

    slide 14
    GCN Instruction Issue
    Special Function Unit - alternate execution unit running at 1/4 rate

    slide 15
    RDNA Instruction Issue
    Vector Instruction Issue any cycle
    Or SFU issue once every 4 cycles
    Special Function Unit uses 1 issue cycle and then executes in parallel
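    Reading the widths off slides 13-15, a quick back-of-envelope model (my own arithmetic, not from the deck) shows why the SFU issues once every 4 cycles in both generations: it is a quarter the width of its companion vector unit.

    ```python
    # Back-of-envelope wavefront issue timing, using only the SIMD widths
    # quoted on the slides. Assumption (mine): a wavefront occupies a SIMD
    # unit for wave_size / simd_width cycles, one lane-op per lane per cycle.

    def cycles_per_wavefront(wave_size: int, simd_width: int) -> int:
        return wave_size // simd_width

    # GCN: wave64 on a SIMD16 vector unit vs. its SIMD4 SFU
    gcn_vec = cycles_per_wavefront(64, 16)   # 4 cycles
    gcn_sfu = cycles_per_wavefront(64, 4)    # 16 cycles

    # RDNA: wave32 on a SIMD32 vector unit vs. its SIMD8 SFU
    rdna_vec = cycles_per_wavefront(32, 32)  # 1 cycle
    rdna_sfu = cycles_per_wavefront(32, 8)   # 4 cycles

    # In both generations the SFU takes 4x as long as the vector unit,
    # consistent with "alternate execution unit running at 1/4 rate".
    print(gcn_sfu // gcn_vec, rdna_sfu // rdna_vec)
    ```

    Under this model the RDNA vector unit retires a wave32 every cycle while an SFU op blocks its issue slot once per 4 cycles, matching the slide 15 wording that the SFU "uses 1 issue cycle and then executes in parallel".
    
    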
     
    #3 DmitryKo, Jul 8, 2019
    Last edited: Jul 10, 2019
    w0lfram, Lightman, BRiT and 1 other person like this.