AMD NAVI SFU (Special Function Units)?

BRiT

Is there any information on Navi's special function units? I'm curious which functions are implemented. Is it merely a return of the SFUs from the old VLIW5? That seems a bit silly if so, considering those were hardly used and were replaced with general macros. Any chance of the new SFUs being useful for ray tracing, such as intersection tests or BVH building? Or would that be too much to hope for?

Yes, lots of questions with no answers. Hopefully they will be answered in a week when AMD has their E3 Press Conference.
 
I'm betting on acceleration of Direct3D metacommands and DirectML operators.

https://devblogs.microsoft.com/directx/directml-at-gdc-2019/
https://developer.nvidia.com/using-ai-slide
https://www.highperformancegraphics.org/wp-content/uploads/2018/Hot3D/HPG2018_DirectML.pdf

About three dozen operators are defined in DirectML.DLL:

Convolution, GEMM (General Matrix Multiply), Reduce, AveragePooling, LpPooling, MaxPooling, ROIPooling, Slice, Cast, Split, Join, Padding, ValueScale2D, Upsample2D, Gather, SpaceToDepth, DepthToSpace, Tile, TopK, BatchNormalization, MVN (Mean Variance Normalization), LocalResponseNormalization, LpNormalization, RNN (Recurrent Neural Network), LSTM (Long Short-Term Memory), GRU (Gated Recurrent Unit)

but GPU vendors currently implement only 4 or 5 of them as metacommands on current hardware.

I'd think RNN (recurrent neural network), LSTM (long short-term memory), and GRU (gated recurrent unit) would be promising candidates for a dedicated hardware implementation of internal state memory.
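
For anyone wondering what "internal state" means for these recurrent operators, here is a minimal sketch of a plain RNN step in Python. It is only an illustration: the scalar weights `w_x`, `w_h` and the bias `b` are made-up values, and the function names are hypothetical, not anything from DirectML. The point is that the hidden state `h` has to be kept and fed back in on every step, which is the part a dedicated state memory would hold.

```python
import math
from typing import List


def rnn_step(x: float, h_prev: float,
             w_x: float = 0.5, w_h: float = 0.8, b: float = 0.0) -> float:
    """One step of a scalar Elman-style RNN: h_t = tanh(w_x*x_t + w_h*h_{t-1} + b).

    The weights are illustrative assumptions, not trained values.
    """
    return math.tanh(w_x * x + w_h * h_prev + b)


def run_rnn(inputs: List[float], h0: float = 0.0) -> List[float]:
    """Feed a sequence through the cell, carrying the hidden state along."""
    h = h0
    states = []
    for x in inputs:
        h = rnn_step(x, h)  # the state produced here is reused on the next step
        states.append(h)
    return states


if __name__ == "__main__":
    # The second output differs between the two runs even though the second
    # input is 0.0 in both: the difference comes only from the carried state.
    print(run_rnn([1.0, 0.0]))
    print(run_rnn([0.0, 0.0]))
```

LSTM and GRU add gates around the same idea, but the carried state is what they have in common.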
 
Never mind, it's an alternate ALU running at 1/4 issue rate, and it's present in the GCN architecture as well.

SIMD4/SIMD8 and SIMD16/SIMD32 refer to the number of floating-point operations the unit processes per cycle within the same wavefront.

RDNA_Graphics_Architecture_06102019.pdf


slide 11
Architectural advances of programmable graphics
5th era
Navi RDNA
Unified Shader with scalable vector
(SIMT ILP capable)
4x Scalar/Vector32/SFU8
Wave32 or Wave64

slide 13
Execution units

GCN
4x SIMD16 Vector Unit
4x SIMD4 Special Function Unit
1x Shared Scalar Decoder & Issue Unit
1x Shared Vector Decoder & Issue Unit
256 KB VGPR

RDNA
2x SIMD32
2x SIMD8 Special Function Unit
2x Scalar Decoder & Issue Unit
2x Vector Decoder & Issue Unit
256 KB VGPR
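
To put the lane counts above in perspective, here is a rough back-of-the-envelope sketch in Python. It assumes one lane handles one work-item per cycle, so a unit needs `wave_size / lanes` cycles to cover a whole wavefront. That is a simplification, and `cycles_per_wave` is just a hypothetical helper name. The lane counts are from slide 13 and the wave sizes from slide 11; that GCN runs Wave64 is not on these slides, but it is how GCN works.

```python
import math


def cycles_per_wave(wave_size: int, lanes: int) -> int:
    """Cycles for one instruction to cover every work-item of a wavefront,
    assuming each lane processes one work-item per cycle (a simplification)."""
    return math.ceil(wave_size / lanes)


# (wave size, lanes) pairs taken from the slide notes.
UNITS = {
    "GCN vector unit (SIMD16), Wave64": (64, 16),
    "GCN SFU (SIMD4), Wave64": (64, 4),
    "RDNA vector unit (SIMD32), Wave32": (32, 32),
    "RDNA SFU (SIMD8), Wave32": (32, 8),
}

if __name__ == "__main__":
    for name, (wave_size, lanes) in UNITS.items():
        print(f"{name}: {cycles_per_wave(wave_size, lanes)} cycle(s)")
```

Under that assumption the SFU takes four times as many cycles as the vector unit on both architectures, which lines up with the "1/4 rate" wording on the slides.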

slide 14
GCN Instruction Issue
Special Function Unit - alternate execution unit running at 1/4 rate

slide 15
RDNA Instruction Issue
Vector Instruction Issue any cycle
Or SFU issue once every 4 cycles
Special Function Unit uses 1 issue cycle and then executes in parallel
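
The RDNA issue rules on slide 15 can be sketched as a toy model in Python. This is my reading of the slide, not AMD's actual scheduler: it assumes strictly in-order issue, no data dependencies, one instruction per issue cycle, and that the SFU can accept a new instruction only every 4 cycles while vector instructions issue in the cycles in between. The function name `issue_cycles` and the `"V"`/`"S"` encoding are made up for the example.

```python
from typing import List


def issue_cycles(stream: str, sfu_interval: int = 4) -> List[int]:
    """Return the cycle at which each instruction in `stream` issues.

    'V' = vector instruction (may issue on any cycle)
    'S' = SFU instruction (takes one issue cycle, then the SFU is busy
          for `sfu_interval` cycles while later vector work keeps issuing)
    """
    cycle = 0          # next free issue slot
    sfu_free_at = 0    # first cycle at which the SFU accepts a new instruction
    issued = []
    for op in stream:
        if op == "S":
            cycle = max(cycle, sfu_free_at)  # stall until the SFU is free
            sfu_free_at = cycle + sfu_interval
        elif op != "V":
            raise ValueError(f"unknown instruction kind: {op!r}")
        issued.append(cycle)
        cycle += 1     # each instruction consumes exactly one issue cycle
    return issued


if __name__ == "__main__":
    print(issue_cycles("SVVVS"))  # SFU latency hidden behind vector work
    print(issue_cycles("SS"))     # back-to-back SFU work has to wait
```

In this model three vector instructions between two SFU instructions are enough to hide the SFU completely, while back-to-back SFU instructions leave issue slots empty.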
 