DX11 DirectCompute Mandelbrot viewer

Voxilla

Here is a fairly fast Mandelbrot viewer that uses DX11 and the DirectCompute API.

The set is calculated with up to 1024 iterations. Making use of the horsepower of DX11 GPUs enables real-time panning and zooming even at high resolution.
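For readers new to the calculation, the per-pixel work can be sketched on the CPU roughly as follows. This is only a minimal Python illustration of the standard escape-time iteration, not the shader from the download; the function name `escape_count` and its parameters are chosen here for illustration.

```python
def escape_count(cx, cy, max_iter=1024):
    """Return how many iterations z -> z^2 + c runs before |z| exceeds 2."""
    u, v = 0.0, 0.0  # real and imaginary parts of z, starting at z = 0
    for i in range(max_iter):
        if u * u + v * v > 4.0:  # |z|^2 > 4 means the point has escaped
            return i
        u, v = u * u - v * v + cx, 2.0 * u * v + cy
    return max_iter  # never escaped: treated as inside the set


if __name__ == "__main__":
    print(escape_count(0.0, 0.0))  # the origin never escapes
    print(escape_count(2.0, 2.0))  # far outside, escapes almost immediately
```

Points inside the set always run the full iteration limit, which is why a fully black view is the most expensive one to render.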

Two versions are included: a scalar one and a vectorized one.
Both generate the same output. The vectorized version was written after the scalar calculation showed suboptimal performance on the ATI HD 5870.
On this GPU it runs twice as fast as the scalar version. Here we see the downside of a non-scalar GPU architecture.
On the forthcoming scalar Nvidia DX11 GPUs the scalar version will probably run faster. Writing a vectorized compute shader is substantially more complicated.
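As a rough picture of what "vectorized" means here, the sketch below advances four points in lockstep and freezes the counter of each lane once it has escaped. It is a CPU-side Python illustration under that assumption, not the actual compute shader; the name `escape_counts_x4` and the list-based lanes are hypothetical.

```python
def escape_counts_x4(cs, max_iter=1024):
    """Iterate four points in lockstep; cs is a list of four (cx, cy) pairs."""
    u = [0.0] * 4
    v = [0.0] * 4
    counts = [0] * 4
    active = [True] * 4
    for _ in range(max_iter):
        for k in range(4):
            # A lane that has escaped keeps its count frozen from here on.
            if active[k] and u[k] * u[k] + v[k] * v[k] > 4.0:
                active[k] = False
        if not any(active):
            break  # leave the loop only when all four lanes are done
        for k in range(4):
            if active[k]:
                cx, cy = cs[k]
                u[k], v[k] = u[k] * u[k] - v[k] * v[k] + cx, 2.0 * u[k] * v[k] + cy
                counts[k] += 1
    return counts


if __name__ == "__main__":
    print(escape_counts_x4([(0.0, 0.0), (2.0, 2.0), (-1.0, 0.0), (1.0, 1.0)], 64))
```

The trade-off in this scheme is that all four lanes keep the loop running until the slowest one finishes, so it pays off most when neighbouring points escape at similar iteration counts.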

On the HD 5870 computational throughput is well over 1.5 TFLOPS.

Full source code is included.
Note that no drawing code was needed. The compute shader can write directly to the backbuffer.

Pressing the space bar switches the calculation between scalar and vectorized.
Zoom in and out with the left and right mouse buttons.
Pan by moving the mouse. Panning stops while the mouse is inside an invisible, centered circle about a quarter of the screen in size.
The A and Z keys cycle the colors.
ALT + Enter goes to full screen.
 
It's failing on start for me, an error message saying it's stopped working comes up as soon as it fires up a window.

Probably me doing something stupid, any idea what?

Oh, my system: Phenom II X4 965 BE, Gigabyte MA790FXT-UDP5, 2x2GB Corsair XMS3, 1Gb ATi Radeon 5870, Corsair 620hx, 750GB HD Caviar Black
 
Can you run the DX11 demos from the DX SDK?
On Vista it requires the DX11 beta to be installed.
 
Doesn't run on DX10 cards I presume?

Edit: Judging from the source code it appears not. Looking at the shader, I would think it would be fairly easy to make it run on DX10 cards as well.

It makes use of an unordered access view; I'm not sure whether that is supported at the DX10 feature level. For you it might be easy to get it running with DX10 :)
 
There was once a 3D fractal generator written for Cg and it was rather impressive! I really want to see such a thing ported to DC or OCL.
 
I've just uploaded a slightly enhanced version.

The starting positions are calculated with doubles now. This results in less noise when zooming in.
Unfortunately a full double-precision calculation does not work, as the HLSL compiler has bugs that prevent any serious work with doubles for the moment.
I could not get the scalar version working with double start positions, so there is a rendering difference between the two versions now.
There are some other cosmetic changes, as you may notice.
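One plausible reading of why double start positions reduce noise: at deep zoom the pixel spacing drops below the resolution of a single-precision float, so every intermediate rounding in the coordinate calculation shows up as jitter. The Python sketch below emulates single precision with `struct` and compares an all-single calculation against a double calculation that is rounded once at the end. The function names and the view parameters are made up for illustration and are not taken from the viewer's source.

```python
import struct
from fractions import Fraction


def to_f32(x):
    """Round a Python float (a double) to the nearest single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def start_x_single(center, scale, px):
    """Start position with every intermediate result rounded to single precision."""
    product = to_f32(to_f32(scale) * px)
    return to_f32(to_f32(center) + product)


def start_x_double(center, scale, px):
    """Start position computed in double precision, rounded to single once at the end."""
    return to_f32(center + scale * px)


def exact_x(center, scale, px):
    """Exact rational value of center + scale * px, used only to measure error."""
    return Fraction(center) + Fraction(scale) * px


if __name__ == "__main__":
    center, scale = -0.743643887, 3.0e-8  # hypothetical deep-zoom view
    for px in range(4):
        e = exact_x(center, scale, px)
        err_single = abs(Fraction(start_x_single(center, scale, px)) - e)
        err_double = abs(Fraction(start_x_double(center, scale, px)) - e)
        print(px, float(err_single), float(err_double))
```

In this model the double path always lands on (essentially) the nearest representable single-precision start value, while the all-single path can drift further away from it.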
 
Neat demo! Having done my Ph.D in mathematics on fractal geometry it was great to see this running so fast! My friends and I used to generate Mandelbrot pictures on our Apple IIs back in high school so things have certainly come a long way since then :)

I took the liberty of compiling the shaders and I extracted the main loop from the vectorized version. Here it is:
Code:
04 LOOP_DX10 i0 FAIL_JUMP_ADDR(11) 
    05 ALU_BREAK: ADDR(470) CNT(16) 
         88  x: MOV         R11.x,  R0.z      
             y: MOV         R11.y,  R0.w      
             z: MOV         R5.z,  R0.x      
             w: MOV         R9.w,  R0.y      
             t: OR_INT      T0.w,  R2.x,  R1.x      
         89  x: SETNE_INT   ____,  PV88.y,  0.0f      
             y: SETNE_INT   ____,  PV88.x,  0.0f      
             z: SETNE_INT   ____,  PV88.z,  0.0f      
             w: SETNE_INT   ____,  PV88.w,  0.0f      
             t: OR_INT      ____,  R2.y,  R1.y      
         90  x: AND_INT     ____,  PV89.x,  PV89.w      
             y: AND_INT     ____,  PV89.y,  PV89.z      
             z: OR_INT      T0.z,  T0.w,  PS89      
         91  z: AND_INT     ____,  PV90.y,  PV90.x      
         92  z: AND_INT     R1.z,  PV91.z,  T0.z      
         93  x: PREDNE_INT  ____,  R1.z,  0.0f      UPDATE_EXEC_MASK UPDATE_PRED 
    06 ALU: ADDR(486) CNT(124) 
         94  x: MUL_e       ____,  R6.y,  R6.y      
             y: MUL_e       ____,  R6.x,  R6.x      
             z: MUL_e       ____,  R6.w,  R6.w      
             w: MUL_e       ____,  R6.z,  R6.z      
             t: MUL_e       T0.y,  R4.x,  R6.x      
         95  x: MULADD_e    ____,  R4.y,  R4.y, -PV94.x      
             y: MULADD_e    ____,  R4.x,  R4.x, -PV94.y      
             z: MULADD_e    ____,  R4.w,  R4.w, -PV94.z      
             w: MULADD_e    ____,  R4.z,  R4.z, -PV94.w      
             t: MUL_e       T0.x,  R4.y,  R6.y      
         96  x: ADD         T1.x,  R5.y,  PV95.x      
             y: ADD         T1.y,  R3.x,  PV95.y      
             z: ADD         T0.z,  R5.y,  PV95.z      
             w: ADD         T0.w,  R3.x,  PV95.w      
             t: MUL_e       ____,  R4.z,  R6.z      
         97  x: MULADD_e    T0.x,  T0.x,  (0x40000000, 2.0f).x,  R10.x      
             y: MULADD_e    ____,  T0.y,  (0x40000000, 2.0f).x,  R10.x      
             z: MUL_e       ____,  R4.w,  R6.w      
             w: MULADD_e    T1.w,  PS96,  (0x40000000, 2.0f).x,  R10.y      
         98  x: MUL_e       ____,  PV97.x,  PV97.x      
             y: MUL_e       ____,  PV97.y,  PV97.y      
             z: MULADD_e    T1.z,  PV97.z,  (0x40000000, 2.0f).x,  R10.y      VEC_021 
             w: MUL_e       ____,  PV97.w,  PV97.w      
             t: MUL_e       T0.y,  T1.y,  PV97.y      
         99  x: MULADD_e    ____,  T1.x,  T1.x, -PV98.x      
             y: MULADD_e    ____,  T1.y,  T1.y, -PV98.y      
             z: MUL_e       ____,  PV98.z,  PV98.z      
             w: MULADD_e    ____,  T0.w,  T0.w, -PV98.w      
             t: MUL_e       T0.x,  T1.x,  T0.x      
        100  x: ADD         T1.x,  R5.y,  PV99.x      
             y: ADD         T1.y,  R3.x,  PV99.y      
             z: MULADD_e    ____,  T0.z,  T0.z, -PV99.z      
             w: ADD         T1.w,  R3.x,  PV99.w      
             t: MUL_e       ____,  T0.w,  T1.w      
        101  x: MUL_e       ____,  T0.z,  T1.z      
             y: MULADD_e    ____,  T0.y,  (0x40000000, 2.0f).x,  R10.x      
             z: ADD         T0.z,  R5.y,  PV100.z      VEC_120 
             w: MULADD_e    T0.w,  T0.x,  (0x40000000, 2.0f).x,  R10.x      
             t: MULADD_e    T2.w,  PS100,  (0x40000000, 2.0f).x,  R10.y      VEC_021 
        102  x: MUL_e       ____,  PV101.w,  PV101.w      
             y: MUL_e       ____,  PV101.y,  PV101.y      
             z: MULADD_e    T1.z,  PV101.x,  (0x40000000, 2.0f).x,  R10.y      VEC_021 
             w: MUL_e       ____,  PS101,  PS101      
             t: MUL_e       T0.y,  T1.y,  PV101.y      
        103  x: MULADD_e    ____,  T1.x,  T1.x, -PV102.x      
             y: MULADD_e    ____,  T1.y,  T1.y, -PV102.y      
             z: MUL_e       ____,  PV102.z,  PV102.z      
             w: MULADD_e    ____,  T1.w,  T1.w, -PV102.w      
             t: MUL_e       T1.x,  T1.x,  T0.w      
        104  x: ADD         T0.x,  R5.y,  PV103.x      
             y: ADD         T1.y,  R3.x,  PV103.y      
             z: MULADD_e    ____,  T0.z,  T0.z, -PV103.z      
             w: ADD         T2.w,  R3.x,  PV103.w      
             t: MUL_e       ____,  T1.w,  T2.w      
        105  x: MUL_e       ____,  T0.z,  T1.z      
             y: MULADD_e    ____,  T0.y,  (0x40000000, 2.0f).x,  R10.x      
             z: ADD         T0.z,  R5.y,  PV104.z      VEC_120 
             w: MULADD_e    T1.w,  T1.x,  (0x40000000, 2.0f).x,  R10.x      
             t: MULADD_e    T0.w,  PS104,  (0x40000000, 2.0f).x,  R10.y      VEC_021 
        106  x: MUL_e       ____,  PV105.w,  PV105.w      
             y: MUL_e       ____,  PV105.y,  PV105.y      
             z: MULADD_e    T1.z,  PV105.x,  (0x40000000, 2.0f).x,  R10.y      VEC_021 
             w: MUL_e       ____,  PS105,  PS105      
             t: MUL_e       T0.y,  T1.y,  PV105.y      
        107  x: MULADD_e    ____,  T0.x,  T0.x, -PV106.x      
             y: MULADD_e    ____,  T1.y,  T1.y, -PV106.y      
             z: MUL_e       ____,  PV106.z,  PV106.z      
             w: MULADD_e    ____,  T2.w,  T2.w, -PV106.w      
             t: MUL_e       T0.x,  T0.x,  T1.w      
        108  x: ADD         T1.x,  R5.y,  PV107.x      
             y: ADD         T1.y,  R3.x,  PV107.y      
             z: MULADD_e    ____,  T0.z,  T0.z, -PV107.z      
             w: ADD         T2.w,  R3.x,  PV107.w      
             t: MUL_e       ____,  T2.w,  T0.w      
        109  x: MUL_e       ____,  T0.z,  T1.z      
             y: MULADD_e    ____,  T0.y,  (0x40000000, 2.0f).x,  R10.x      
             z: ADD         T0.z,  R5.y,  PV108.z      VEC_120 
             w: MULADD_e    T0.w,  T0.x,  (0x40000000, 2.0f).x,  R10.x      
             t: MULADD_e    T1.w,  PS108,  (0x40000000, 2.0f).x,  R10.y      VEC_021 
        110  x: MUL_e       ____,  PV109.w,  PV109.w      
             y: MUL_e       ____,  PV109.y,  PV109.y      
             z: MULADD_e    T1.z,  PV109.x,  (0x40000000, 2.0f).x,  R10.y      VEC_021 
             w: MUL_e       ____,  PS109,  PS109      
             t: MUL_e       T0.y,  T1.y,  PV109.y      
        111  x: MULADD_e    ____,  T1.x,  T1.x, -PV110.x      
             y: MULADD_e    ____,  T1.y,  T1.y, -PV110.y      
             z: MUL_e       ____,  PV110.z,  PV110.z      
             w: MULADD_e    ____,  T2.w,  T2.w, -PV110.w      
             t: MUL_e       T1.x,  T1.x,  T0.w      
        112  x: ADD         T0.x,  R5.y,  PV111.x      
             y: ADD         T1.y,  R3.x,  PV111.y      
             z: MULADD_e    ____,  T0.z,  T0.z, -PV111.z      
             w: ADD         T2.w,  R3.x,  PV111.w      
             t: MUL_e       ____,  T2.w,  T1.w      
        113  x: MUL_e       ____,  T0.z,  T1.z      
             y: MULADD_e    ____,  T0.y,  (0x40000000, 2.0f).x,  R10.x      
             z: ADD         R1.z,  R5.y,  PV112.z      VEC_120 
             w: MULADD_e    T1.w,  T1.x,  (0x40000000, 2.0f).x,  R10.x      
             t: MULADD_e    T0.w,  PS112,  (0x40000000, 2.0f).x,  R10.y      VEC_021 
        114  x: MUL_e       ____,  PV113.w,  PV113.w      
             y: MUL_e       ____,  PV113.y,  PV113.y      
             z: MULADD_e    R2.z,  PV113.x,  (0x40000000, 2.0f).x,  R10.y      VEC_021 
             w: MUL_e       ____,  PS113,  PS113      
             t: MUL_e       R0.y,  T1.y,  PV113.y      
        115  x: MULADD_e    ____,  T0.x,  T0.x, -PV114.x      
             y: MULADD_e    ____,  T1.y,  T1.y, -PV114.y      
             z: MUL_e       ____,  PV114.z,  PV114.z      
             w: MULADD_e    ____,  T2.w,  T2.w, -PV114.w      
             t: MUL_e       R0.x,  T0.x,  T1.w      
        116  x: ADD         R1.x,  R5.y,  PV115.x      
             y: ADD         R1.y,  R3.x,  PV115.y      
             z: MULADD_e    R0.z,  R1.z,  R1.z, -PV115.z      
             w: ADD         R1.w,  R3.x,  PV115.w      
             t: MUL_e       R0.w,  T2.w,  T0.w      
    07 ALU: ADDR(610) CNT(122) 
        117  x: MUL_e       ____,  R1.z,  R2.z      
             y: MULADD_e    ____,  R0.y,  (0x40000000, 2.0f).x,  R10.x      
             z: ADD         T0.z,  R5.y,  R0.z      VEC_120 
             w: MULADD_e    T0.w,  R0.x,  (0x40000000, 2.0f).x,  R10.x      
             t: MULADD_e    T2.w,  R0.w,  (0x40000000, 2.0f).x,  R10.y      VEC_021 
        118  x: MUL_e       ____,  PV117.w,  PV117.w      
             y: MUL_e       ____,  PV117.y,  PV117.y      
             z: MULADD_e    T1.z,  PV117.x,  (0x40000000, 2.0f).x,  R10.y      VEC_021 
             w: MUL_e       ____,  PS117,  PS117      
             t: MUL_e       T1.y,  R1.y,  PV117.y      
        119  x: MULADD_e    ____,  R1.x,  R1.x, -PV118.x      
             y: MULADD_e    ____,  R1.y,  R1.y, -PV118.y      
             z: MUL_e       ____,  PV118.z,  PV118.z      
             w: MULADD_e    ____,  R1.w,  R1.w, -PV118.w      
             t: MUL_e       T0.x,  R1.x,  T0.w      
        120  x: ADD         T1.x,  R5.y,  PV119.x      
             y: ADD         T0.y,  R3.x,  PV119.y      
             z: MULADD_e    ____,  T0.z,  T0.z, -PV119.z      
             w: ADD         T0.w,  R3.x,  PV119.w      
             t: MUL_e       ____,  R1.w,  T2.w      
        121  x: MUL_e       ____,  T0.z,  T1.z      
             y: MULADD_e    ____,  T1.y,  (0x40000000, 2.0f).x,  R10.x      
             z: ADD         T0.z,  R5.y,  PV120.z      VEC_120 
             w: MULADD_e    T2.w,  T0.x,  (0x40000000, 2.0f).x,  R10.x      
             t: MULADD_e    T1.w,  PS120,  (0x40000000, 2.0f).x,  R10.y      VEC_021 
        122  x: MUL_e       ____,  PV121.w,  PV121.w      
             y: MUL_e       ____,  PV121.y,  PV121.y      
             z: MULADD_e    T1.z,  PV121.x,  (0x40000000, 2.0f).x,  R10.y      VEC_021 
             w: MUL_e       ____,  PS121,  PS121      
             t: MUL_e       T1.y,  T0.y,  PV121.y      
        123  x: MULADD_e    ____,  T1.x,  T1.x, -PV122.x      
             y: MULADD_e    ____,  T0.y,  T0.y, -PV122.y      
             z: MUL_e       ____,  PV122.z,  PV122.z      
             w: MULADD_e    ____,  T0.w,  T0.w, -PV122.w      
             t: MUL_e       T1.x,  T1.x,  T2.w      
        124  x: ADD         T0.x,  R5.y,  PV123.x      
             y: ADD         T0.y,  R3.x,  PV123.y      
             z: MULADD_e    ____,  T0.z,  T0.z, -PV123.z      
             w: ADD         T1.w,  R3.x,  PV123.w      
             t: MUL_e       ____,  T0.w,  T1.w      
        125  x: MUL_e       ____,  T0.z,  T1.z      
             y: MULADD_e    ____,  T1.y,  (0x40000000, 2.0f).x,  R10.x      
             z: ADD         T0.z,  R5.y,  PV124.z      VEC_120 
             w: MULADD_e    T0.w,  T1.x,  (0x40000000, 2.0f).x,  R10.x      
             t: MULADD_e    T2.w,  PS124,  (0x40000000, 2.0f).x,  R10.y      VEC_021 
        126  x: MUL_e       ____,  PV125.w,  PV125.w      
             y: MUL_e       ____,  PV125.y,  PV125.y      
             z: MULADD_e    T1.z,  PV125.x,  (0x40000000, 2.0f).x,  R10.y      VEC_021 
             w: MUL_e       ____,  PS125,  PS125      
             t: MUL_e       T1.y,  T0.y,  PV125.y      
        127  x: MULADD_e    ____,  T0.x,  T0.x, -PV126.x      
             y: MULADD_e    ____,  T0.y,  T0.y, -PV126.y      
             z: MUL_e       ____,  PV126.z,  PV126.z      
             w: MULADD_e    ____,  T1.w,  T1.w, -PV126.w      
             t: MUL_e       T0.x,  T0.x,  T0.w      
        128  x: ADD         T1.x,  R5.y,  PV127.x      
             y: ADD         T0.y,  R3.x,  PV127.y      
             z: MULADD_e    ____,  T0.z,  T0.z, -PV127.z      
             w: ADD         T2.w,  R3.x,  PV127.w      
             t: MUL_e       ____,  T1.w,  T2.w      
        129  x: MUL_e       ____,  T0.z,  T1.z      
             y: MULADD_e    ____,  T1.y,  (0x40000000, 2.0f).x,  R10.x      
             z: ADD         T0.z,  R5.y,  PV128.z      VEC_120 
             w: MULADD_e    T1.w,  T0.x,  (0x40000000, 2.0f).x,  R10.x      
             t: MULADD_e    T0.w,  PS128,  (0x40000000, 2.0f).x,  R10.y      VEC_021 
        130  x: MUL_e       ____,  PV129.w,  PV129.w      
             y: MUL_e       ____,  PV129.y,  PV129.y      
             z: MULADD_e    T1.z,  PV129.x,  (0x40000000, 2.0f).x,  R10.y      VEC_021 
             w: MUL_e       ____,  PS129,  PS129      
             t: MUL_e       T1.y,  T0.y,  PV129.y      
        131  x: MULADD_e    ____,  T1.x,  T1.x, -PV130.x      
             y: MULADD_e    ____,  T0.y,  T0.y, -PV130.y      
             z: MUL_e       ____,  PV130.z,  PV130.z      
             w: MULADD_e    ____,  T2.w,  T2.w, -PV130.w      
             t: MUL_e       T1.x,  T1.x,  T1.w      
        132  x: ADD         T0.x,  R5.y,  PV131.x      
             y: ADD         T0.y,  R3.x,  PV131.y      
             z: MULADD_e    ____,  T0.z,  T0.z, -PV131.z      
             w: ADD         T2.w,  R3.x,  PV131.w      
             t: MUL_e       ____,  T2.w,  T0.w      
        133  x: MUL_e       ____,  T0.z,  T1.z      
             y: MULADD_e    ____,  T1.y,  (0x40000000, 2.0f).x,  R10.x      
             z: ADD         T0.z,  R5.y,  PV132.z      VEC_120 
             w: MULADD_e    T0.w,  T1.x,  (0x40000000, 2.0f).x,  R10.x      
             t: MULADD_e    T1.w,  PS132,  (0x40000000, 2.0f).x,  R10.y      VEC_021 
        134  x: MUL_e       ____,  PV133.w,  PV133.w      
             y: MUL_e       ____,  PV133.y,  PV133.y      
             z: MULADD_e    T1.z,  PV133.x,  (0x40000000, 2.0f).x,  R10.y      VEC_021 
             w: MUL_e       ____,  PS133,  PS133      
             t: MUL_e       T1.y,  T0.y,  PV133.y      
        135  x: MULADD_e    ____,  T0.x,  T0.x, -PV134.x      
             y: MULADD_e    ____,  T0.y,  T0.y, -PV134.y      
             z: MUL_e       ____,  PV134.z,  PV134.z      
             w: MULADD_e    ____,  T2.w,  T2.w, -PV134.w      
             t: MUL_e       T0.x,  T0.x,  T0.w      
        136  x: ADD         R0.x,  R5.y,  PV135.x      
             y: ADD         R1.y,  R3.x,  PV135.y      
             z: MULADD_e    ____,  T0.z,  T0.z, -PV135.z      
             w: ADD         R2.w,  R3.x,  PV135.w      
             t: MUL_e       ____,  T2.w,  T1.w      
        137  x: MUL_e       ____,  T0.z,  T1.z      
             y: MULADD_e    ____,  T1.y,  (0x40000000, 2.0f).x,  R10.x      
             z: ADD         R0.z,  R5.y,  PV136.z      VEC_120 
             w: MULADD_e    R0.w,  T0.x,  (0x40000000, 2.0f).x,  R10.x      
             t: MULADD_e    R3.w,  PS136,  (0x40000000, 2.0f).x,  R10.y      VEC_021 
        138  x: MUL_e       R1.x,  PV137.w,  PV137.w      
             y: MUL_e       R0.y,  PV137.y,  PV137.y      
             z: MULADD_e    R1.z,  PV137.x,  (0x40000000, 2.0f).x,  R10.y      VEC_021 
             w: MUL_e       R1.w,  PS137,  PS137      
             t: MUL_e       R2.y,  R1.y,  PV137.y      
    08 ALU: ADDR(732) CNT(122) 
        139  x: MULADD_e    ____,  R0.x,  R0.x, -R1.x      VEC_021 
             y: MULADD_e    ____,  R1.y,  R1.y, -R0.y      
             z: MUL_e       ____,  R1.z,  R1.z      
             w: MULADD_e    ____,  R2.w,  R2.w, -R1.w      
             t: MUL_e       T0.x,  R0.x,  R0.w      
        140  x: ADD         T1.x,  R5.y,  PV139.x      
             y: ADD         T1.y,  R3.x,  PV139.y      
             z: MULADD_e    ____,  R0.z,  R0.z, -PV139.z      
             w: ADD         T2.w,  R3.x,  PV139.w      
             t: MUL_e       ____,  R2.w,  R3.w      
        141  x: MUL_e       ____,  R0.z,  R1.z      
             y: MULADD_e    ____,  R2.y,  (0x40000000, 2.0f).x,  R10.x      
             z: ADD         T0.z,  R5.y,  PV140.z      VEC_120 
             w: MULADD_e    T1.w,  T0.x,  (0x40000000, 2.0f).x,  R10.x      
             t: MULADD_e    T0.w,  PS140,  (0x40000000, 2.0f).x,  R10.y      VEC_021 
        142  x: MUL_e       ____,  PV141.w,  PV141.w      
             y: MUL_e       ____,  PV141.y,  PV141.y      
             z: MULADD_e    T1.z,  PV141.x,  (0x40000000, 2.0f).x,  R10.y      VEC_021 
             w: MUL_e       ____,  PS141,  PS141      
             t: MUL_e       T0.y,  T1.y,  PV141.y      
        143  x: MULADD_e    ____,  T1.x,  T1.x, -PV142.x      
             y: MULADD_e    ____,  T1.y,  T1.y, -PV142.y      
             z: MUL_e       ____,  PV142.z,  PV142.z      
             w: MULADD_e    ____,  T2.w,  T2.w, -PV142.w      
             t: MUL_e       T1.x,  T1.x,  T1.w      
        144  x: ADD         T0.x,  R5.y,  PV143.x      
             y: ADD         T1.y,  R3.x,  PV143.y      
             z: MULADD_e    ____,  T0.z,  T0.z, -PV143.z      
             w: ADD         T2.w,  R3.x,  PV143.w      
             t: MUL_e       ____,  T2.w,  T0.w      
        145  x: MUL_e       ____,  T0.z,  T1.z      
             y: MULADD_e    ____,  T0.y,  (0x40000000, 2.0f).x,  R10.x      
             z: ADD         T0.z,  R5.y,  PV144.z      VEC_120 
             w: MULADD_e    T0.w,  T1.x,  (0x40000000, 2.0f).x,  R10.x      
             t: MULADD_e    T1.w,  PS144,  (0x40000000, 2.0f).x,  R10.y      VEC_021 
        146  x: MUL_e       ____,  PV145.w,  PV145.w      
             y: MUL_e       ____,  PV145.y,  PV145.y      
             z: MULADD_e    T1.z,  PV145.x,  (0x40000000, 2.0f).x,  R10.y      VEC_021 
             w: MUL_e       ____,  PS145,  PS145      
             t: MUL_e       T0.y,  T1.y,  PV145.y      
        147  x: MULADD_e    ____,  T0.x,  T0.x, -PV146.x      
             y: MULADD_e    ____,  T1.y,  T1.y, -PV146.y      
             z: MUL_e       ____,  PV146.z,  PV146.z      
             w: MULADD_e    ____,  T2.w,  T2.w, -PV146.w      
             t: MUL_e       T0.x,  T0.x,  T0.w      
        148  x: ADD         T1.x,  R5.y,  PV147.x      
             y: ADD         T1.y,  R3.x,  PV147.y      
             z: MULADD_e    ____,  T0.z,  T0.z, -PV147.z      
             w: ADD         T2.w,  R3.x,  PV147.w      
             t: MUL_e       ____,  T2.w,  T1.w      
        149  x: MUL_e       ____,  T0.z,  T1.z      
             y: MULADD_e    ____,  T0.y,  (0x40000000, 2.0f).x,  R10.x      
             z: ADD         T0.z,  R5.y,  PV148.z      VEC_120 
             w: MULADD_e    T1.w,  T0.x,  (0x40000000, 2.0f).x,  R10.x      
             t: MULADD_e    T0.w,  PS148,  (0x40000000, 2.0f).x,  R10.y      VEC_021 
        150  x: MUL_e       ____,  PV149.w,  PV149.w      
             y: MUL_e       ____,  PV149.y,  PV149.y      
             z: MULADD_e    T1.z,  PV149.x,  (0x40000000, 2.0f).x,  R10.y      VEC_021 
             w: MUL_e       ____,  PS149,  PS149      
             t: MUL_e       T0.y,  T1.y,  PV149.y      
        151  x: MULADD_e    ____,  T1.x,  T1.x, -PV150.x      
             y: MULADD_e    ____,  T1.y,  T1.y, -PV150.y      
             z: MUL_e       ____,  PV150.z,  PV150.z      
             w: MULADD_e    ____,  T2.w,  T2.w, -PV150.w      
             t: MUL_e       T1.x,  T1.x,  T1.w      
        152  x: ADD         T0.x,  R5.y,  PV151.x      
             y: ADD         T1.y,  R3.x,  PV151.y      
             z: MULADD_e    ____,  T0.z,  T0.z, -PV151.z      
             w: ADD         T2.w,  R3.x,  PV151.w      
             t: MUL_e       ____,  T2.w,  T0.w      
        153  x: MUL_e       ____,  T0.z,  T1.z      
             y: MULADD_e    ____,  T0.y,  (0x40000000, 2.0f).x,  R10.x      
             z: ADD         T0.z,  R5.y,  PV152.z      VEC_120 
             w: MULADD_e    T0.w,  T1.x,  (0x40000000, 2.0f).x,  R10.x      
             t: MULADD_e    T1.w,  PS152,  (0x40000000, 2.0f).x,  R10.y      VEC_021 
        154  x: MUL_e       ____,  PV153.w,  PV153.w      
             y: MUL_e       ____,  PV153.y,  PV153.y      
             z: MULADD_e    T1.z,  PV153.x,  (0x40000000, 2.0f).x,  R10.y      VEC_021 
             w: MUL_e       ____,  PS153,  PS153      
             t: MUL_e       T0.y,  T1.y,  PV153.y      
        155  x: MULADD_e    ____,  T0.x,  T0.x, -PV154.x      
             y: MULADD_e    ____,  T1.y,  T1.y, -PV154.y      
             z: MUL_e       ____,  PV154.z,  PV154.z      
             w: MULADD_e    ____,  T2.w,  T2.w, -PV154.w      
             t: MUL_e       T0.x,  T0.x,  T0.w      
        156  x: ADD         R4.x,  R3.x,  PV155.y      
             y: ADD         R4.y,  R5.y,  PV155.x      
             z: MULADD_e    ____,  T0.z,  T0.z, -PV155.z      
             w: MUL_e       ____,  T2.w,  T1.w      
             t: ADD         R4.z,  R3.x,  PV155.w      
        157  x: MULADD_e    R6.x,  T0.y,  (0x40000000, 2.0f).x,  R10.x      
             y: MULADD_e    R6.y,  T0.x,  (0x40000000, 2.0f).x,  R10.x      
             z: MUL_e       ____,  T0.z,  T1.z      
             w: ADD         R4.w,  R5.y,  PV156.z      VEC_120 
             t: MULADD_e    R6.z,  PV156.w,  (0x40000000, 2.0f).x,  R10.y      VEC_021 
        158  x: MUL_e       ____,  PV157.y,  PV157.y      
             y: MUL_e       ____,  PV157.x,  PV157.x      
             z: MUL_e       ____,  PS157,  PS157      
             w: MULADD_e    R6.w,  PV157.z,  (0x40000000, 2.0f).x,  R10.y      
        159  x: MULADD_e    T0.x,  R4.y,  R4.y,  PV158.x      
             y: MULADD_e    ____,  R4.x,  R4.x,  PV158.y      
             z: MUL_e       ____,  PV158.w,  PV158.w      
             w: MULADD_e    T1.w,  R4.z,  R4.z,  PV158.z      
        160  x: SETGT_DX10  R2.x,  (0x40800000, 4.0f).x,  PV159.y      
             z: MULADD_e    ____,  R4.w,  R4.w,  PV159.z      
        161  x: CNDE_INT    R8.x,  PV160.x,  R8.x,  R4.x      
             y: SETGT_DX10  R1.y,  (0x40800000, 4.0f).x,  PV160.z      
             z: SETGT_DX10  R1.z,  (0x40800000, 4.0f).x,  T0.x      VEC_102 
             w: SETGT_DX10  R1.w,  (0x40800000, 4.0f).x,  T1.w      
             t: AND_INT     R2.y,  PV160.x,  (0xFFFFFFF0, -1.#QNANf).y      
    09 ALU: ADDR(854) CNT(18) 
        162  x: CNDE_INT    R7.x,  R2.x,  R7.x,  R6.x      
             y: CNDE_INT    R8.y,  R1.z,  R8.y,  R4.y      VEC_201 
             z: CNDE_INT    R8.z,  R1.w,  R8.z,  R4.z      VEC_201 
             w: CNDE_INT    R8.w,  R1.y,  R8.w,  R4.w      VEC_201 
             t: AND_INT     T0.x,  R1.z,  (0xFFFFFFF0, -1.#QNANf).x      
        163  x: AND_INT     ____,  R1.w,  (0xFFFFFFF0, -1.#QNANf).x      VEC_201 
             y: CNDE_INT    R7.y,  R1.z,  R7.y,  R6.y      VEC_201 
             z: CNDE_INT    R7.z,  R1.w,  R7.z,  R6.z      VEC_201 
             w: CNDE_INT    R7.w,  R1.y,  R7.w,  R6.w      VEC_201 
             t: AND_INT     ____,  R1.y,  (0xFFFFFFF0, -1.#QNANf).x      
        164  x: ADD_INT     R0.x,  R5.z,  PV163.x      
             y: ADD_INT     R0.y,  R9.w,  PS163      
             z: ADD_INT     R0.z,  R11.x,  R2.y      
             w: ADD_INT     R0.w,  R11.y,  T0.x      
        165  x: MOV         R1.x,  R1.w      
             y: MOV         R2.y,  R1.z      
10 ENDLOOP i0 PASS_JUMP_ADDR(5)
As you can see, 68 of the 78 ALU instructions have all 5 slots populated. Who says the 't' unit never gets used!

BTW, Julia sets are also interesting to look at and can be generated in much the same way as the Mandelbrot set. For the Mandelbrot set you start with z = 0 and iterate z -> z^2 + C, where C is the value of the point being computed. For a Julia set you instead fix C to the value whose Julia set you want and vary the starting value of z according to the pixel being computed. The simplest example is J_c with C = 0, which is the unit circle. (The Julia set doesn't include the interior.) This is easy to see: under z -> z^2 + 0, every z with |z| < 1 tends to the origin and every z with |z| > 1 tends to infinity.

J_i, a dendrite, is a neat one to look at if you get a chance to try it out.
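To make the difference concrete, here is a small Python sketch in which both sets share one iteration routine and differ only in which argument the pixel supplies. The function names and the 256-iteration default are illustrative choices, not anything from the viewer.

```python
def iterate(zu, zv, cu, cv, max_iter=256):
    """Count iterations of z -> z^2 + c before |z| exceeds 2."""
    for i in range(max_iter):
        if zu * zu + zv * zv > 4.0:
            return i
        zu, zv = zu * zu - zv * zv + cu, 2.0 * zu * zv + cv
    return max_iter


def mandelbrot_point(px, py, max_iter=256):
    # Mandelbrot: z starts at 0, the pixel supplies c.
    return iterate(0.0, 0.0, px, py, max_iter)


def julia_point(px, py, cu, cv, max_iter=256):
    # Julia: c is fixed for the whole image, the pixel supplies the starting z.
    return iterate(px, py, cu, cv, max_iter)


if __name__ == "__main__":
    print(julia_point(0.5, 0.0, 0.0, 0.0))  # |z| < 1 with c = 0: never escapes
    print(julia_point(1.1, 0.0, 0.0, 0.0))  # |z| > 1 with c = 0: escapes
    print(julia_point(0.0, 0.0, 0.0, 1.0))  # c = i: the orbit of 0 stays bounded
```

With c = 0 the two test points on either side of the unit circle behave as described above, and with c = i the orbit of 0 falls into a bounded cycle.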
 
Interesting to see that so many slots get used, I thought all would end up in xyzw.
BTW, what tool did you use to get the assembly? ShaderAnalyser doesn't seem to be able to do this yet.

The Julia sets are interesting indeed.
I've just uploaded an extended version that now also can do Julia calculations, thanks for the suggestion.
More explanation at the updated link.
 
It'd be interesting to see the code generated by AMD's compiler for the scalar version. I'd like to know how good it is wrt extracting ILP from predominantly scalar code.
 
Having looked a second time at the assembly output, I noticed quite a few loose MUL and ADD instructions, where in fact this could be done with a single MULADD instruction.

The reason appears to be that t = u*u - v*v + a is compiled in the written order.
This results in a MUL, MULADD, ADD sequence.

Reordering the instructions to t = u*u + a - v*v gives a 20% speedup.
So I assume now this gets compiled into MULADD, MULADD, so one instruction less.
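The instruction counts can be modelled in a few lines of Python. The helpers below stand in for the MUL, MULADD and ADD instructions; this is only a sketch of the grouping argument, under the assumption that the compiler keeps strict left-to-right order and that negating a source operand is free (as the `-PV` operands in the listing suggest). The function names are made up.

```python
def mul(a, b):
    return a * b


def muladd(a, b, c):
    # Stands in for one MULADD instruction: a*b + c.
    return a * b + c


def add(a, b):
    return a + b


def real_part_as_written(u, v, a):
    """t = u*u - v*v + a, grouped left to right: three instructions."""
    vv = mul(v, v)              # MUL
    diff = muladd(u, u, -vv)    # MULADD: u*u - v*v
    return add(a, diff), 3      # ADD


def real_part_reordered(u, v, a):
    """t = u*u + a - v*v, grouped left to right: two instructions."""
    s = muladd(u, u, a)         # MULADD: u*u + a
    return muladd(-v, v, s), 2  # MULADD: -v*v + (u*u + a)


if __name__ == "__main__":
    print(real_part_as_written(1.5, 0.5, 0.25))
    print(real_part_reordered(1.5, 0.5, 0.25))
```

Both orderings compute the same mathematical value, although in floating point the last bit can differ because the roundings happen in a different order.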

Computational throughput is now well over 1.7 TFLOP/s !

I've updated the viewer to version 1.4.

The scalar version is now about 3 times slower than the vectorized version.
To see this, zoom in to a fully black view while in windowed mode, so that the frames per second are visible.
 
Being an AMD employee, I have access to such tools ;)
I believe that left-to-right evaluation is the default when there are multiple operators with the same precedence. Your reordering of the calculations had a nice effect on the assembly output. The inner loop is only 64 ALU slots now and is mostly MADs.
 
After reading the R700 instruction set document I got another idea to speed up the calculation.
There is an instruction MULADD_IEEE_M2 which does dst = (src0*src1 + src2)*2.

This could be used for calculating

v = 2*u*v + b

after rewriting it as

v = 2*(u*v + b') with b'=b/2

However, this seemed to have no effect; otherwise it would save another MUL instruction per iteration.
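The rewrite itself is easy to check on the CPU. The Python sketch below models the two forms, with the x2 output modifier written as an explicit multiplication; the function names are hypothetical and nothing here shows what the AMD compiler actually emits.

```python
def imag_part_plain(u, v, b):
    """v' = 2*u*v + b as a multiply followed by a multiply-add."""
    uv = u * v            # MUL
    return uv * 2.0 + b   # MULADD with the constant 2.0


def imag_part_m2(u, v, half_b):
    """v' = 2*(u*v + b/2): one multiply-add whose result is doubled."""
    return (u * v + half_b) * 2.0  # models a multiply-add with a x2 output modifier


if __name__ == "__main__":
    u, v, b = 0.3, -0.7, 0.11
    print(imag_part_plain(u, v, b))
    print(imag_part_m2(u, v, b / 2.0))  # b' = b/2 is computed once, outside the loop
```

Because halving and doubling are exact in binary floating point (barring overflow or underflow), the two forms give identical results, so the rewrite should not change the picture.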
 
Here is another incremental improvement.
The shader code looks a lot cleaner now, with scalar and vector versions almost identical.
The main loop is unrolled twice more and computational throughput is now over 1.9 TFLOP/s.

At 2560x1600 resolution the frame rate never drops below 60 fps, so it can be useful to synchronize to vsync (press the V key). Alternatively, the maximum number of iterations can be doubled to 2048 with the M key.

Unfortunately the scalar version now runs with rendering artifacts; I have no idea what the cause could be.
I've also included the optimization mentioned in the previous post in case the AMD compiler can make use of it.
This could potentially give another 25% speed bump.

Edit: Fixed a bug that caused 32 iterations too many for non-escaping points.
 
Sounds great! Any chance you could do without unordered access view? That'd open up a wider target audience ;)

For example me, sitting here stuck with a GTX 280 in "the box" and an HD 5870 waiting in its shipping box to be installed as soon as an unfortunately rather lengthy job (days, maybe another week) running on the rig has finished.
 
I tried to get it working with D3D_FEATURE_LEVEL_10_0, but this would not let me create the device with DXGI_USAGE_UNORDERED_ACCESS, which is needed to write directly to the backbuffer. Maybe there is another way to get the data there, by copying from another buffer. Any suggestions?

In principle most of this should be possible without compute shaders, using pixel shaders, but then it gets complicated, certainly for the vectorized version, as that one outputs 4 pixels at once.

Also, I'm now using doubles to some extent, which is only possible with shader version 5.
As soon as the HLSL compiler no longer crashes, I plan to make a full doubles version to allow deeper zooming. This will probably take until the next release of the DX SDK.

PS Maybe someone from Microsoft is reading this: I am willing to donate this as a sample for the next SDK :)
 
:( Ok, so I'll have to be patient until I can put my HD 5870 to work. Ah well, more stuff to look forward to! ;)

Unless of course, someone comes up with a very clever solution to this.
 