#1
Member
Join Date: Jun 2007
Posts: 218
Here's a fairly fast Mandelbrot viewer that uses DX11 and the DirectCompute API.
The set is calculated with up to 1024 iterations. The horsepower of DX11 GPUs enables real-time panning and zooming even at high resolution. Two versions are included, a scalar one and a vectorized one; both generate the same output. I wrote the vectorized version after getting suboptimal performance with the scalar calculation on the ATI HD 5870, and on this GPU it runs twice as fast as the scalar version. This shows the downside of a non-scalar GPU architecture: on the forthcoming scalar Nvidia DX11 GPUs, the scalar version will probably run faster. Writing a vectorized compute shader is substantially more complicated. On the HD 5870, computational throughput is well over 1.5 TFLOP/s. Full source code is included. Note that no drawing code was needed, because the compute shader can write directly to the back buffer. Press the space bar to switch the calculation between scalar and vectorized. Zoom in and out with the left and right mouse buttons, and pan by moving the mouse; panning stops while the mouse is inside an invisible, centered circle about a quarter of the screen in size. The A and Z keys cycle the colors. ALT+Enter switches to full screen.
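For readers who haven't written one: the per-pixel work is the classic escape-time loop. Below is a minimal CPU sketch in C of what a scalar version presumably does for each pixel; it is not the author's HLSL. The function name `mandel_iter` and the bailout test |z|^2 > 4 (escape radius 2, the usual choice) are assumptions; only the 1024-iteration cap comes from the post.

```c
#include <assert.h>

/* Escape-time iteration for one pixel: z <- z^2 + c, starting at z = 0.
   Returns the number of iterations performed before |z|^2 exceeded 4,
   or max_iter if the point never escaped (treated as inside the set). */
static int mandel_iter(float cr, float ci, int max_iter)
{
    assert(max_iter > 0);
    float zr = 0.0f, zi = 0.0f;
    int n = 0;
    while (n < max_iter && zr * zr + zi * zi <= 4.0f) {
        float t = zr * zr - zi * zi + cr;  /* Re(z^2 + c) */
        zi = 2.0f * zr * zi + ci;          /* Im(z^2 + c) */
        zr = t;
        n++;
    }
    return n;
}
```

Points that reach the cap form the black interior; escaping points are typically colored by their count.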

#2
Dangerously Mirthful
Join Date: Feb 2002
Location: Highland, IN USA
Posts: 14,599
It's failing on startup for me: an error message saying it has stopped working comes up as soon as it opens a window.
Probably I'm doing something stupid; any idea what? Oh, my system: Phenom II X4 965 BE, Gigabyte MA790FXT-UDP5, 2x2GB Corsair XMS3, 1GB ATI Radeon 5870, Corsair 620hx, 750GB HD Caviar Black
__________________
Elite Bastards - Adminish "All you know is when I'm with you, I make you free; and swim through your veins like a fish in the sea." - Uncle Kracker

#3
Crazy coder
Quote:
Edit: Judging from the source code, it appears not. Looking at the shader, I think it would be fairly easy to make it run on DX10 cards as well. Last edited by Humus; 07-Oct-2009 at 21:18.

#4
Member
Join Date: Jun 2007
Posts: 218
Quote:
It requires the DX11 beta to be installed on Vista.

#5
Member
Join Date: Jun 2007
Posts: 218
Quote:

#6
Senior Member
Once there was a 3D fractal generator written in Cg, and it was rather impressive! I'd really like to see something like that ported to DirectCompute (DC) or OpenCL (OCL).
__________________
"Releasing a game in 2010 without AA is a completely foreign concept to me. If the technique you're using makes it impossible to use AA then you're using the wrong technique." -- Humus

#7
Dangerously Mirthful
Join Date: Feb 2002
Location: Highland, IN USA
Posts: 14,599
Oops, sorry. Windows 7 64-bit OEM, all legal and everything.

#8
Crazy coder
RWTexture2D is not supported, but RWBuffer is.

#9
Member
Join Date: Aug 2004
Location: Indiana
Posts: 318
Works fine here, Vista64 updated to DX11.

#10
Member
Join Date: Jun 2007
Posts: 218
I've just uploaded a slightly enhanced version.
The starting positions are now calculated in double precision, which results in less noise when zooming in. Unfortunately, full double precision does not work yet, as the HLSL compiler currently has bugs that prevent any serious work with doubles. I could not get the scalar version working with double-precision start positions, so the two versions now render differently. There are some other cosmetic changes as well, as you will notice.
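A rough illustration of why computing the start positions in double helps, as a C sketch. The pixel-to-coordinate mapping `center + (px - width/2) * scale` and both helper names are hypothetical, since the viewer's actual mapping isn't shown. The float path rounds the center to float and then rounds again after the multiply and the add; the double path rounds only once at the end, so its result is always the float nearest the intended coordinate.

```c
#include <assert.h>

/* Hypothetical pixel -> coordinate mapping done entirely in float:
   the center is rounded to float first, then the offset is added in float. */
static float start_pos_float(double center, double scale, int px, int width)
{
    float c = (float)center;
    float s = (float)scale;
    return c + (float)(px - width / 2) * s;
}

/* Same mapping evaluated in double; the result is rounded to float only once. */
static float start_pos_double(double center, double scale, int px, int width)
{
    return (float)(center + (double)(px - width / 2) * scale);
}

/* |a - b| without pulling in <math.h>. */
static double abs_diff(double a, double b)
{
    return a > b ? a - b : b - a;
}
```

Rounding to float at the end still limits zoom depth to roughly the float spacing around the center, which is why full double precision is still needed for really deep zooms.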

#11
Senior Member
Neat demo! Having done my Ph.D. in mathematics on fractal geometry, I found it great to see this running so fast! My friends and I used to generate Mandelbrot pictures on our Apple IIs back in high school, so things have certainly come a long way since then.
I took the liberty of compiling the shaders and extracting the main loop from the vectorized version. Here it is:
Code:
04 LOOP_DX10 i0 FAIL_JUMP_ADDR(11)
05 ALU_BREAK: ADDR(470) CNT(16)
88 x: MOV R11.x, R0.z
y: MOV R11.y, R0.w
z: MOV R5.z, R0.x
w: MOV R9.w, R0.y
t: OR_INT T0.w, R2.x, R1.x
89 x: SETNE_INT ____, PV88.y, 0.0f
y: SETNE_INT ____, PV88.x, 0.0f
z: SETNE_INT ____, PV88.z, 0.0f
w: SETNE_INT ____, PV88.w, 0.0f
t: OR_INT ____, R2.y, R1.y
90 x: AND_INT ____, PV89.x, PV89.w
y: AND_INT ____, PV89.y, PV89.z
z: OR_INT T0.z, T0.w, PS89
91 z: AND_INT ____, PV90.y, PV90.x
92 z: AND_INT R1.z, PV91.z, T0.z
93 x: PREDNE_INT ____, R1.z, 0.0f UPDATE_EXEC_MASK UPDATE_PRED
06 ALU: ADDR(486) CNT(124)
94 x: MUL_e ____, R6.y, R6.y
y: MUL_e ____, R6.x, R6.x
z: MUL_e ____, R6.w, R6.w
w: MUL_e ____, R6.z, R6.z
t: MUL_e T0.y, R4.x, R6.x
95 x: MULADD_e ____, R4.y, R4.y, -PV94.x
y: MULADD_e ____, R4.x, R4.x, -PV94.y
z: MULADD_e ____, R4.w, R4.w, -PV94.z
w: MULADD_e ____, R4.z, R4.z, -PV94.w
t: MUL_e T0.x, R4.y, R6.y
96 x: ADD T1.x, R5.y, PV95.x
y: ADD T1.y, R3.x, PV95.y
z: ADD T0.z, R5.y, PV95.z
w: ADD T0.w, R3.x, PV95.w
t: MUL_e ____, R4.z, R6.z
97 x: MULADD_e T0.x, T0.x, (0x40000000, 2.0f).x, R10.x
y: MULADD_e ____, T0.y, (0x40000000, 2.0f).x, R10.x
z: MUL_e ____, R4.w, R6.w
w: MULADD_e T1.w, PS96, (0x40000000, 2.0f).x, R10.y
98 x: MUL_e ____, PV97.x, PV97.x
y: MUL_e ____, PV97.y, PV97.y
z: MULADD_e T1.z, PV97.z, (0x40000000, 2.0f).x, R10.y VEC_021
w: MUL_e ____, PV97.w, PV97.w
t: MUL_e T0.y, T1.y, PV97.y
99 x: MULADD_e ____, T1.x, T1.x, -PV98.x
y: MULADD_e ____, T1.y, T1.y, -PV98.y
z: MUL_e ____, PV98.z, PV98.z
w: MULADD_e ____, T0.w, T0.w, -PV98.w
t: MUL_e T0.x, T1.x, T0.x
100 x: ADD T1.x, R5.y, PV99.x
y: ADD T1.y, R3.x, PV99.y
z: MULADD_e ____, T0.z, T0.z, -PV99.z
w: ADD T1.w, R3.x, PV99.w
t: MUL_e ____, T0.w, T1.w
101 x: MUL_e ____, T0.z, T1.z
y: MULADD_e ____, T0.y, (0x40000000, 2.0f).x, R10.x
z: ADD T0.z, R5.y, PV100.z VEC_120
w: MULADD_e T0.w, T0.x, (0x40000000, 2.0f).x, R10.x
t: MULADD_e T2.w, PS100, (0x40000000, 2.0f).x, R10.y VEC_021
102 x: MUL_e ____, PV101.w, PV101.w
y: MUL_e ____, PV101.y, PV101.y
z: MULADD_e T1.z, PV101.x, (0x40000000, 2.0f).x, R10.y VEC_021
w: MUL_e ____, PS101, PS101
t: MUL_e T0.y, T1.y, PV101.y
103 x: MULADD_e ____, T1.x, T1.x, -PV102.x
y: MULADD_e ____, T1.y, T1.y, -PV102.y
z: MUL_e ____, PV102.z, PV102.z
w: MULADD_e ____, T1.w, T1.w, -PV102.w
t: MUL_e T1.x, T1.x, T0.w
104 x: ADD T0.x, R5.y, PV103.x
y: ADD T1.y, R3.x, PV103.y
z: MULADD_e ____, T0.z, T0.z, -PV103.z
w: ADD T2.w, R3.x, PV103.w
t: MUL_e ____, T1.w, T2.w
105 x: MUL_e ____, T0.z, T1.z
y: MULADD_e ____, T0.y, (0x40000000, 2.0f).x, R10.x
z: ADD T0.z, R5.y, PV104.z VEC_120
w: MULADD_e T1.w, T1.x, (0x40000000, 2.0f).x, R10.x
t: MULADD_e T0.w, PS104, (0x40000000, 2.0f).x, R10.y VEC_021
106 x: MUL_e ____, PV105.w, PV105.w
y: MUL_e ____, PV105.y, PV105.y
z: MULADD_e T1.z, PV105.x, (0x40000000, 2.0f).x, R10.y VEC_021
w: MUL_e ____, PS105, PS105
t: MUL_e T0.y, T1.y, PV105.y
107 x: MULADD_e ____, T0.x, T0.x, -PV106.x
y: MULADD_e ____, T1.y, T1.y, -PV106.y
z: MUL_e ____, PV106.z, PV106.z
w: MULADD_e ____, T2.w, T2.w, -PV106.w
t: MUL_e T0.x, T0.x, T1.w
108 x: ADD T1.x, R5.y, PV107.x
y: ADD T1.y, R3.x, PV107.y
z: MULADD_e ____, T0.z, T0.z, -PV107.z
w: ADD T2.w, R3.x, PV107.w
t: MUL_e ____, T2.w, T0.w
109 x: MUL_e ____, T0.z, T1.z
y: MULADD_e ____, T0.y, (0x40000000, 2.0f).x, R10.x
z: ADD T0.z, R5.y, PV108.z VEC_120
w: MULADD_e T0.w, T0.x, (0x40000000, 2.0f).x, R10.x
t: MULADD_e T1.w, PS108, (0x40000000, 2.0f).x, R10.y VEC_021
110 x: MUL_e ____, PV109.w, PV109.w
y: MUL_e ____, PV109.y, PV109.y
z: MULADD_e T1.z, PV109.x, (0x40000000, 2.0f).x, R10.y VEC_021
w: MUL_e ____, PS109, PS109
t: MUL_e T0.y, T1.y, PV109.y
111 x: MULADD_e ____, T1.x, T1.x, -PV110.x
y: MULADD_e ____, T1.y, T1.y, -PV110.y
z: MUL_e ____, PV110.z, PV110.z
w: MULADD_e ____, T2.w, T2.w, -PV110.w
t: MUL_e T1.x, T1.x, T0.w
112 x: ADD T0.x, R5.y, PV111.x
y: ADD T1.y, R3.x, PV111.y
z: MULADD_e ____, T0.z, T0.z, -PV111.z
w: ADD T2.w, R3.x, PV111.w
t: MUL_e ____, T2.w, T1.w
113 x: MUL_e ____, T0.z, T1.z
y: MULADD_e ____, T0.y, (0x40000000, 2.0f).x, R10.x
z: ADD R1.z, R5.y, PV112.z VEC_120
w: MULADD_e T1.w, T1.x, (0x40000000, 2.0f).x, R10.x
t: MULADD_e T0.w, PS112, (0x40000000, 2.0f).x, R10.y VEC_021
114 x: MUL_e ____, PV113.w, PV113.w
y: MUL_e ____, PV113.y, PV113.y
z: MULADD_e R2.z, PV113.x, (0x40000000, 2.0f).x, R10.y VEC_021
w: MUL_e ____, PS113, PS113
t: MUL_e R0.y, T1.y, PV113.y
115 x: MULADD_e ____, T0.x, T0.x, -PV114.x
y: MULADD_e ____, T1.y, T1.y, -PV114.y
z: MUL_e ____, PV114.z, PV114.z
w: MULADD_e ____, T2.w, T2.w, -PV114.w
t: MUL_e R0.x, T0.x, T1.w
116 x: ADD R1.x, R5.y, PV115.x
y: ADD R1.y, R3.x, PV115.y
z: MULADD_e R0.z, R1.z, R1.z, -PV115.z
w: ADD R1.w, R3.x, PV115.w
t: MUL_e R0.w, T2.w, T0.w
07 ALU: ADDR(610) CNT(122)
117 x: MUL_e ____, R1.z, R2.z
y: MULADD_e ____, R0.y, (0x40000000, 2.0f).x, R10.x
z: ADD T0.z, R5.y, R0.z VEC_120
w: MULADD_e T0.w, R0.x, (0x40000000, 2.0f).x, R10.x
t: MULADD_e T2.w, R0.w, (0x40000000, 2.0f).x, R10.y VEC_021
118 x: MUL_e ____, PV117.w, PV117.w
y: MUL_e ____, PV117.y, PV117.y
z: MULADD_e T1.z, PV117.x, (0x40000000, 2.0f).x, R10.y VEC_021
w: MUL_e ____, PS117, PS117
t: MUL_e T1.y, R1.y, PV117.y
119 x: MULADD_e ____, R1.x, R1.x, -PV118.x
y: MULADD_e ____, R1.y, R1.y, -PV118.y
z: MUL_e ____, PV118.z, PV118.z
w: MULADD_e ____, R1.w, R1.w, -PV118.w
t: MUL_e T0.x, R1.x, T0.w
120 x: ADD T1.x, R5.y, PV119.x
y: ADD T0.y, R3.x, PV119.y
z: MULADD_e ____, T0.z, T0.z, -PV119.z
w: ADD T0.w, R3.x, PV119.w
t: MUL_e ____, R1.w, T2.w
121 x: MUL_e ____, T0.z, T1.z
y: MULADD_e ____, T1.y, (0x40000000, 2.0f).x, R10.x
z: ADD T0.z, R5.y, PV120.z VEC_120
w: MULADD_e T2.w, T0.x, (0x40000000, 2.0f).x, R10.x
t: MULADD_e T1.w, PS120, (0x40000000, 2.0f).x, R10.y VEC_021
122 x: MUL_e ____, PV121.w, PV121.w
y: MUL_e ____, PV121.y, PV121.y
z: MULADD_e T1.z, PV121.x, (0x40000000, 2.0f).x, R10.y VEC_021
w: MUL_e ____, PS121, PS121
t: MUL_e T1.y, T0.y, PV121.y
123 x: MULADD_e ____, T1.x, T1.x, -PV122.x
y: MULADD_e ____, T0.y, T0.y, -PV122.y
z: MUL_e ____, PV122.z, PV122.z
w: MULADD_e ____, T0.w, T0.w, -PV122.w
t: MUL_e T1.x, T1.x, T2.w
124 x: ADD T0.x, R5.y, PV123.x
y: ADD T0.y, R3.x, PV123.y
z: MULADD_e ____, T0.z, T0.z, -PV123.z
w: ADD T1.w, R3.x, PV123.w
t: MUL_e ____, T0.w, T1.w
125 x: MUL_e ____, T0.z, T1.z
y: MULADD_e ____, T1.y, (0x40000000, 2.0f).x, R10.x
z: ADD T0.z, R5.y, PV124.z VEC_120
w: MULADD_e T0.w, T1.x, (0x40000000, 2.0f).x, R10.x
t: MULADD_e T2.w, PS124, (0x40000000, 2.0f).x, R10.y VEC_021
126 x: MUL_e ____, PV125.w, PV125.w
y: MUL_e ____, PV125.y, PV125.y
z: MULADD_e T1.z, PV125.x, (0x40000000, 2.0f).x, R10.y VEC_021
w: MUL_e ____, PS125, PS125
t: MUL_e T1.y, T0.y, PV125.y
127 x: MULADD_e ____, T0.x, T0.x, -PV126.x
y: MULADD_e ____, T0.y, T0.y, -PV126.y
z: MUL_e ____, PV126.z, PV126.z
w: MULADD_e ____, T1.w, T1.w, -PV126.w
t: MUL_e T0.x, T0.x, T0.w
128 x: ADD T1.x, R5.y, PV127.x
y: ADD T0.y, R3.x, PV127.y
z: MULADD_e ____, T0.z, T0.z, -PV127.z
w: ADD T2.w, R3.x, PV127.w
t: MUL_e ____, T1.w, T2.w
129 x: MUL_e ____, T0.z, T1.z
y: MULADD_e ____, T1.y, (0x40000000, 2.0f).x, R10.x
z: ADD T0.z, R5.y, PV128.z VEC_120
w: MULADD_e T1.w, T0.x, (0x40000000, 2.0f).x, R10.x
t: MULADD_e T0.w, PS128, (0x40000000, 2.0f).x, R10.y VEC_021
130 x: MUL_e ____, PV129.w, PV129.w
y: MUL_e ____, PV129.y, PV129.y
z: MULADD_e T1.z, PV129.x, (0x40000000, 2.0f).x, R10.y VEC_021
w: MUL_e ____, PS129, PS129
t: MUL_e T1.y, T0.y, PV129.y
131 x: MULADD_e ____, T1.x, T1.x, -PV130.x
y: MULADD_e ____, T0.y, T0.y, -PV130.y
z: MUL_e ____, PV130.z, PV130.z
w: MULADD_e ____, T2.w, T2.w, -PV130.w
t: MUL_e T1.x, T1.x, T1.w
132 x: ADD T0.x, R5.y, PV131.x
y: ADD T0.y, R3.x, PV131.y
z: MULADD_e ____, T0.z, T0.z, -PV131.z
w: ADD T2.w, R3.x, PV131.w
t: MUL_e ____, T2.w, T0.w
133 x: MUL_e ____, T0.z, T1.z
y: MULADD_e ____, T1.y, (0x40000000, 2.0f).x, R10.x
z: ADD T0.z, R5.y, PV132.z VEC_120
w: MULADD_e T0.w, T1.x, (0x40000000, 2.0f).x, R10.x
t: MULADD_e T1.w, PS132, (0x40000000, 2.0f).x, R10.y VEC_021
134 x: MUL_e ____, PV133.w, PV133.w
y: MUL_e ____, PV133.y, PV133.y
z: MULADD_e T1.z, PV133.x, (0x40000000, 2.0f).x, R10.y VEC_021
w: MUL_e ____, PS133, PS133
t: MUL_e T1.y, T0.y, PV133.y
135 x: MULADD_e ____, T0.x, T0.x, -PV134.x
y: MULADD_e ____, T0.y, T0.y, -PV134.y
z: MUL_e ____, PV134.z, PV134.z
w: MULADD_e ____, T2.w, T2.w, -PV134.w
t: MUL_e T0.x, T0.x, T0.w
136 x: ADD R0.x, R5.y, PV135.x
y: ADD R1.y, R3.x, PV135.y
z: MULADD_e ____, T0.z, T0.z, -PV135.z
w: ADD R2.w, R3.x, PV135.w
t: MUL_e ____, T2.w, T1.w
137 x: MUL_e ____, T0.z, T1.z
y: MULADD_e ____, T1.y, (0x40000000, 2.0f).x, R10.x
z: ADD R0.z, R5.y, PV136.z VEC_120
w: MULADD_e R0.w, T0.x, (0x40000000, 2.0f).x, R10.x
t: MULADD_e R3.w, PS136, (0x40000000, 2.0f).x, R10.y VEC_021
138 x: MUL_e R1.x, PV137.w, PV137.w
y: MUL_e R0.y, PV137.y, PV137.y
z: MULADD_e R1.z, PV137.x, (0x40000000, 2.0f).x, R10.y VEC_021
w: MUL_e R1.w, PS137, PS137
t: MUL_e R2.y, R1.y, PV137.y
08 ALU: ADDR(732) CNT(122)
139 x: MULADD_e ____, R0.x, R0.x, -R1.x VEC_021
y: MULADD_e ____, R1.y, R1.y, -R0.y
z: MUL_e ____, R1.z, R1.z
w: MULADD_e ____, R2.w, R2.w, -R1.w
t: MUL_e T0.x, R0.x, R0.w
140 x: ADD T1.x, R5.y, PV139.x
y: ADD T1.y, R3.x, PV139.y
z: MULADD_e ____, R0.z, R0.z, -PV139.z
w: ADD T2.w, R3.x, PV139.w
t: MUL_e ____, R2.w, R3.w
141 x: MUL_e ____, R0.z, R1.z
y: MULADD_e ____, R2.y, (0x40000000, 2.0f).x, R10.x
z: ADD T0.z, R5.y, PV140.z VEC_120
w: MULADD_e T1.w, T0.x, (0x40000000, 2.0f).x, R10.x
t: MULADD_e T0.w, PS140, (0x40000000, 2.0f).x, R10.y VEC_021
142 x: MUL_e ____, PV141.w, PV141.w
y: MUL_e ____, PV141.y, PV141.y
z: MULADD_e T1.z, PV141.x, (0x40000000, 2.0f).x, R10.y VEC_021
w: MUL_e ____, PS141, PS141
t: MUL_e T0.y, T1.y, PV141.y
143 x: MULADD_e ____, T1.x, T1.x, -PV142.x
y: MULADD_e ____, T1.y, T1.y, -PV142.y
z: MUL_e ____, PV142.z, PV142.z
w: MULADD_e ____, T2.w, T2.w, -PV142.w
t: MUL_e T1.x, T1.x, T1.w
144 x: ADD T0.x, R5.y, PV143.x
y: ADD T1.y, R3.x, PV143.y
z: MULADD_e ____, T0.z, T0.z, -PV143.z
w: ADD T2.w, R3.x, PV143.w
t: MUL_e ____, T2.w, T0.w
145 x: MUL_e ____, T0.z, T1.z
y: MULADD_e ____, T0.y, (0x40000000, 2.0f).x, R10.x
z: ADD T0.z, R5.y, PV144.z VEC_120
w: MULADD_e T0.w, T1.x, (0x40000000, 2.0f).x, R10.x
t: MULADD_e T1.w, PS144, (0x40000000, 2.0f).x, R10.y VEC_021
146 x: MUL_e ____, PV145.w, PV145.w
y: MUL_e ____, PV145.y, PV145.y
z: MULADD_e T1.z, PV145.x, (0x40000000, 2.0f).x, R10.y VEC_021
w: MUL_e ____, PS145, PS145
t: MUL_e T0.y, T1.y, PV145.y
147 x: MULADD_e ____, T0.x, T0.x, -PV146.x
y: MULADD_e ____, T1.y, T1.y, -PV146.y
z: MUL_e ____, PV146.z, PV146.z
w: MULADD_e ____, T2.w, T2.w, -PV146.w
t: MUL_e T0.x, T0.x, T0.w
148 x: ADD T1.x, R5.y, PV147.x
y: ADD T1.y, R3.x, PV147.y
z: MULADD_e ____, T0.z, T0.z, -PV147.z
w: ADD T2.w, R3.x, PV147.w
t: MUL_e ____, T2.w, T1.w
149 x: MUL_e ____, T0.z, T1.z
y: MULADD_e ____, T0.y, (0x40000000, 2.0f).x, R10.x
z: ADD T0.z, R5.y, PV148.z VEC_120
w: MULADD_e T1.w, T0.x, (0x40000000, 2.0f).x, R10.x
t: MULADD_e T0.w, PS148, (0x40000000, 2.0f).x, R10.y VEC_021
150 x: MUL_e ____, PV149.w, PV149.w
y: MUL_e ____, PV149.y, PV149.y
z: MULADD_e T1.z, PV149.x, (0x40000000, 2.0f).x, R10.y VEC_021
w: MUL_e ____, PS149, PS149
t: MUL_e T0.y, T1.y, PV149.y
151 x: MULADD_e ____, T1.x, T1.x, -PV150.x
y: MULADD_e ____, T1.y, T1.y, -PV150.y
z: MUL_e ____, PV150.z, PV150.z
w: MULADD_e ____, T2.w, T2.w, -PV150.w
t: MUL_e T1.x, T1.x, T1.w
152 x: ADD T0.x, R5.y, PV151.x
y: ADD T1.y, R3.x, PV151.y
z: MULADD_e ____, T0.z, T0.z, -PV151.z
w: ADD T2.w, R3.x, PV151.w
t: MUL_e ____, T2.w, T0.w
153 x: MUL_e ____, T0.z, T1.z
y: MULADD_e ____, T0.y, (0x40000000, 2.0f).x, R10.x
z: ADD T0.z, R5.y, PV152.z VEC_120
w: MULADD_e T0.w, T1.x, (0x40000000, 2.0f).x, R10.x
t: MULADD_e T1.w, PS152, (0x40000000, 2.0f).x, R10.y VEC_021
154 x: MUL_e ____, PV153.w, PV153.w
y: MUL_e ____, PV153.y, PV153.y
z: MULADD_e T1.z, PV153.x, (0x40000000, 2.0f).x, R10.y VEC_021
w: MUL_e ____, PS153, PS153
t: MUL_e T0.y, T1.y, PV153.y
155 x: MULADD_e ____, T0.x, T0.x, -PV154.x
y: MULADD_e ____, T1.y, T1.y, -PV154.y
z: MUL_e ____, PV154.z, PV154.z
w: MULADD_e ____, T2.w, T2.w, -PV154.w
t: MUL_e T0.x, T0.x, T0.w
156 x: ADD R4.x, R3.x, PV155.y
y: ADD R4.y, R5.y, PV155.x
z: MULADD_e ____, T0.z, T0.z, -PV155.z
w: MUL_e ____, T2.w, T1.w
t: ADD R4.z, R3.x, PV155.w
157 x: MULADD_e R6.x, T0.y, (0x40000000, 2.0f).x, R10.x
y: MULADD_e R6.y, T0.x, (0x40000000, 2.0f).x, R10.x
z: MUL_e ____, T0.z, T1.z
w: ADD R4.w, R5.y, PV156.z VEC_120
t: MULADD_e R6.z, PV156.w, (0x40000000, 2.0f).x, R10.y VEC_021
158 x: MUL_e ____, PV157.y, PV157.y
y: MUL_e ____, PV157.x, PV157.x
z: MUL_e ____, PS157, PS157
w: MULADD_e R6.w, PV157.z, (0x40000000, 2.0f).x, R10.y
159 x: MULADD_e T0.x, R4.y, R4.y, PV158.x
y: MULADD_e ____, R4.x, R4.x, PV158.y
z: MUL_e ____, PV158.w, PV158.w
w: MULADD_e T1.w, R4.z, R4.z, PV158.z
160 x: SETGT_DX10 R2.x, (0x40800000, 4.0f).x, PV159.y
z: MULADD_e ____, R4.w, R4.w, PV159.z
161 x: CNDE_INT R8.x, PV160.x, R8.x, R4.x
y: SETGT_DX10 R1.y, (0x40800000, 4.0f).x, PV160.z
z: SETGT_DX10 R1.z, (0x40800000, 4.0f).x, T0.x VEC_102
w: SETGT_DX10 R1.w, (0x40800000, 4.0f).x, T1.w
t: AND_INT R2.y, PV160.x, (0xFFFFFFF0, -1.#QNANf).y
09 ALU: ADDR(854) CNT(18)
162 x: CNDE_INT R7.x, R2.x, R7.x, R6.x
y: CNDE_INT R8.y, R1.z, R8.y, R4.y VEC_201
z: CNDE_INT R8.z, R1.w, R8.z, R4.z VEC_201
w: CNDE_INT R8.w, R1.y, R8.w, R4.w VEC_201
t: AND_INT T0.x, R1.z, (0xFFFFFFF0, -1.#QNANf).x
163 x: AND_INT ____, R1.w, (0xFFFFFFF0, -1.#QNANf).x VEC_201
y: CNDE_INT R7.y, R1.z, R7.y, R6.y VEC_201
z: CNDE_INT R7.z, R1.w, R7.z, R6.z VEC_201
w: CNDE_INT R7.w, R1.y, R7.w, R6.w VEC_201
t: AND_INT ____, R1.y, (0xFFFFFFF0, -1.#QNANf).x
164 x: ADD_INT R0.x, R5.z, PV163.x
y: ADD_INT R0.y, R9.w, PS163
z: ADD_INT R0.z, R11.x, R2.y
w: ADD_INT R0.w, R11.y, T0.x
165 x: MOV R1.x, R1.w
y: MOV R2.y, R1.z
10 ENDLOOP i0 PASS_JUMP_ADDR(5)
BTW, Julia sets are also interesting to look at and can be generated in a similar manner to the Mandelbrot set. The difference is this: for the Mandelbrot set you start with z = 0 and iterate z -> z^2 + C, where C is the value of the point being computed; for a Julia set you instead fix C to the value whose Julia set you want and vary the starting value of z according to the pixel you are computing. The simplest example is J_C with C = 0, which is the unit circle. (The Julia set doesn't include the interior.) This is easy to see: for every z with |z| < 1, iterating z -> z^2 + 0 tends to the origin, and for every z with |z| > 1, it tends to infinity. J_i, a dendrite, is a neat one to look at if you get a chance to try it out.
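A minimal C sketch of the Julia variant, assuming the same bailout test (|z|^2 > 4) as a typical Mandelbrot loop; the function name `julia_iter` is hypothetical. The only change from the Mandelbrot iteration is where the inputs come from: the pixel supplies the starting value z0, and the constant C is a parameter shared by the whole image.

```c
#include <assert.h>

/* Julia-set escape time: (cr, ci) is fixed for the whole image and the pixel
   supplies the starting value (zr, zi). Returns the number of iterations
   before |z|^2 exceeded 4, or max_iter if the orbit stayed bounded. */
static int julia_iter(float zr, float zi, float cr, float ci, int max_iter)
{
    assert(max_iter > 0);
    int n = 0;
    while (n < max_iter && zr * zr + zi * zi <= 4.0f) {
        float t = zr * zr - zi * zi + cr;  /* Re(z^2 + C) */
        zi = 2.0f * zr * zi + ci;          /* Im(z^2 + C) */
        zr = t;
        n++;
    }
    return n;
}
```

For C = 0, a start value such as 0.5 stays bounded and 1.5 escapes after one step, matching the unit-circle argument above. For C = i, the orbit of 0 is 0, i, -1+i, -i, -1+i, ... and never escapes; a bounded orbit of 0 is exactly the condition for the Julia set of z^2 + C to be connected, which is consistent with J_i being a dendrite.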
__________________
I speak only for myself.

#12
Member
Join Date: Jun 2007
Posts: 218
Interesting to see that so many slots get used; I thought everything would end up in the xyzw slots.
BTW, what tool did you use to get the assembly? ShaderAnalyzer doesn't seem to be able to do this yet. The Julia sets are interesting indeed. I've just uploaded an extended version that can now also do Julia calculations; thanks for the suggestion. More explanation at the updated link.

#13
Senior Member
It'd be interesting to see the code generated by AMD's compiler for the scalar version. I'd like to know how good it is at extracting instruction-level parallelism (ILP) from predominantly scalar code.

#14
Member
Join Date: Jun 2007
Posts: 218
Having looked at the assembly output a second time, I noticed quite a few loose MUL and ADD instructions where a single MULADD instruction could do the job.
The reason appears to be that t = u*u - v*v + a is compiled in the written order, which results in a MUL, MULADD, ADD sequence. Reordering the expression to t = u*u + a - v*v gives a 20% speedup, so I assume it now compiles to MULADD, MULADD, one instruction fewer. Computational throughput is now well over 1.7 TFLOP/s! I've updated the viewer to version 1.4. The scalar version is now about 3 times slower than the vectorized version. You can see this by zooming in to a fully black view in windowed mode, where the frames per second are shown.
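The two orderings written out in C, to make the instruction counts concrete. The comments describe the R700 instruction sequences reported above; C itself makes no promise about which instructions a compiler emits, and whether it fuses a multiply with an add depends on the compiler and its floating-point contraction settings. The function names are hypothetical.

```c
#include <assert.h>

/* Written order: (u*u - v*v) + a.
   Reported on R700 as MUL (v*v), MULADD (u*u minus that), ADD (+a). */
static float re_written_order(float u, float v, float a)
{
    return u * u - v * v + a;
}

/* Reordered: (u*u + a) - v*v.
   Each step is a multiply feeding an add, so each can map to one MULADD:
   t = u*u + a, then t - v*v (a MULADD with a negated source operand). */
static float re_reordered(float u, float v, float a)
{
    float t = u * u + a;
    return t - v * v;
}
```

Because floating-point addition is not associative, the two forms can differ in the last bit for general inputs, which could occasionally change the iteration count of a pixel right at the escape boundary.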

#15
Senior Member
Quote:
Quote:

#16
Member
Join Date: Jun 2007
Posts: 218
After reading the R700 instruction set document, I got another idea to speed up the calculation.
There is an instruction, MULADD_IEEE_M2, which computes dst = (src0*src1 + src2)*2. It could be used to calculate v = 2*u*v + b after rewriting it as v = 2*(u*v + b') with b' = b/2. However, this did not seem to have any effect; if it had worked, it would save another MUL instruction per iteration.
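The proposed rewrite, sketched in C. The names are hypothetical, and the point is only the algebra: b' = b/2 is computed once per pixel, outside the loop, so each iteration's imaginary-part update has the (x*y + z)*2 shape that MULADD_IEEE_M2 computes in one instruction, instead of a separate multiply plus a MULADD.

```c
#include <assert.h>

/* Imaginary part of z^2 + c as written: 2*u*v + b.
   Reported on R700 as MUL (u*v) followed by MULADD (that*2 + b). */
static float im_direct(float u, float v, float b)
{
    return 2.0f * u * v + b;
}

/* Rewritten as (u*v + b_half)*2, where b_half = b/2 is precomputed once per
   pixel. This is the (src0*src1 + src2)*2 form of MULADD_IEEE_M2. */
static float im_m2_form(float u, float v, float b_half)
{
    return (u * v + b_half) * 2.0f;
}
```

Halving and doubling are exact in binary floating point (barring overflow or underflow), so when neither expression is fused the two forms should give identical results, not just mathematically equal ones.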

#17
Member
Join Date: Jun 2007
Posts: 218
Here's another incremental improvement.
The shader code looks a lot cleaner now, with the scalar and vector versions almost identical. The main loop is unrolled twice more, and computational throughput is now over 1.9 TFLOP/s. At 2560x1600 resolution the frame rate never drops below 60 fps, so it can be useful to synchronize to vsync (press the V key). Alternatively, the maximum number of iterations can be doubled to 2048 with the M key. Unfortunately, the scalar version now renders with artifacts, and I have no idea what could be causing them. I've also included the optimization mentioned in my previous post, in case the AMD compiler can make use of it; this could potentially give another 25% speed bump. Edit: Fixed a bug that caused 32 iterations too many for non-escaping points. Last edited by Voxilla; 10-Oct-2009 at 13:01.
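The post doesn't show how the unrolled loop counts iterations, but an error of a whole block is easy to introduce when the escape test only runs once per unrolled block. Below is a hypothetical C sketch, not the viewer's code: it unrolls by 4 (the real unroll factor isn't stated), tests for escape once per block, and, when a block escapes, replays that block one step at a time from a saved state so the count matches the simple one-test-per-iteration loop exactly. Requiring the iteration cap to be a multiple of the unroll factor keeps non-escaping points from running past it.

```c
#include <assert.h>

#define UNROLL 4  /* hypothetical unroll factor */

/* One iteration of z <- z^2 + c, shared by both loops so they round identically. */
static void step(float *zr, float *zi, float cr, float ci)
{
    float t = *zr * *zr - *zi * *zi + cr;
    *zi = 2.0f * *zr * *zi + ci;
    *zr = t;
}

/* Reference: escape test before every iteration. */
static int iter_ref(float cr, float ci, int max_iter)
{
    float zr = 0.0f, zi = 0.0f;
    int n = 0;
    while (n < max_iter && zr * zr + zi * zi <= 4.0f) {
        step(&zr, &zi, cr, ci);
        n++;
    }
    return n;
}

/* Unrolled: one escape test per block of UNROLL iterations. Once the orbit
   escapes, |z| keeps growing, so an escape inside a block is still visible
   at the end of that block. */
static int iter_unrolled(float cr, float ci, int max_iter)
{
    /* A cap that is a multiple of UNROLL means the loop never runs past it. */
    assert(max_iter > 0 && max_iter % UNROLL == 0);
    float zr = 0.0f, zi = 0.0f;
    for (int n = 0; n < max_iter; n += UNROLL) {
        float sr = zr, si = zi;  /* state at the start of this block */
        for (int k = 0; k < UNROLL; k++)
            step(&zr, &zi, cr, ci);
        /* Written as !(x <= 4) so that a NaN from overflow after escaping
           also counts as escaped. */
        if (!(zr * zr + zi * zi <= 4.0f)) {
            /* Replay this block step by step to find the exact count. */
            zr = sr;
            zi = si;
            int k = 0;
            while (k < UNROLL && zr * zr + zi * zi <= 4.0f) {
                step(&zr, &zi, cr, ci);
                k++;
            }
            return n + k;
        }
    }
    return max_iter;
}
```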

#18
Just wondering
Join Date: May 2002
Location: Germany
Posts: 1,682
Sounds great! Any chance you could do without unordered access views? That would open it up to a wider audience.
For example me: I'm sitting here stuck with a GTX 280 in "the box" and an HD 5870 waiting in its shipping box, to be installed as soon as an unfortunately rather lengthy job (days, maybe another week) running on the rig has finished.
__________________
English is not my native tongue. Before being too nitpicky about my choice of words please consider the possiblity that I did not mean to say what you might have read into them and inquire before flaming.

#19
Member
Join Date: Jun 2007
Posts: 218
Quote:
In principle, most of this should be possible with pixel shaders instead of compute shaders, but then it gets complicated, certainly for the vectorized version, as it outputs 4 pixels at once. Also, I'm now using doubles to some extent, which is only possible with Shader Model 5. As soon as the HLSL compiler stops crashing, I plan to make a full double-precision version to allow deeper zooming; this will probably have to wait until the next release of the DX SDK. PS: If anyone from Microsoft is reading this, I'm willing to donate this as a sample for the next SDK. Last edited by Voxilla; 10-Oct-2009 at 18:19.
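To illustrate what "outputs 4 pixels at once" means, here is a hypothetical C sketch that iterates four pixels in lockstep, the way the components of a float4 would in a vectorized shader thread. Names and structure are assumptions, not the author's shader. Each lane keeps its own active flag; escaped lanes freeze their count, and the loop runs until no lane is active.

```c
#include <assert.h>

#define LANES 4

/* Hypothetical 4-wide escape-time loop: four pixels iterate in lockstep.
   out[l] receives each lane's iteration count. */
static void mandel_iter4(const float cr[LANES], const float ci[LANES],
                         int max_iter, int out[LANES])
{
    assert(max_iter > 0);
    float zr[LANES] = {0}, zi[LANES] = {0};
    int active[LANES];
    for (int l = 0; l < LANES; l++) {
        active[l] = 1;
        out[l] = 0;
    }
    for (int n = 0; n < max_iter; n++) {
        int any_active = 0;
        for (int l = 0; l < LANES; l++) {
            /* Per-lane escape test; once a lane escapes it stays inactive. */
            active[l] = active[l] && zr[l] * zr[l] + zi[l] * zi[l] <= 4.0f;
            any_active |= active[l];
        }
        if (!any_active)
            break;  /* corresponds to leaving while (any(active)) */
        for (int l = 0; l < LANES; l++) {
            /* Every lane computes the step; only active lanes keep it,
               which a shader would express as a per-component select. */
            float t  = zr[l] * zr[l] - zi[l] * zi[l] + cr[l];
            float ti = 2.0f * zr[l] * zi[l] + ci[l];
            if (active[l]) {
                zr[l] = t;
                zi[l] = ti;
                out[l]++;
            }
        }
    }
}
```

The cost of this layout is that a group finishes only when its slowest lane does, so an escaped lane does no useful work until then; neighbouring pixels tend to have similar counts, which keeps that waste moderate.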

#20
Just wondering
Join Date: May 2002
Location: Germany
Posts: 1,682
Unless, of course, someone comes up with a very clever solution to this.

#21
Crazy coder
Quote:

#22
Senior Member
Join Date: Sep 2003
Posts: 1,790
This sounds awesome; I hope you can get it to work on DX10/10.1.
Good to see the ATI compiler making good use of the resources. Reading the linked page on Keenan's Julia set, he says that was ray-traced output; is this one too?
__________________
However, the above is the heart of the foreskin capacitance

#23
Member
Join Date: Jun 2007
Posts: 218
I'm willing to get it to work on DX10; give me some time.
Quote:
As the object is a fractal, though, the ray-tracing algorithm is very different from ray tracing polygon objects.

#24
Junior Member
Join Date: Nov 2008
Posts: 78
What if you change the boolean math a bit?
For example, treating inside as an int or even a float instead of a bool? Maybe something like this; instead of:
while ( any(inside && counter!=0));
try:
while ( max4(inside * counter) != 0.0f);
Just curious; it looks like there are too many ALU instructions for it...

#25
Junior Member
Join Date: Nov 2008
Posts: 78
Can't edit?
I didn't see the obvious...
while ( dot(inside, counter) != 0.0f);
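The equivalence behind the dot() suggestion can be checked on the CPU. This C sketch models one four-lane group, with `inside` stored as 0.0/1.0 floats for the dot form; the function names are hypothetical. It assumes the counters are never negative (presumably iterations remaining, counting down to zero): a sum of non-negative products is zero only if every product is zero, so dot(inside, counter) != 0 holds exactly when some active lane still has a nonzero counter. With negative counters, terms could cancel and the two forms would disagree. Integer counts up to 2048 and their sums are represented exactly in float.

```c
#include <assert.h>
#include <stdbool.h>

/* Original loop condition: any(inside && counter != 0) over four lanes. */
static bool keep_going_any(const bool inside[4], const int counter[4])
{
    for (int l = 0; l < 4; l++)
        if (inside[l] && counter[l] != 0)
            return true;
    return false;
}

/* Suggested condition: inside as 0.0f/1.0f, loop while dot(inside, counter) != 0.
   Equivalent to keep_going_any only if no counter is negative. */
static bool keep_going_dot(const float inside[4], const float counter[4])
{
    float d = 0.0f;
    for (int l = 0; l < 4; l++)
        d += inside[l] * counter[l];
    return d != 0.0f;
}
```

Whether the dot form actually saves ALU instructions on the HD 5870 is something only the compiler output can show.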