Are you saying the F-buffer will/can be exposed in DX9.0c?
I usually like Tim Sweeney, but he managed to confuse the 3D enthusiast community quite well when he decided to call offset bump mapping "virtual displacement". Most people only remember the "displacement" part, and I've already seen people saying things like UE3 is pushing 300 million polygons...
Then there's offset bump mapping, which is an advanced form of bump mapping, and which Tim decided to call virtual displacement mapping. It basically needs a height map and a normal map, plus a few extra instructions, so it's not much beyond standard bump mapping.
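To make the "few extra instructions" point concrete, here is a rough sketch of the texture coordinate shift that offset bump mapping performs before the usual normal map lookup. It is plain Python rather than shader code, and the function name plus the scale and bias values are my own assumptions for illustration, not anything taken from UE3.

```python
import math

def parallax_offset_uv(uv, view_ts, height, scale=0.04, bias=-0.02):
    """Shift a texture coordinate along the tangent-space view direction.

    uv       -- (u, v) texture coordinate of the fragment
    view_ts  -- view vector in tangent space (x, y, z), need not be normalized
    height   -- height map sample at uv, expected in [0, 1]
    scale, bias -- hypothetical tuning values, not taken from the thread
    """
    vx, vy, vz = view_ts
    length = math.sqrt(vx * vx + vy * vy + vz * vz)
    vx, vy = vx / length, vy / length
    # Remap the height sample to a small signed offset.
    h = height * scale + bias
    # Slide the lookup along the view direction; no geometry is created.
    return (uv[0] + h * vx, uv[1] + h * vy)
```

The shifted coordinate is then used for the normal map and colour lookups as in ordinary bump mapping. The polygon count is untouched, which is why the "displacement" name misleads people.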
Zeross said: It is not Sweeney's fault; I've heard the same kind of nonsense about Doom III. It comes from PolyBump... oops, sorry, it comes from the Detail Preserving Simplification technique.
Laa-Yosh said: I'm quite sure that there was no mention of displacement mapping about Doom3... The only thing ID said was 'renderbump', which isn't technical enough either, though.
Zeross said: As far as I'm concerned, I think THE biggest advantage of PS3.0 over PS2.0 is that this model is not "fragmented" with things like PS2.0, PS2.0a, PS2.0b. At least with PS3.0 you know for sure that you have:
-512 instructions
-no restriction on these 512 instructions (ALU instructions or texture instructions, possibly dependent)
-dsx dsy instructions
-centroid sampling
-MRT
All of these were available under different profiles for PS2.0 but it was so messy that developers chose to stick to straight 2.0. PS3.0 is giving a clear target for developers on what features to expect.
z = x > y ? a + b : c + d;
setp_gt p0, x, y
(p0) add z, a, b
(!p0) add z, c, d
sub t, x, y
add z1, a, b
add z2, c, d
cmp z, t, z1, z2
DemoCoder said: Bet what? I don't see him making a bettable claim.
One example is that any conditionals in shaders compiled with 3.0 can use predicates, which can save 1-2 instructions on average for a guaranteed performance win.
under SM3.0
Code:
setp_gt p0, x, y
(p0) add z, a, b
(!p0) add z, c, d
under SM2.0
Code:
sub t, x, y
add z1, a, b
add z2, c, d
cmp z, t, z1, z2
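For anyone who wants to check that the two sequences compute the same select, here is a small Python model of both. It is only a sketch: the function names are made up, registers are treated as scalars, and the Python `if` stands in for the predicate write mask rather than a real branch.

```python
def select_predicated(x, y, a, b, c, d):
    """Model of the SM3.0 sequence: predicate set, then two masked adds."""
    p0 = x > y              # setp_gt p0, x, y
    z = None
    if p0:                  # (p0) add z, a, b   (write only when p0 is set)
        z = a + b
    if not p0:              # (!p0) add z, c, d  (write only when p0 is clear)
        z = c + d
    return z

def select_cmp(x, y, a, b, c, d):
    """Model of the SM2.0 sequence: both adds always run, cmp picks one."""
    t = x - y               # sub t, x, y
    z1 = a + b              # add z1, a, b
    z2 = c + d              # add z2, c, d
    # cmp z, t, z1, z2: ps_2_0 cmp selects src1 when src0 >= 0, else src2
    return z1 if t >= 0 else z2
```

One wrinkle this exposes: because cmp tests `t >= 0`, the SM2.0 listing as written behaves like `x >= y`, so the two versions disagree when x equals y. Computing `y - x` and swapping the last two cmp operands would restore the strict `x > y` behaviour.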
aaronspink said: 1 IPC Arch: 3+ cycles
2 IPC Arch: 2+ cycles
2+ IPC Arch: 2+ cycles (assuming predicate can't be used in same cycle which is generally correct for all microarchitectures I am aware of)
1 IPC Arch: 4 cycles
2 IPC Arch: 3 cycles
Shader1: 1. SETP 2. ADD 3. ADD
Shader2: 1. **** 2.*** 3. ***
Shader1: 1. SUB 2. ADD 3. CMP
Shader2: 1. ADD 2. *** 3. ***
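The cycle counts quoted above can be reproduced with a toy scheduler. This is a sketch under loud assumptions: every instruction has one-cycle latency, a result cannot be consumed in the cycle that produces it (so the predicate cannot be used in the same cycle as setp), and none of the write hazards discussed in this thread are modelled. The instruction names and the `cycles` helper are hypothetical.

```python
def cycles(instrs, ipc):
    """Count cycles for a greedy schedule issuing up to `ipc` ops per cycle.

    instrs -- list of (name, [names it depends on]); must form a valid DAG
    An op may issue only if all its dependencies issued in an earlier cycle.
    """
    done = {}                     # name -> cycle in which it issued
    remaining = list(instrs)
    cycle = 0
    while remaining:
        cycle += 1
        issued = []
        for name, deps in remaining:
            ready = all(done.get(d, cycle) < cycle for d in deps)
            if ready and len(issued) < ipc:
                issued.append(name)
        for name in issued:
            done[name] = cycle
        remaining = [(n, d) for n, d in remaining if n not in done]
    return cycle

# Predicated version: both adds wait on the predicate.
PREDICATED = [("setp", []), ("add_true", ["setp"]), ("add_false", ["setp"])]
# CMP version: three independent ops, then cmp waits on all of them.
CMP = [("sub", []), ("add1", []), ("add2", []),
       ("cmp", ["sub", "add1", "add2"])]

for ipc in (1, 2):
    print(ipc, cycles(PREDICATED, ipc), cycles(CMP, ipc))
# prints "1 3 4" then "2 2 3"
```

In this simplified model a 3-wide machine runs the CMP version in 2 cycles as well, since the three independent ops issue together, so the gap closes as issue width grows.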
Predication is not a panacea, as numerous studies have shown. The concept is great, but the implementations leave a lot to be desired.
DemoCoder said: Yes, but it still saves work. On a 2-or-more IPC architecture, you've saved a shader unit cycle, which is freed up to schedule other non-dependent ops.
I was actually talking about IPC per pipe, but the general gist is there.
But on the GPU we are not comparing predication to real branches in this thread; we are comparing it to the CMP operator, which is just a CMOV instruction. I fail to see how the issues with write disablement vs conditional move are covered by these "numerous studies".
I'm not talking about in comparison to real branches...
I'm talking about real hardware implementation. If an architecture supports predication, then it opens itself up to additional hazard cases that don't exist without predication. In addition, the write kill functionality is not simple to implement, and any case where you have a feedback path presents even more issues. There are some cases where full predication provides a win vs CMOV, but they are rare.
CMOV is a limited case of predication, as you state, but it is the limit that allows for a much simpler implementation. Things get very complicated in hardware when you start letting more than 1 thing potentially write the same location. It can be done, but you will end up with non-optimal scheduling and additional overheads.
If you are simple single-issue, then the differences between CMOV and predication disappear, but when you are multi-issue (which I assume the upcoming and future GPUs are), predication starts to make less sense.
Aaron Spink
speaking for myself inc
This means any PS2.0 HLSL shaders (say from HL2) with conditionals, when compiled with the PS3.0 profile (and no other code changes by the developer, just a recompile), can get a 10%-33% performance boost.