Nonsensical shader assembly

Ethatron

I disassembled a bunch of vs/ps and am going through them, and I found quite a few nonsensical code fragments like this:

rsq r1.w, r5.w // 1/sqrt(), r5.w is saturated [0,1]
rcp r1.w, r1.w // 1/(1/sqrt()) = sqrt()
add r2.w, -r1.w, c0.y
cmp r1.w, -r1.w, c0.y, r2.w // -r1.w >= 0 ? ... : ..., c0.y is 1

This is:

(-sqrt(r5.w) >= 0.0 ? (1 - sqrt(r5.w)) : 1)
(sqrt(r5.w) <= 0.0 ? (1 - sqrt(r5.w)) : 1)

Makes no sense. sqrt can't yield negative results, and the input isn't negative anyway. It breaks down to:

(sqrt(r5.w) == 0.0 ? (1 - 0.0) : 1)

Whut? Some obscure way to set a register to 1? Did I overlook something?

---------

Got some more (clearly comparisons against 0):
((r0.w * r0.w) <= 0.0 ? 1.0 : 0.0)
(-r3.x >= r3.x ? 1.0 : 0.0)

Weirdos:
max(-r6.w, r6.w)
// which is abs(r6.w), the compiler didn't know _abs?
(IN.texcoord_0.x <= 0.0 ? (1 - IN.texcoord_0.x) : (IN.texcoord_0.x + 1))
// which is 1 + abs(IN.texcoord_0.x); the compiler didn't know _abs? I can't believe that breaking _abs into three schedulable ops could possibly be more efficient.
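A quick way to sanity-check readings like these is to evaluate them on a few sample values. The sketch below uses Python floats as a stand-in for shader registers (an assumption: GPU float precision and denormal handling differ). Note that it treats the second fragment as x <= 0, since -x >= x also holds for every negative x, not just for 0.

```python
# Sample values, including negative, zero and positive inputs.
SAMPLES = [-2.5, -1.0, -0.25, 0.0, 0.25, 1.0, 2.5]


def is_zero_via_square(w: float) -> float:
    """(w * w) <= 0.0 ? 1.0 : 0.0; true only for w == 0 (barring underflow of w * w)."""
    return 1.0 if w * w <= 0.0 else 0.0


def is_nonpositive(x: float) -> float:
    """(-x >= x) ? 1.0 : 0.0; true for every x <= 0, not only x == 0."""
    return 1.0 if -x >= x else 0.0


def abs_via_max(a: float) -> float:
    """max(-a, a), which equals abs(a)."""
    return max(-a, a)


for v in SAMPLES:
    assert is_zero_via_square(v) == (1.0 if v == 0.0 else 0.0)
    assert is_nonpositive(v) == (1.0 if v <= 0.0 else 0.0)
    assert abs_via_max(v) == abs(v)
print("all identities hold on the samples")
```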
 
Uhm, no comment? Maybe a "Hey, that's how it is, no need to search for an error where there is none." ... :?:
 
I did some reverse reverse engineering just for fun. :)
The DX June 2010 SDK compiler always figures out that max(-a, a) should translate to abs(a).

rsq r1.w, r5.w
rcp r1.w, r1.w
add r2.w, -r1.w, c0.y
cmp r1.w, -r1.w, c0.y, r2.w
What this does (with r5.w in [0, 1] and c0.y = 1) is flip your [0, 1] interval around to [1, 0], so you can't really lose that add. The cmp then makes sure that 1 is returned when r5.w == 1.
Basically, for x in [0, 1) you end up with f(x) = 1 - sqrt(x), and for x == 1 you use f(x) = 1.

(IN.texcoord_0.x <= 0.0 ? (1 - IN.texcoord_0.x) : (IN.texcoord_0.x + 1))
This one is strange, though. You are right, this is just abs(IN.texcoord_0.x) + 1. Obviously the developer told the compiler to do that; I find it really hard to believe that a compiler would come up with such an "optimisation" on its own. But I don't know why it wouldn't take something like this out.
 
Thanks guys for the interest! I finally realized that I had messed up the ternaries:

cmp r1.w, -r1.w, c0.y, r2.w

This is:

-r1.w >= 0 ? c0.y : r2.w

Somehow I misread the MS Shader Assembly Reference (fuck you MS, your documentation is soooo bad, can't you just ffffuuu give readable equivalent expressions like AMD does for x86?).
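A minimal Python model of the corrected reading, assuming the per-component behaviour of `cmp` as I read the D3D9 reference (dst = src0 >= 0 ? src1 : src2). The helper name `fragment_tail` is made up for illustration; the register names in the comments are the ones from the fragment above.

```python
def cmp(src0: float, src1: float, src2: float) -> float:
    """D3D9 `cmp dst, src0, src1, src2` for one component:
    dst = (src0 >= 0) ? src1 : src2."""
    return src1 if src0 >= 0.0 else src2


def fragment_tail(r1w: float, r2w: float, c0y: float = 1.0) -> float:
    """Hypothetical helper for: cmp r1.w, -r1.w, c0.y, r2.w
    i.e. (-r1.w >= 0) ? c0.y : r2.w."""
    return cmp(-r1w, c0y, r2w)


# With r1.w = sqrt(x) and r2.w = 1 - sqrt(x):
# a positive r1.w selects r2.w; only r1.w == 0 selects c0.y.
print(fragment_tail(0.5, 0.5))   # r1.w > 0, picks r2.w
print(fragment_tail(0.0, 1.0))   # r1.w == 0, picks c0.y
```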

---------------

So

(IN.texcoord_0.x <= 0.0 ? (1 - IN.texcoord_0.x) : (IN.texcoord_0.x + 1))

becomes

(IN.texcoord_0.x <= 0.0 ? (IN.texcoord_0.x + 1) : (1 - IN.texcoord_0.x))

which is

1 - abs(IN.texcoord_0.x)
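To double-check the reduction, this sketch compares the corrected ternary with 1 - abs(x), and the original misreading with 1 + abs(x), on a handful of arbitrarily chosen sample coordinates (Python floats standing in for shader floats).

```python
SAMPLES = [-1.0, -0.75, -0.5, -0.125, 0.0, 0.125, 0.5, 0.75, 1.0]


def corrected(x: float) -> float:
    """x <= 0.0 ? (x + 1) : (1 - x)"""
    return x + 1.0 if x <= 0.0 else 1.0 - x


def misread(x: float) -> float:
    """x <= 0.0 ? (1 - x) : (x + 1)"""
    return 1.0 - x if x <= 0.0 else x + 1.0


for x in SAMPLES:
    assert corrected(x) == 1.0 - abs(x)
    assert misread(x) == 1.0 + abs(x)
print("corrected == 1 - abs(x), misread == 1 + abs(x) on all samples")
```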

There are huuge numbers of collapsible abs cases in the assembly; in one shader this reduction freed 5 of its 70 arithmetic instructions.

---------------

And

(sqrt(r5.w) <= 0.0 ? (1 - sqrt(r5.w)) : 1)

becomes

(sqrt(r5.w) <= 0.0 ? 1 : (1 - sqrt(r5.w)))

Makes total sense.
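Putting the four instructions together, here's a sketch that emulates the whole fragment and compares it with 1 - sqrt(x) over the saturated range. It assumes the D3D9 reference behaviour as I understand it (rsq works on |x| and returns +inf for 0; rcp returns +inf for 0), and it uses Python doubles in place of the GPU's lower-precision rsq/rcp, hence the tolerance in the comparison.

```python
import math


def rsq(x: float) -> float:
    # Assumed D3D9 behaviour: operates on |x|, returns +inf for 0.
    x = abs(x)
    return math.inf if x == 0.0 else 1.0 / math.sqrt(x)


def rcp(x: float) -> float:
    # Assumed D3D9 behaviour: +inf for 0; 1/inf evaluates to 0.
    return math.inf if x == 0.0 else 1.0 / x


def cmp(src0: float, src1: float, src2: float) -> float:
    # dst = (src0 >= 0) ? src1 : src2
    return src1 if src0 >= 0.0 else src2


def fragment(r5w: float, c0y: float = 1.0) -> float:
    r1w = rsq(r5w)              # rsq r1.w, r5.w        -> 1/sqrt(x)
    r1w = rcp(r1w)              # rcp r1.w, r1.w        -> sqrt(x)
    r2w = -r1w + c0y            # add r2.w, -r1.w, c0.y -> 1 - sqrt(x)
    return cmp(-r1w, c0y, r2w)  # cmp r1.w, -r1.w, c0.y, r2.w


for i in range(11):
    x = i / 10.0                # saturated input in [0, 1]
    assert math.isclose(fragment(x), 1.0 - math.sqrt(x), abs_tol=1e-12)
```

On this range both branches of the cmp agree at x = 0 (both give 1), so the result tracks 1 - sqrt(x) throughout.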

---------------

@itsmydamnation: Hehe, yep, I've got them all HLSLified now, 650 of them, and they are really readable. I had to write a little asm->HLSL reconstructor which does op reordering and contraction, as well as optimization.
Hope you like the awesome water as well. ;) My next trick is going to be adding real-time shader editing in a side window.

Anyone interested in the shaders can look here:

http://codaset.com/ethatron/oblivion-shader-db/source/master/tree/PseudoHLSL
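For anyone curious what "contraction" can look like in such a reconstructor, here's a toy sketch. It is hypothetical and not the actual tool behind the link; the tuple-based expression tree and the `contract`/`to_hlsl` names are made up for illustration. It rewrites bottom-up, folding rcp(rsq(a)) into sqrt(a) (this ignores the |a| that rsq applies, so it's only safe for non-negative inputs such as saturated values) and max(-a, a) into abs(a).

```python
from typing import Union

# Hypothetical toy rewriter, not the actual reconstructor.
# An expression is a register name (str) or a tuple (op, *operands).
Expr = Union[str, tuple]


def contract(e: Expr) -> Expr:
    """Bottom-up peephole rewrite of a tiny expression tree."""
    if isinstance(e, str):
        return e
    op, *args = e
    args = [contract(a) for a in args]
    # rcp(rsq(a)) -> sqrt(a); assumes a >= 0 (rsq really works on |a|).
    if op == "rcp" and isinstance(args[0], tuple) and args[0][0] == "rsq":
        return ("sqrt", args[0][1])
    # max(-a, a) or max(a, -a) -> abs(a)
    if op == "max":
        x, y = args
        if x == ("neg", y):
            return ("abs", y)
        if y == ("neg", x):
            return ("abs", x)
    return (op, *args)


def to_hlsl(e: Expr) -> str:
    """Print the tree as HLSL-like text."""
    if isinstance(e, str):
        return e
    op, *args = e
    if op == "neg":
        return f"-{to_hlsl(args[0])}"
    if op == "add":
        return f"({to_hlsl(args[0])} + {to_hlsl(args[1])})"
    return f"{op}({', '.join(to_hlsl(a) for a in args)})"


# add r2.w, -r1.w, c0.y where r1.w = rcp(rsq(r5.w))
expr = ("add", ("neg", ("rcp", ("rsq", "r5.w"))), "c0.y")
print(to_hlsl(contract(expr)))  # prints "(-sqrt(r5.w) + c0.y)"
```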
 
So does that mean that, for example, the Tomrek water shader could somehow either directly replace the existing shader or hook into it, if you have been able to decode them all?

I'm guessing that, if that could be done, it would fix the water height issues etc.?
 
The plan is to substitute the built-in shaders; they are perfectly embedded in their render-pass groups.
Until now, all those modifications (DoF, bokeh, all of them) were deferred post-processes on the screen rectangle; you can only go so far with that. The reason was simply that there was no other way. Now that we've ripped the engine open with its organs visible, one can expect somewhat less brutal modifications.
So with the shaders decrypted, water, for example, will stay as it was (with waves) and yet not really (Fresnel, foam, and whatnot). :)
 
Looking forward to it. The question is whether I need to go Xfire and get a higher-clocked, higher-IPC CPU to get playable Oblivion frame rates; right now, with REAVWD enabled, I'm already at that point :oops:
 

That's mainly not because of GPU work; we believe the scene-graph update & traversal is the culprit. That one is intimately connected with the "scripting" engine; if that were JIT-able, you'd probably have 120 FPS again. My experiments while working on the reflection pass showed that one full reflection pass (entire scene without billboards) used approx. 2-3% of the budget when the reflection render target was 1.3 times bigger than the regular render surface (width²).

Of course there is profiling to be done to get some real numbers, but I don't expect GPU saturation any time soon, not like with the post-effects. Stacking those can easily lead to 10-15 additional "passes" over the screen rectangle. But that's more because they are separate for flexibility and low complexity, not because it really has to be that way. Quite a few of them also do wide-tap filtering, so maybe it's more of a bandwidth problem than an arithmetic problem.

The post-effects are not where I'm sticking my nose in. I'm more interested in making it possible to work on the shaders of the core pipeline.
 
So which current post-process effects do you think you will bring into the pipeline? God rays, water? Does this help shademe with his shadowing system? :LOL::D
 