Old 09-Apr-2011, 18:17   #1
Ethatron
Member
 
Join Date: Jan 2010
Posts: 375
Nonsensical shader assembly

I disassembled a bunch of vs/ps and, going through them, found quite a few nonsensical code fragments like this:

rsq r1.w, r5.w // 1/sqrt(r5.w); r5.w is saturated to [0,1]
rcp r1.w, r1.w // 1/(1/sqrt(r5.w)) = sqrt(r5.w)
add r2.w, -r1.w, c0.y
cmp r1.w, -r1.w, c0.y, r2.w // -r1.w >= 0 ? ... : ..., c0.y is 1

This is:

(-sqrt(r5.w) >= 0.0 ? (1 - sqrt(r5.w)) : 1)
(sqrt(r5.w) <= 0.0 ? (1 - sqrt(r5.w)) : 1)

Makes no sense. sqrt can't yield negative results, and the input is non-negative anyway. This breaks down to:

(sqrt(r5.w) == 0.0 ? (1 - 0.0) : 1)

Whut? Some obscure way to set a register to 1? Did I overlook something?

---------

Got some more (checks for == 0 clearly):
((r0.w * r0.w) <= 0.0 ? 1.0 : 0.0)
(-r3.x >= r3.x ? 1.0 : 0.0)

Weirdos:
max(-r6.w, r6.w)
// which is abs(r6.w), the compiler didn't know _abs?
(IN.texcoord_0.x <= 0.0 ? (1 - IN.texcoord_0.x) : (IN.texcoord_0.x + 1))
// which is 1 + abs(IN.texcoord_0.x), the compiler didn't know _abs? I can't believe that breaking _abs into three schedulable ops could possibly be more efficient.
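
A quick sanity check of the first pattern, sketched in Python rather than shader code (the function name `abs_via_max` is made up for illustration, and Python floats stand in for shader registers): max(-a, a) agrees with abs(a) on a handful of sample values.

```python
def abs_via_max(a: float) -> float:
    """max(-a, a), the pattern seen in the disassembly."""
    return max(-a, a)


# Compare against the built-in abs() on negative, zero and positive inputs.
samples = [-2.5, -1.0, 0.0, 0.25, 3.0]
for a in samples:
    assert abs_via_max(a) == abs(a)
```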

Last edited by Ethatron; 13-Apr-2011 at 17:47.
Old 13-Apr-2011, 17:46   #2
Ethatron
Member
 
Join Date: Jan 2010
Posts: 375

Uhm, no comment? Maybe a "Hey, that's how it is; no need to search for an error where there is none." ...
Old 14-Apr-2011, 00:08   #3
DarthShader
Member
 
Join Date: Jul 2010
Location: Land of Mu
Posts: 350

Quote:
Originally Posted by Ethatron
(sqrt(r5.w) == 0.0 ? (1 - 0.0) : 1)
If r5.w was 0, you'd have division by 0 in the first line.
Quote:
(-r3.x >= r3.x ? 1.0 : 0.0)
That's actually checking whether r3.x is less than or equal to zero. Isn't that what your first example tries to do, ending up always false?
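
To make the difference between the two checks concrete, here is a minimal Python sketch (the helper names are hypothetical, and Python floats stand in for shader registers). It suggests the squared form fires only at zero, while the negated form fires for every non-positive input.

```python
def square_test(x: float) -> float:
    """((x * x) <= 0.0 ? 1.0 : 0.0)"""
    return 1.0 if x * x <= 0.0 else 0.0


def negate_test(x: float) -> float:
    """(-x >= x ? 1.0 : 0.0)"""
    return 1.0 if -x >= x else 0.0


for x in (-2.0, 0.0, 2.0):
    print(x, square_test(x), negate_test(x))
```

The two only disagree for strictly negative inputs, where the negated form still returns 1.0.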
Old 14-Apr-2011, 11:48   #4
itsmydamnation
Member
 
Join Date: Apr 2007
Location: Australia
Posts: 645

I'm guessing these are Oblivion shaders. If I had any capability to help you in any way, shape or form, I would, but I can't, so I won't.
Old 14-Apr-2011, 21:12   #5
MDolenc
Member
 
Join Date: May 2002
Location: Slovenia
Posts: 420

I did some reverse reverse engineering just for fun.
DX June 2010 SDK always figures out that max(-a, a) should translate to abs(a).

rsq r1.w, r5.w
rcp r1.w, r1.w
add r2.w, -r1.w, c0.y
cmp r1.w, -r1.w, c0.y, r2.w
What this does (with r5.w in [0, 1] and c0.y = 1) is flip your [0, 1] interval around to [1, 0], so you can't really lose that add. The cmp then makes sure that 1 is returned when r5.w == 0.
Basically for x in (0, 1] you end up with f(x) = 1 - sqrt(x), and for x == 0 you use f(x) = 1.
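
The four instructions can be emulated in a few lines of Python to see which branch each input takes. This is only a sketch: the helper names are invented, cmp is modelled as dst = (src0 >= 0 ? src1 : src2), and treating rsq(0) as +infinity (so that rcp then gives 0) is an assumption about the hardware, not something stated in the thread.

```python
import math


def rsq(x: float) -> float:
    # Assumption: rsq(0) behaves like +infinity.
    return math.inf if x == 0.0 else 1.0 / math.sqrt(x)


def rcp(x: float) -> float:
    # Assumption: rcp(0) behaves like a signed infinity; rcp(inf) is 0.
    return math.copysign(math.inf, x) if x == 0.0 else 1.0 / x


def cmp(src0: float, src1: float, src2: float) -> float:
    # dst = src0 >= 0 ? src1 : src2
    return src1 if src0 >= 0.0 else src2


def fragment(r5_w: float, c0_y: float = 1.0) -> float:
    r1_w = rsq(r5_w)               # rsq r1.w, r5.w
    r1_w = rcp(r1_w)               # rcp r1.w, r1.w
    r2_w = -r1_w + c0_y            # add r2.w, -r1.w, c0.y
    return cmp(-r1_w, c0_y, r2_w)  # cmp r1.w, -r1.w, c0.y, r2.w


for x in (0.0, 0.25, 1.0):
    print(x, fragment(x))
```

With these assumptions only x == 0 takes the c0.y branch; every other input in (0, 1] goes through 1 - sqrt(x).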

(IN.texcoord_0.x <= 0.0 ? (1 - IN.texcoord_0.x) : (IN.texcoord_0.x + 1))
This one is a strange one though. You are right, this is just abs(IN.texcoord_0.x) + 1. Obviously the developer told the compiler to do that. I find it really hard to believe that a compiler would figure out such an "optimisation" on its own. But I don't know why it wouldn't take something like this out.
Old 15-Apr-2011, 11:36   #6
Ethatron
Member
 
Join Date: Jan 2010
Posts: 375

Thanks guys for the interest! I finally realized that I had messed up the ternaries:

cmp r1.w, -r1.w, c0.y, r2.w

This is:

-r1.w >= 0 ? c0.y : r2.w
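
As a minimal Python sketch of why the operand order matters (function names are hypothetical), here is the reading above next to the swapped one, applied to the same operands. The two differ only in which source is returned when src0 >= 0.

```python
def cmp_documented(src0: float, src1: float, src2: float) -> float:
    # dst = src0 >= 0 ? src1 : src2
    return src1 if src0 >= 0.0 else src2


def cmp_swapped(src0: float, src1: float, src2: float) -> float:
    # The mistaken reading: the two result operands exchanged.
    return src2 if src0 >= 0.0 else src1


r1_w, c0_y = 0.5, 1.0
r2_w = -r1_w + c0_y  # add r2.w, -r1.w, c0.y

print(cmp_documented(-r1_w, c0_y, r2_w))  # 0.5, i.e. r2.w
print(cmp_swapped(-r1_w, c0_y, r2_w))     # 1.0, i.e. c0.y
```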

Somehow I misread the MS Shader Assembly Reference (fuck you MS, your documentation is soooo bad, can't you just ffffuuu give readable equivalent expressions as AMD does for x86?).

---------------

So

(IN.texcoord_0.x <= 0.0 ? (1 - IN.texcoord_0.x) : (IN.texcoord_0.x + 1))

turns into

(IN.texcoord_0.x <= 0.0 ? (IN.texcoord_0.x + 1) : (1 - IN.texcoord_0.x))

which is

1 - abs(IN.texcoord_0.x)
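
A small Python check of that identity (the function name is invented for illustration, and the sample values are chosen so the float arithmetic is exact):

```python
def corrected_ternary(x: float) -> float:
    """(x <= 0.0 ? (x + 1) : (1 - x))"""
    return (x + 1.0) if x <= 0.0 else (1.0 - x)


# Both sides should agree on negative, zero and positive inputs.
for x in (-2.5, -0.5, 0.0, 0.25, 3.0):
    assert corrected_ternary(x) == 1.0 - abs(x)
```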

There are huge numbers of collapsible abs cases in the assembly; in one shader this reduction freed 5 arithmetic instructions (of 70).

---------------

And

(sqrt(r5.w) <= 0.0 ? (1 - sqrt(r5.w)) : 1)

turns into

(sqrt(r5.w) <= 0.0 ? 1 : (1 - sqrt(r5.w)))

Makes total sense.

---------------

@itsmydamnation: Hehe, yep, I've got them all HLSLified now, 650 of them, and they are really readable; I had to write a little asm->HLSL reconstructor which does op reordering and contraction, as well as optimization.
Hope you like the awesome water as well. My next trick is going to be adding real-time shader editing in a side window.

Anyone interested in the shaders can look here:

http://codaset.com/ethatron/oblivion...ree/PseudoHLSL

Last edited by Ethatron; 15-Apr-2011 at 11:44. Reason: fuuuuuu
Old 15-Apr-2011, 14:09   #7
itsmydamnation
Member
 
Join Date: Apr 2007
Location: Australia
Posts: 645

So does that mean that, for example, the Tomrek water shader could somehow either directly replace the existing shader or somehow hook into it, if you have been able to decode them all?

I'm guessing that if that could be done, it would fix the water height issues etc.?
Old 16-Apr-2011, 22:05   #8
Ethatron
Member
 
Join Date: Jan 2010
Posts: 375

The plan is to substitute the built-in shaders; they are perfectly embedded into their render-pass groups.
Currently all those modifications (DoF, bokeh, all of them) are deferred post-processes on the screen rectangle; you can only go so far with that. The reason was simply that there was no other way. Now that we've ripped open the engine with its bare organs visible, one can expect slightly less brutal modifications.
So with the shaders decrypted, water, for example, will be as before (with waves) and yet not really (Fresnel, foam, and whatnot).

Last edited by Ethatron; 17-Apr-2011 at 13:58. Reason: inglish
Old 17-Apr-2011, 11:48   #9
itsmydamnation
Member
 
Join Date: Apr 2007
Location: Australia
Posts: 645

Looking forward to it. The question is, do I need to go Xfire and a higher-clocked, higher-IPC CPU to get a playable Oblivion frame rate? Right now with REAVWD enabled I'm already at that point.
Old 17-Apr-2011, 14:14   #10
Ethatron
Member
 
Join Date: Jan 2010
Posts: 375

Quote:
Originally Posted by itsmydamnation
Looking forward to it. The question is, do I need to go Xfire and a higher-clocked, higher-IPC CPU to get a playable Oblivion frame rate? Right now with REAVWD enabled I'm already at that point.
That's mainly not because of GPU work; we believe the scene-graph update and traversal is the culprit. That one is intimately connected with the "scripting" engine; if that were JIT-able, you'd probably have 120 FPS again. My experiments while working on the reflection pass showed that one full reflection pass (entire scene without billboards) used approx. 2-3% of the budget when the reflection render target was 1.3 times bigger than the regular render surface (width²).

Of course there is profiling to be done for some real numbers, but I don't expect GPU saturation anytime soon, unlike with the post-effects. Stacking them can easily lead to 10-15 additional "passes" on the screen rectangle. But that's more because they are separate for flexibility and low complexity, not because it really has to be that way. Quite a few of them also do wide-tap filtering; maybe it's more of a bandwidth problem than an arithmetic problem.

The post-effects are not where I'm poking my nose in. I'm more interested in delivering the possibility to work on the shaders of the core pipeline.
Old 17-Apr-2011, 23:41   #11
itsmydamnation
Member
 
Join Date: Apr 2007
Location: Australia
Posts: 645

So which current post-process effects do you think you will bring into the pipeline? Godrays, water? Does this help shademe with his shadowing system?

All times are GMT +1.