Masaki Kawase's camera post-processing shader

FoxMcCloud

I'm trying to implement Masaki Kawase's camera post-processing shader used on the Xbox. The source code is provided in a single slide but its operation isn't explained, and it's written in xps_1_1, which I assume is a register-combiner variant of ps_1_1 for the NV2A. If I want to convert this to Cg or GLSL, where do I start? The xps_1_1 docs don't seem to be publicly available. Here's the entire program:


xps.1.1

// c4.rgb : gray scale coefficients
def c4, 0.30f, 0.59f, 0.11f, 0.0f

// blend factor for three images
def c5, 0.0f, 0.0f, 0.0f, 0.5f
def c6, 0.0f, 0.0f, 0.0f, 0.333333333f

tex t0 // frame buffer
tex t1 // frame buffer
tex t2 // frame buffer
tex t3 // glare

// Soften frame buffer edges
lrp r0, c5.a, t1, t2
lrp r0, c6.a, t0, r0

// Add glare
mad r0, t3, c0, r0

// Calculate luminance
dp3 r1, r0, c4
// Emphasize contrast
mul_x2 v0, r0, r0
lrp r0, r1, r0, v0

// Modulate color
mul_x2 r0, r0, c3
// Fadeout
xfc c2.a, c2, r0, ZERO, ZERO, ZERO, r0.a
 
Those all look like regular ps_1_x instructions, with the exception of xfc, which I have no idea about. The rest is pretty straightforward. I think this should be pretty close:

Code:
sampler2D FrameBufferTexture;
sampler2D GlareTexture;

float GlareAmount : register(c0);
float4 ColorModulate : register(c3); // c3 in the asm; the name is my own guess

struct PSInput
{
    float2 tc0 : TEXCOORD0;
    float2 tc1 : TEXCOORD1;
    float2 tc2 : TEXCOORD2;
    float2 tc3 : TEXCOORD3;
};

float4 PS(in PSInput input) : COLOR0
{
    float4 fbSample0 = tex2D(FrameBufferTexture, input.tc0);
    float4 fbSample1 = tex2D(FrameBufferTexture, input.tc1);
    float4 fbSample2 = tex2D(FrameBufferTexture, input.tc2);
    float4 glareSample = tex2D(GlareTexture, input.tc3);
    
    // Soften frame buffer edges
    float4 color = lerp(fbSample1, fbSample2, 0.5f);
    color = lerp(color, fbSample0, 0.3333333f); // lrp r0, c6.a, t0, r0 = t0/3 + 2*r0/3
    
    // Add glare
    color += glareSample * GlareAmount;
    
    // Calculate luminance
    float luminance = dot(color.xyz, float3(0.30f, 0.59f, 0.11f));
    
    // Emphasize contrast
    float4 saturated = color * color * 2;
    color = lerp(saturated, color, luminance); // lrp r0, r1, r0, v0: darks head toward 2*color^2
    
    // Modulate color (mul_x2 r0, r0, c3)
    color = color * ColorModulate * 2;

    // whatever xfc does goes here
    
    return color;
}
 
The xfc instruction maps to the final combiner, which is configured through the glFinalCombinerInputNV command in Nvidia's OpenGL register combiners extension. Each of xfc's seven input parameters corresponds, in order, to variables A through G of glFinalCombinerInputNV (see this pdf as well). The primary purpose of the final combiner on the NV1x and NV2x chips was to handle fog and specular lighting, although it could be put to other uses too.
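In shader terms, that conventional fog-plus-specular setup is just the final combiner's fixed equation. Here's a rough sketch of it in HLSL/Cg, with purely illustrative names (fogFactor, litColor, etc. aren't from Kawase's shader):

Code:
// The final combiner computes rgb = A*B + (1-A)*C + D and alpha = G.
// Typical configuration: A = fog factor, B = lit color, C = fog color, D = specular.
float4 ApplyFogAndSpecular(float3 litColor, float3 specularColor,
                           float3 fogColor, float fogFactor, float alpha)
{
    float3 rgb = fogFactor * litColor + (1.0f - fogFactor) * fogColor + specularColor;
    return float4(rgb, alpha);
}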

Therefore "xfc c2.a, c2, r0, ZERO, ZERO, ZERO, r0.a" should result (psuedocode) in something like:

Code:
output_color = c2.a * c2 + (1 - c2.a) * r0
output_alpha = r0.a

where r0 is equal to color in MJP's HLSL code example.

So put

Code:
float4 BlendFactor : register(c2);

up top in the constant declarations and

Code:
color.rgb = BlendFactor.a * BlendFactor.rgb + (1 - BlendFactor.a) * color.rgb;
color.a = color.a; // alpha (r0.a) passes through unchanged

where the xfc code needs to go down at the bottom.
 
xfc c2.a, c2, r0, ZERO, ZERO, ZERO, r0.a

If you Google the line I quoted, the second result that comes up has the definition of the instruction. Use Quick View and look for the title "Final Combiner".

• xfc s0,s1,s2,s3, s4,s5, s6

– Final output rgb = s0*s1 + (1-s0)*s2 + s3
– Final output alpha = s6

I copy-pasted that from it; hope this helps.
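Written out in HLSL/Cg, that formula is just a small helper (FinalCombine is a name I made up; it's not in the slides):

Code:
// xfc: rgb = s0*s1 + (1-s0)*s2 + s3, alpha = s6 (s4/s5 unused in the quoted form).
// Kawase's line then reads as:
//   FinalCombine(c2.aaa, c2.rgb, color.rgb, float3(0, 0, 0), color.a)
float4 FinalCombine(float3 s0, float3 s1, float3 s2, float3 s3, float s6)
{
    return float4(s0 * s1 + (1.0f - s0) * s2 + s3, s6);
}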
 
xfc sounds like (and, after that explanation, looks like) "cross fade color". Easy to remember. :)
 
Thanks a lot everyone - that was extremely helpful. I'll convert that to Cg and integrate it into my pipeline right now.
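
For anyone finding this thread later, here's roughly what the assembled Cg/HLSL fragment program looks like. The constant names (GlareAmount for c0, FadeColor for c2, ColorModulate for c3) are my own, not Kawase's:

Code:
float4 main(float2 tc0 : TEXCOORD0,
            float2 tc1 : TEXCOORD1,
            float2 tc2 : TEXCOORD2,
            float2 tc3 : TEXCOORD3,
            uniform sampler2D FrameBuffer,
            uniform sampler2D Glare,
            uniform float4 GlareAmount,    // c0
            uniform float4 FadeColor,      // c2: rgb = fade color, a = fade amount
            uniform float4 ColorModulate)  // c3
          : COLOR
{
    float4 fb0 = tex2D(FrameBuffer, tc0);
    float4 fb1 = tex2D(FrameBuffer, tc1);
    float4 fb2 = tex2D(FrameBuffer, tc2);
    float4 glare = tex2D(Glare, tc3);

    // Soften frame buffer edges: equal-weight average of the three offset taps
    float4 color = lerp(fb2, fb1, 0.5f);      // lrp r0, c5.a, t1, t2
    color = lerp(color, fb0, 1.0f / 3.0f);    // lrp r0, c6.a, t0, r0

    // Add glare
    color += glare * GlareAmount;             // mad r0, t3, c0, r0

    // Calculate luminance
    float lum = dot(color.rgb, float3(0.30f, 0.59f, 0.11f));

    // Emphasize contrast: dark pixels head toward 2*color^2, bright ones keep their color
    float4 squared = 2.0f * color * color;    // mul_x2 v0, r0, r0
    color = lerp(squared, color, lum);        // lrp r0, r1, r0, v0

    // Modulate color
    color = 2.0f * color * ColorModulate;     // mul_x2 r0, r0, c3

    // Fadeout (the xfc / final combiner)
    color.rgb = lerp(color.rgb, FadeColor.rgb, FadeColor.a);

    return color;
}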
 
ugh floating point values such efficiency from hardware implemented acceleration but at the cost of such convolution.

I'm so thankful java put things in their rightful place, their language implementation is in accord with the ideal hardware universal acceleration function. Less convolution, much simpler clearer code, enabling the comment to code rate to increase and clarity and simplicity to emerge brilliantly.
 
This needs to stop soon, or else it will be stopped. Take a break, a few deep breaths, and reconsider exactly what B3D is and isn't, and what sort of discourse is expected. This also applies to the overly verbose fractal nonsense.
 