Inquirer spreading R420 info

zeckensack said:
Gradient: prepare an adequately sized luminance only mipmap, where each mipmap level encodes a step between 0 and 1. Use a trilinear filter. Do a dependent fetch from that mipmap with whatever quantity you need a gradient for. You can do two fetches from a 1D luminance mipmap if you need separate x/y gradients.
How would that work? You would simply obtain a value that is dependent upon the value of the current pixel, and totally independent of the values of neighboring pixels.
 
Chalnoth, I don't think you understand his idea. You fill the top-level mip-map with black, the next with grey, and so on until you make the bottom mipmap (1x1) white. The actual values of the lookup coordinates won't make any difference to the value fetched, but ATI and NV use gradients to determine which mip-map to use. This will get you a value that'll probably be useful for simple shader antialiasing algorithms.

However, this will only get you one scalar value. The fetched value will probably be a function of the max of the x and y gradients for trilinear filtering, and a function of the min for anisotropic filtering. I don't see how you could specifically retrieve the x and y gradients this way, zeckensack.
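For anyone who wants to play with the idea on the CPU, here is a rough sketch of the mipmap trick described above. It assumes the textbook LOD rule (log2 of the larger per-pixel gradient, measured in texels) and a plain trilinear blend between the two nearest levels. The function names, the 256-texel base size and the linear 0..1 ramp across levels are illustrative assumptions, not something any particular driver or chip guarantees.

```python
import math

def level_values(num_levels):
    """Luminance stored in each mip level: 0.0 in the largest level, 1.0 in the 1x1 level."""
    return [i / (num_levels - 1) for i in range(num_levels)]

def trilinear_fetch(levels, lod):
    """Blend the two nearest levels. Every texel in a level holds the same
    value, so the lookup coordinates themselves do not matter."""
    lod = min(max(lod, 0.0), len(levels) - 1)
    lo = int(math.floor(lod))
    hi = min(lo + 1, len(levels) - 1)
    frac = lod - lo
    return levels[lo] * (1.0 - frac) + levels[hi] * frac

def gradient_from_fetch(ddx, ddy, base_size=256):
    """ddx, ddy: per-pixel change of the lookup coordinate in x and y,
    in normalized (0..1) texture units. base_size is an assumed texture size."""
    num_levels = int(math.log2(base_size)) + 1
    levels = level_values(num_levels)
    rho = max(abs(ddx), abs(ddy)) * base_size  # texels covered per pixel (assumed LOD rule)
    lod = math.log2(rho) if rho > 0 else 0.0
    return trilinear_fetch(levels, lod)
```

Under these assumptions the fetched value is logarithmic in the gradient, so a shader would have to undo that with something like 2^(value * (num_levels - 1)). Only the larger of the two gradients survives, which is the single-scalar limitation raised in this post.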
 
Hmm just saw something.

ps_2_0 Features
New features:

Three new swizzles - .yzxw, .zxyw, .wzyx
Number of Temporary Registers (r#) increased to 12
Number of Constant Float Registers (c#) increased to 32
Number of Texture Coordinate Registers (t#) increased to 8


ps_3_0 Features
New features:

Consolidated 10 Input Registers (v#)
Indexable Constant Float Register (c#) with Loop Counter Register (aL)
Number of Temporary Registers (r#) increased to 32
Number of Constant Float Registers (c#) increased to 224

Source:

Aren't these the registers that sireric was talking about?

Does this mean the 9700/9800 supports ps3.0?

US
 
I don't see it working where you need it most: a large differential between neighboring values. Also, using gradients as parameters to a texture lookup is the least of your worries.

Consider this shader by Larry Gritz

Code:
#define boxstep(a,b,x) (clamp(((x)-(a))/((b)-(a)),0,1))
#define MINFILTERWIDTH 1.0e-7

surface
screen_aa (float Ka = 1, Kd = 0.75, Ks = 0.4, roughness = 0.1;
   color specularcolor = 1;
   float density = 0.25, frequency = 20;)
{
  point Nf;     /* Forward facing Normal vector */
  point IN;     /* normalized incident vector */
  float d;      /* Density at the sample point */
  float ss, tt; /* s,t, parameters in phase */
  float swidth, twidth, GWF, w, h;

  /* Compute a forward facing normal */
  IN = normalize (I);
  Nf = faceforward (normalize(N), I);

  /* Determine how wide in s-t space one pixel projects to */
  swidth = max (abs(Du(s)*du) + abs(Dv(s)*dv), MINFILTERWIDTH) * frequency;
  twidth = max (abs(Du(t)*du) + abs(Dv(t)*dv), MINFILTERWIDTH) * frequency;

  /* Figure out where in the pattern we are */
  ss = mod (frequency * s, 1);
  tt = mod (frequency * t, 1);

  /* Figure out where the strips are. Do some simple antialiasing. */
  GWF = density*0.5;
  if (swidth >= 1)
      w = 1 - 2*GWF;
  else
      w = clamp (boxstep(GWF-swidth,GWF,ss), max(1-GWF/swidth,0), 1)
        - clamp (boxstep(1-GWF-swidth,1-GWF,ss), 0, 2*GWF/swidth);
  if (twidth >= 1)
      h = 1 - 2*GWF;
  else
      h = clamp (boxstep(GWF-twidth,GWF,tt), max(1-GWF/twidth,0), 1)
        - clamp (boxstep(1-GWF-twidth,1-GWF,tt), 0, 2*GWF/twidth);
  /* This would be the non-antialiased version:
   *    w = step (GWF,ss) - step(1-GWF,ss);
   *    h = step (GWF,tt) - step(1-GWF,tt);
   */
  d = 1 - w*h;

  Oi = d;
  if (d > 0) {
      Ci = Oi * ( Cs * (Ka*ambient() + Kd*diffuse(Nf)) +
 specularcolor * Ks*specular(Nf,-IN,roughness));
    }
  else
      Ci = 0;
}
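
To see what the antialiasing lines in the shader above actually buy you, here is a small Python port of just the one-axis `w`/`h` computation. It is a sketch for experimenting, not a replacement for the shader. The names `stripe_coverage` and `stripe_hard` are made up for this example, and the filter width is passed in directly instead of being derived from `Du`/`Dv`.

```python
def clamp(x, lo, hi):
    """Same argument order as the shading-language clamp(x, min, max)."""
    return max(lo, min(x, hi))

def boxstep(a, b, x):
    """Linear ramp from 0 at x=a to 1 at x=b, clamped outside (mirrors the #define)."""
    return clamp((x - a) / (b - a), 0.0, 1.0)

def stripe_coverage(ss, width, density=0.25):
    """Antialiased weight along one axis, mirroring the w/h lines of the shader.
    ss is the position within one pattern period (0..1); width is the filter
    width in the same units and must be > 0 (the shader enforces this with
    MINFILTERWIDTH)."""
    gwf = density * 0.5
    if width >= 1.0:
        # Pixel covers a whole period or more: return the average value.
        return 1.0 - 2.0 * gwf
    rising = clamp(boxstep(gwf - width, gwf, ss), max(1.0 - gwf / width, 0.0), 1.0)
    falling = clamp(boxstep(1.0 - gwf - width, 1.0 - gwf, ss), 0.0, 2.0 * gwf / width)
    return rising - falling

def stripe_hard(ss, density=0.25):
    """The non-antialiased version from the shader's comment."""
    gwf = density * 0.5
    step = lambda edge, x: 1.0 if x >= edge else 0.0
    return step(gwf, ss) - step(1.0 - gwf, ss)
```

With a small filter width the two versions agree away from the edges, and differ only inside the ramp just before each edge, where the antialiased one returns an intermediate value instead of a hard 0 or 1.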
 
Oops, phrased that wrong. I meant to ask: does the 9700/9800 already support some of the features of SM3.0?

Never mind this though. :)

US
 
Chalnoth said:
zeckensack said:
I know dynamic branching has been regarded by some as an optimization technique. I don't take that for granted right now. hardware.fr's results weren't encouraging IMO.
As long as that was a dynamic branch, I really don't see the problem. ~8 cycle performance hit on a dynamic branch in the pixel shader just doesn't seem that bad to me. This will obviously limit the cases you'd want to use a dynamic branch, but in no way makes one useless.
I've seen certain people drool over the possibility of skipping single texture lookups with a data-dependent branch. E.g. it was proposed as an in-driver optimization for titles such as UT2k4 (yup). Demirug's usually a clever chap, but it looks like he was wrong this time. Linkage (German).
(he's the one who wrote the most excellent NV30 analysis btw)
So: it could have been, but turned out not to be. You need to be extra careful about what you do and what you expect if you start writing "optimal" shaders for SM3 archs.
Chalnoth said:
512 instructions is a lot of code for a single shader, though. Executing them all would drop NV40 "ultra"'s fillrate right back into S3 Virge territory.
I don't think it would be that bad.
Well, I do ;)
At 400 MHz, you'll get 2*16*400 Mops/s = 12800 Mops/s (I'm counting 4D vector results) peak throughput.
12800 Mops/s / 512 = 25 Mpixels/s, if things go well. What was the Virge's fillrate again? ;)
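The same back-of-the-envelope estimate as a tiny Python function, in case anyone wants to plug in other clocks or shader lengths. The parameter names are my own reading of the 2*16*400 product above (two vector ops per pipe per clock, 16 pipes, 400 MHz). This is a peak figure only and ignores texture fetches, stalls and everything else.

```python
def peak_shaded_pixels_per_second(clock_hz=400_000_000, pipelines=16,
                                  ops_per_pipe_per_clock=2, instructions=512):
    """Upper bound on pixels/s if every pixel runs `instructions` vector ops.
    All defaults are the figures quoted in the post; the names are assumed."""
    ops_per_second = clock_hz * pipelines * ops_per_pipe_per_clock
    return ops_per_second / instructions
```

With the defaults this gives 25 million pixels per second. Halving the shader length to 256 instructions doubles that to 50 million.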
Chalnoth said:
Anyway, I'm more interested in long shaders for non-graphics stuff.
Yes, this will actually be feasible with NV40. It wasn't/isn't so feasible with NV30/NV35 IMO, because there just isn't enough of a performance differential to x86 CPUs, once you look at very long fragment shaders (influence of vertex/setup/raster performance diminishes).
 
DemoCoder said:
I don't see it working where you need it most: a large differential between neighboring values. Also, using gradients as parameters to a texture lookup is the least of your worries.

Consider this shader by Larry Gritz

That's a lot of damn shader work for "simple" antialiasing! Anyway, I think zeckensack's method would give you a function of the max of swidth and twidth, which should give a similar result with some fiddling.

That shader would be completely rewritten for any real-time application. I'm planning on trying it out myself (I've always wanted to write a "better alpha-test" shader), and I'll let you know what it looks like.
 