Hellbinder[CE] said:
Yes, but that does not explain why they need to be hardcoded into the driver. As pointed out, it is very unlikely that that is what this is.

RussSchultz said:
If they're emulating fixed function with shaders, where should the shader code reside?
OpenGL guy said:
Russ, as I mentioned before, it's not reasonable to have every fixed function shader stored in the driver, as there are far too many combinations. More I won't say.

One of the vertex shader fragments found in the driver:
MUL o[TEX0],v[9],c[15].xyzz;
MUL o[TEX1],v[10],c[23].xyzz;
MUL o[TEX2],v[11],c[31].xyzz;
MUL o[TEX3],v[12],c[39].xyzz;
MUL R0,v[9],c[15].xyzz;
MOV o[TEX0],R0;
MUL R1,v[10],c[23].xyzz;
MOV o[TEX1],R1;
MUL R2,v[11],c[31].xyzz;
MOV o[TEX2],R2;
MUL R3,v[12],c[39].xyzz;
MOV o[TEX3],R3;
SGE R4.x,c[63].x,c[63].x;
SLT R4.y,c[63].x,c[63].x;
SLT R5,c[63],R4.yyyy;
SGE R6,c[63],R4.xxxx;
ADD R7,R4.xxxx,-R5.xyzw;
ADD R7,R7,-R6.xyzw;
MUL R8.w,R0.y,R5.x;
MAD R8.w,R0.z,R6.x,R8.w;
MAD o[TEX0].w,R4.x,R7.x,R8.w;
MUL R8.w,R1.y,R5.y;
MAD R8.w,R1.z,R6.y,R8.w;
MAD o[TEX1].w,R4.x,R7.y,R8.w;
MUL R8.w,R2.y,R5.z;
MAD R8.w,R2.z,R6.z,R8.w;
MAD o[TEX2].w,R4.x,R7.z,R8.w;
MUL R8.w,R3.y,R5.w;
MAD R8.w,R3.z,R6.w,R8.w;
MAD o[TEX3].w,R4.x,R7.w,R8.w;
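For anyone untangling the listing: under the NV_vertex_program semantics, SGE and SLT write 1.0 or 0.0 per component, so SGE R4.x,c[63].x,c[63].x forces R4.x to 1.0 and the SLT on the next line forces R4.y to 0.0. The rest of the tail is then a branchless three-way select keyed on c[63]. A C++ sketch of what o[TEX0].w works out to (the names are mine, and this assumes standard VP1.0 semantics):

// What the SGE/SLT tail computes for o[TEX0].w. sel is c[63].x;
// r0_y and r0_z come from MUL R0,v[9],c[15].xyzz above.
float tex0_w(float sel, float r0_y, float r0_z)
{
    float lt  = (sel <  0.0f) ? 1.0f : 0.0f;   // SLT R5.x: c[63].x < 0
    float ge  = (sel >= 1.0f) ? 1.0f : 0.0f;   // SGE R6.x: c[63].x >= 1
    float mid = 1.0f - lt - ge;                // R7.x: 1.0 iff 0 <= sel < 1
    return r0_y * lt + r0_z * ge + 1.0f * mid; // the MUL/MAD/MAD chain
}

The same pattern repeats for TEX1 through TEX3 against c[63].y, .z and .w, which is why the fragment is so repetitive.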
DemoCoder said:
First of all, these are vertex shaders, and it certainly is reasonable to do it. You can implement the entire OpenGL T&L lighting model in one shader with static branches. The OpenGL2 shading spec lists such a megashader.

OpenGL guy said:
And how useful is this to the hardware/driver? Would it even fit in the hardware?

DemoCoder said:
For various reasons, such a megashader is not optimal, and for that reason you would probably want to create short versions for the "common case" T&L vertex shaders, then handle the ones that don't fit with a "catch all".

OpenGL guy said:
And just what are these "common cases"? How do you handle ones that aren't in your list? And isn't it better to treat them all the same way?
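For what it's worth, "common case plus catch-all" does not require storing every combination, only the ones worth pre-tuning. A sketch of the shape of it, with invented names and no claim about any vendor's driver:

#include <cstdint>
#include <string>
#include <unordered_map>

// Fixed-function state boiled down to a lookup key; a real driver would
// fold in the light mask, fog mode, texgen bits, color material, etc.
using StateKey = std::uint32_t;

class VertexProgramCache {
public:
    // Common keys hit a pre-built program; anything else falls through
    // to the general generator, the "catch all".
    const std::string& lookup(StateKey key) {
        auto it = cache_.find(key);
        if (it == cache_.end())
            it = cache_.emplace(key, generateFromScratch(key)).first;
        return it->second;
    }
private:
    static std::string generateFromScratch(StateKey key) {
        // Walk the state bits and emit a VP1.0 fragment per enabled
        // feature, along the lines of the pseudocode quoted further down.
        return "!!VP1.0\n# generated for state key " + std::to_string(key) + "\nEND\n";
    }
    std::unordered_map<StateKey, std::string> cache_;
};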
Uttar said:
Frankly, I don't know whether some FX12 units are shared between Fragment and Geometry in the NV3x. I'm just saying it's *possible*.
DemoCoder said:
Well, how would YOU do it? Let's postulate that your hardware lacks a fixed function pipeline. Tell me how you would provide this functionality without crafting vertex shaders.

OpenGL guy said:
You want me to divulge our driver secrets? Give me a break.

DemoCoder said:
The only other possibility is some sort of "dynamic" vertex shader creation where the API looks at all the pipeline state and creates a vertex shader "on the fly" to implement the pipeline state, but this is inefficient and will most likely not generate optimal shaders unless you plan to implement a peephole optimizer as well.

OpenGL guy said:
The API (D3D or OpenGL) does no such thing. How would the API even know it had to do this? If you export HW_VERTEX_PROCESSING in the D3D caps, then that means you support HW vertex processing. In other words, the driver/hardware has to handle everything if requested to by the application.

DemoCoder said:
For example, you could have code like
if lighting
    foreach opengl light enabled
        if diffuse
            generate vertex shader fragment to do diffuse
        if specular
            generate vertex shader fragment to do specular
        ...
if fog
    gen fog
...
But you will likely waste some clock cycles in the implementation by not being clever about reuse of instructions.

OpenGL guy said:
I don't see any need to waste clock cycles at all.
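Spelled out as real code, the generator in that pseudocode might look something like the following. This is a sketch under invented names (FFState and the fragment emitters are placeholders), not a claim about how any shipping driver does it:

#include <string>

// Invented snapshot of the fixed-function state that matters here.
struct FFState {
    bool lighting;
    bool lightOn[8];
    bool diffuse[8];
    bool specular[8];
    bool fog;
};

// Placeholder emitters; a real generator would emit the actual DP3/LIT/MAD
// sequences and allocate registers as it goes.
static std::string diffuseFragment(int i)  { return "# diffuse, light "  + std::to_string(i) + "\n"; }
static std::string specularFragment(int i) { return "# specular, light " + std::to_string(i) + "\n"; }
static std::string fogFragment()           { return "# fog\n"; }

// Concatenate per-feature VP1.0 fragments driven by the current state.
std::string composeVertexProgram(const FFState& s)
{
    std::string vp = "!!VP1.0\n"
                     "DP4 o[HPOS].x,c[0],v[OPOS];\n"   // the position
                     "DP4 o[HPOS].y,c[1],v[OPOS];\n"   // transform is
                     "DP4 o[HPOS].z,c[2],v[OPOS];\n"   // always needed
                     "DP4 o[HPOS].w,c[3],v[OPOS];\n";
    if (s.lighting)
        for (int i = 0; i < 8; ++i) {
            if (!s.lightOn[i]) continue;
            if (s.diffuse[i])  vp += diffuseFragment(i);
            if (s.specular[i]) vp += specularFragment(i);
        }
    if (s.fog)
        vp += fogFragment();
    return vp + "END\n";
}

Whether straight concatenation wastes cycles is exactly the disagreement: fragments emitted independently cannot share intermediate results (the eye-space position, say) unless the generator is deliberately clever about reuse.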
DemoCoder said:
Puh-lease, how about giving me a break. Anyone with two brain cells can enumerate the possible methods of achieving fixed function emulation, since it is no great trade secret. I doubt SIGGRAPH is going to be accepting any papers on your implementation. You could say "we do it dynamically" vs. statically. Oooh, that would be a huge leak of proprietary information that would be sure to give NVidia a lot of help.

OpenGL guy said:
Puh-lease read my NDA. Puh-lease look around for what information ATI has divulged about the architecture of the R300 (and derivatives). Puh-lease take your sarcasm elsewhere.
DemoCoder said:
You could have simply said that you don't have anything to back up your comments in this thread.

OpenGL guy said:
Without access to the driver source code or feedback from nvidia, we are all speculating. What makes my speculation any less valid?
DemoCoder said:
You accused Russ in the Quack thread of "not doing the legwork"; well, here you are making accusations about the purpose, or lack of purpose, of these NVidia vertex shader fragments. Why not do the legwork for us and figure out what they are meant for?

OpenGL guy said:
Obviously, they don't look anything like "stubs" for fixed function vertex shader code to me.
DemoCoder said:
I proposed that they are used somehow for the fixed function pipeline. Others proposed they are appended or prepended to existing shaders for some reason. I also proposed that perhaps they are substitutions used in Viewperf benches.

OpenGL guy said:
So which one is it? I doubt you'd append them to shaders, because that would just make them longer and slower, and would also change the end result in many cases. Shader replacement sounds more reasonable, especially given the presence of the "VP1.0" and "END" tokens.
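If the replacement theory is the right one, the mechanics would not need to be exotic. The driver sees the full program text at compile time, so recognizing a known benchmark shader can be as simple as hashing it. A hypothetical sketch, not a claim about what NVidia's driver actually does:

#include <cstdint>
#include <string>
#include <unordered_map>

// 64-bit FNV-1a over the shader text.
static std::uint64_t fnv1a(const std::string& s) {
    std::uint64_t h = 1469598103934665603ull;
    for (unsigned char c : s) { h ^= c; h *= 1099511628211ull; }
    return h;
}

// Known application shaders mapped to hand-tuned replacements.
// (Left empty here; the point is only the mechanism.)
static const std::unordered_map<std::uint64_t, std::string> replacements = {};

// Called when the app submits a "!!VP1.0 ... END" program for compilation.
std::string maybeReplace(const std::string& appShader) {
    auto it = replacements.find(fnv1a(appShader));
    return it != replacements.end() ? it->second : appShader;
}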
DemoCoder said:
But regardless, you have the source code now, so I would like to see the opinion of a guy who supposedly works on ATI's drivers about what these short instruction shaders (which appear to do almost nothing) are used for. Do some legwork. Retreating behind "I can't talk because I don't want to divulge secrets" removes you from the discussion.

OpenGL guy said:
You asked me about how I would implement something. You did not ask me what I thought the code was for. Again, take your barbs elsewhere.
DemoCoder said:
I am talking about the driver...

OpenGL guy said:
Then say driver and not API, because they are not the same thing.

DemoCoder said:
...intercepting calls to draw with fixed function state and composing vertex programs on the fly and uploading them to the GPU to perform the needed fixed function processing. If you can't imagine how this is done, I won't go any further. It's a trade secret.

OpenGL guy said:
You're right, I have no imagination.
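The mechanism being alluded to is no great secret at the API level. Expressed with the NV_vertex_program entry points that the VP1.0 listings above target (a driver does the moral equivalent internally, below the API), uploading a freshly composed program looks like this sketch:

#include <GL/gl.h>
#include <string>

// NV_vertex_program entry points; a real program fetches these with
// wglGetProcAddress/glXGetProcAddress. Declared directly for brevity.
extern "C" void glLoadProgramNV(GLenum target, GLuint id,
                                GLsizei len, const GLubyte* program);
extern "C" void glBindProgramNV(GLenum target, GLuint id);
#ifndef GL_VERTEX_PROGRAM_NV
#define GL_VERTEX_PROGRAM_NV 0x8620
#endif

// Upload the composed VP1.0 text and make it current before the draw.
void uploadAndBind(GLuint progId, const std::string& vp)
{
    glLoadProgramNV(GL_VERTEX_PROGRAM_NV, progId,
                    (GLsizei)vp.size(),
                    reinterpret_cast<const GLubyte*>(vp.data()));
    glBindProgramNV(GL_VERTEX_PROGRAM_NV, progId);
    glEnable(GL_VERTEX_PROGRAM_NV);
    // ... issue the draw call as usual ...
}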
DemoCoder said:
Well then you are not thinking hard enough. The best example is the NV30 architecture. Each additional register lowers performance. A naive code generator would not generate optimal code. Moreover, as the vertex shader HW gets more complex and general purpose, you have the additional overhead of parallel execution scheduling, resource hazards, and the superiority of hand-tweaked algorithms.

OpenGL guy said:
Then don't make a naive code generator, for cryin' out loud! nvidia has plenty of resources to make smart code, right? Good grief.
DemoCoder said:
If you suppose that a vertex code generator always generates optimal code, then #1 you violate the "full employment theorem for compiler writers" and #2 your HW is probably very simple with respect to parallelism, scheduling, and resource usage.

OpenGL guy said:
There are several steps to code compilation. For example, there's conversion to machine code and then optimization. Why can't the driver optimize the code? And if you are building shaders from scratch, as you mentioned above, then you have two opportunities for optimization. And if your HW was not so sensitive to resource usage, then I would say it's more complex, not less.
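On the "two opportunities for optimization" point, even a modest peephole pass pays for itself. A toy example in the spirit of the argument (nobody's shipping compiler): collapse a MUL into a temp followed by a MOV to an output into a single MUL. Note that this is only legal when the temp is dead afterwards; in the listing at the top of the thread it is not, since R0 through R3 feed the SGE/SLT tail, which is precisely the kind of subtlety both sides are gesturing at.

#include <cstddef>
#include <string>
#include <vector>

// One VP1.0-style instruction, e.g. { "MUL", "R0", "v[9],c[15].xyzz" }.
struct Instr { std::string op, dst, srcs; };

// True if register reg is read anywhere at or after index i (toy check:
// plain substring match on the source operands).
static bool readLater(const std::vector<Instr>& prog, std::size_t i,
                      const std::string& reg)
{
    for (std::size_t j = i; j < prog.size(); ++j)
        if (prog[j].srcs.find(reg) != std::string::npos)
            return true;
    return false;
}

// Collapse "MUL Rn,a,b; MOV o[X],Rn;" into "MUL o[X],a,b;" when Rn is dead.
void peephole(std::vector<Instr>& prog)
{
    for (std::size_t i = 0; i + 1 < prog.size(); ++i) {
        Instr& a = prog[i];
        const Instr& b = prog[i + 1];
        if (b.op == "MOV" && b.srcs == a.dst && !a.dst.empty()
            && a.dst[0] == 'R' && !readLater(prog, i + 2, a.dst)) {
            a.dst = b.dst;                     // write the output directly
            prog.erase(prog.begin() + i + 1);  // drop the now-dead MOV
        }
    }
}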