D44.03 shader code

Discussion in 'Architecture and Products' started by BoardBonobo, May 28, 2003.

  1. RussSchultz

    RussSchultz Professional Malcontent
    Veteran

    Joined:
    Feb 6, 2002
    Messages:
    2,855
    Likes Received:
    55
    Location:
    HTTP 404
    If they're emulating fixed function with shaders, where should the shader code reside?
     
  2. Arun

    Arun Unknown.
    Legend

    Joined:
    Aug 28, 2002
    Messages:
    5,023
    Likes Received:
    302
    Location:
    UK
    Yeah, they're likely innocent, as Popnfresh said.

    But simple question then: WHY are they doing this?
    One possible explanation is that those are the functions their dedicated T&L lighting hardware ( which, I speculate, consists of extra FX12 units for non-transform-related work ) can emulate.

    So unlike the obvious route of transforming T&L into VS, nVidia would be doing the opposite here: transforming part of a VS program into T&L to gain speed. Of course, that assumes they can use both at the same time, which they likely can, although not through direct API support ( or maybe they only can on the NV35 or something, no idea ).

    Of course, that'd mean those things would only be templates the drivers base themselves on ( and thus only match roughly, not caring about register numbers and such ) to find what they can optimize.

    All just speculation, of course, and I'd be very surprised if what I said held true; it's more likely BS. But hey, what could it be then?


    Uttar
     
  3. Hellbinder

    Banned

    Joined:
    Feb 8, 2002
    Messages:
    1,444
    Likes Received:
    12
    As has been pointed out, it's very unlikely that that's what this is.
     
  4. OpenGL guy

    Veteran

    Joined:
    Feb 6, 2002
    Messages:
    2,357
    Likes Received:
    28
    Russ, as I mentioned before, it's not reasonable to have every fixed function shader stored in the driver as there are far too many combinations. More I won't say.
     
  5. Popnfresh

    Newcomer

    Joined:
    Mar 8, 2003
    Messages:
    19
    Likes Received:
    0
    Speculation: They may be code fragments for appending to user vertex shader code.

    For instance:
    According to spec the viewport transformation (normalized view space to screenspace) takes place after the vertex shader runs. In practice, this transformation is also done in the vertex shader unit if you're using the programmable pipeline and the driver appends a few extra vertex shader instructions to the end of your program to do this.

    The way the spec / nvidia's drivers handle point sprites might require something similar. I don't know the exact D3D specification for point sprites, so I'm just speculating. Still, it seems perfectly innocent to me. Except for the first one, there's absolutely nothing fancy going on in those vertex shaders.
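    Popnfresh's "driver appends a few extra instructions" idea can be sketched in Python: splice a viewport-transform epilogue into a user vertex program just before its END token. This is only an illustration of the splice, not any vendor's actual driver code; the MAD epilogue instruction and the constant registers c[96]/c[97] are invented for the example.

```python
# Hypothetical driver epilogue: maps clip space to screen space using
# invented driver-internal constants c[96] (scale) and c[97] (bias).
VIEWPORT_EPILOGUE = (
    "# driver-added: clip space -> screen space (hypothetical constants)\n"
    "MAD o[HPOS], R11, c[96], c[97];\n"
)

def append_epilogue(user_program: str, epilogue: str = VIEWPORT_EPILOGUE) -> str:
    """Insert driver epilogue text just before the program's END token."""
    head, sep, tail = user_program.rpartition("END")
    if not sep:
        raise ValueError("program has no END token")
    return head + epilogue + "END" + tail

# A toy user program in NV_vertex_program-style text.
user = (
    "!!VP1.0\n"
    "DP4 R11.x, v[OPOS], c[0];\n"
    "DP4 R11.y, v[OPOS], c[1];\n"
    "DP4 R11.z, v[OPOS], c[2];\n"
    "DP4 R11.w, v[OPOS], c[3];\n"
    "MOV o[HPOS], R11;\n"
    "END\n"
)
patched = append_epilogue(user)
```

    In a real driver this kind of splice would happen on the compiled form of the program, but the text version shows why a handful of short canned snippets might sit in the driver binary.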
     
  6. Tridam

    Regular Subscriber

    Joined:
    Apr 14, 2003
    Messages:
    541
    Likes Received:
    47
    Location:
    Louvain-la-Neuve, Belgium
    Uttar -> I read in many of your posts that you think nvidia could share FX12 units of the fragment part with the geometry part.

    I don't think that's the case. Flexibility is NVIDIA's marketing answer to questions about poor performance: "it's flexible, we can optimize". Flexibility is the opposite of a pipeline.

    Using a fragment pipeline to work on geometry: OK.
    I think that will be the case in the future, but it's not in the NV3x.

    Taking one unit of a fragment pipeline and putting it on a geometry pipeline: NO.
    I don't think that's possible. It would kill pipeline efficiency.

    The only things they could 'easily' share between geometry pipelines and fragment pipelines are registers or buffers.

    What do you think about it ?
     
  7. Arun

    Arun Unknown.
    Legend

    Joined:
    Aug 28, 2002
    Messages:
    5,023
    Likes Received:
    302
    Location:
    UK
    Frankly, I don't know whether some FX12 units are shared between Fragment and Geometry in the NV3x. I'm just saying it's *possible*, and it might be an efficient way to look at it, considering that otherwise those FX12 Geometry units are often not used.

    What's fairly likely, however, is that nVidia has FX12 units dedicated to hardwired T&L functionality.

    What I do think is that the NV40 might have fully shared VS/PS functionality. I'm saying might because, frankly, few facts back it up. But it would just be insane to use a unified instruction set and such ( as confirmed by CMKRNL several months ago ) and not use some dynamic allocation...


    Uttar
     
  8. RussSchultz

    RussSchultz Professional Malcontent
    Veteran

    Joined:
    Feb 6, 2002
    Messages:
    2,855
    Likes Received:
    55
    Location:
    HTTP 404
    Well, ok, so not every fixed function shader is there, but that really isn't the point.

    The helper fragments, or whatever, have to reside somewhere the driver can reach them. That would be: in the driver or in the registry. Ring0 can't touch the filesystem directly (if my vague recollection of WinNT/XP architecture is on target).
     
  9. DemoCoder

    Veteran

    Joined:
    Feb 9, 2002
    Messages:
    4,733
    Likes Received:
    81
    Location:
    California
    First of all, these are vertex shaders, and it certainly is reasonable to do it. You can implement the entire OpenGL T&L lighting model in one shader with static branches. The OpenGL2 shading spec lists such a megashader.


    For various reasons such a megashader is not optimal, so you would probably want to create short versions for the "common case" T&L vertex shaders, then handle the ones that don't fit with a "catch-all".

    Unless ATI has fixed function hardware for T&L, you guys are emulating also. That means your driver or video card BIOS contains low-level native vertex shader snippets, not necessarily in text form.

    Most of those Nvidia shaders do nothing that is application-specific. They either do nothing but pass through to the pixel shader, or simply transform texture coordinates, etc. The only one that looks out of place is the one with 4 branches and a bunch of scalar math.
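    The "common case plus catch-all" scheme DemoCoder describes could look roughly like this. This is a Python sketch under stated assumptions: the fixed-function state fields and shader names are made up for illustration, not any vendor's actual driver internals.

```python
from typing import NamedTuple

# A tiny slice of fixed-function state; real drivers track far more.
class FFState(NamedTuple):
    lighting: bool
    num_lights: int
    fog: bool

# Precompiled short shaders for a handful of frequent states
# (names are hypothetical placeholders).
COMMON_CASES = {
    FFState(lighting=False, num_lights=0, fog=False): "vs_passthrough",
    FFState(lighting=True,  num_lights=1, fog=False): "vs_one_light",
    FFState(lighting=True,  num_lights=1, fog=True):  "vs_one_light_fog",
}

def select_shader(state: FFState) -> str:
    """Exact hit on a common case, else the general (slower) megashader."""
    return COMMON_CASES.get(state, "vs_megashader")
```

    The design trade-off is exactly the one under debate in this thread: the table stays small and fast for the states applications actually use, while the megashader guarantees correctness for everything else.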
     
  10. OpenGL guy

    Veteran

    Joined:
    Feb 6, 2002
    Messages:
    2,357
    Likes Received:
    28
    And how useful is this to the hardware/driver? Would it even fit in the hardware?
    And just what are these "common cases"? How do you handle ones that aren't in your list? And isn't it better to treat them all the same way?
     
  11. Humus

    Humus Crazy coder
    Veteran

    Joined:
    Feb 6, 2002
    Messages:
    3,217
    Likes Received:
    77
    Location:
    Stockholm, Sweden
    Then I'm going to kick in and say it's not possible. FX12 is just too little precision for transform and simply doesn't have anywhere near the range we need for geometry. FP16 would probably have too little precision too, and definitely too small a range. FP24? Could maybe work, but I would think all vendors are already doing things at 32-bit in the vertex pipeline.
     
  12. DemoCoder

    Veteran

    Joined:
    Feb 9, 2002
    Messages:
    4,733
    Likes Received:
    81
    Location:
    California
    Well, how would YOU do it? Let's postulate that your hardware lacks a fixed function pipeline. Tell me how you would provide this functionality without crafting vertex shaders.

    The only other possibility is some sort of "dynamic" vertex shader creation where the API looks at all the pipeline state and creates a vertex shader "on the fly" to implement the pipeline state, but this is inefficient and will most likely not generate optimal shaders unless you plan to implement a peephole optimizer as well.

    For example, you could have code like

    if lighting
        foreach opengl light enabled
            if diffuse
                generate vertex shader fragment to do diffuse
            if specular
                generate vertex shader fragment to do specular
            ...

    if fog
        generate fog fragment

    ...

    But you will likely waste some clock cycles in the implementation by not being clever about reuse of instructions.
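    That pseudocode translates into a runnable sketch along these lines. Python here, and the fragment bodies are placeholder comments rather than real vertex-program instructions; the state dictionary shape is invented for the example.

```python
# Canned text fragments the generator stitches together
# (placeholders, not real NV_vertex_program code).
def diffuse_fragment(i):
    return f"# light {i}: accumulate diffuse term"

def specular_fragment(i):
    return f"# light {i}: accumulate specular term"

FOG_FRAGMENT = "# compute fog factor"

def build_vertex_shader(state):
    """Compose a vertex program on the fly, driven by pipeline state."""
    parts = ["!!VP1.0", "# transform position to clip space"]
    if state.get("lighting"):
        for i, light in enumerate(state.get("lights", [])):
            if light.get("diffuse"):
                parts.append(diffuse_fragment(i))
            if light.get("specular"):
                parts.append(specular_fragment(i))
    if state.get("fog"):
        parts.append(FOG_FRAGMENT)
    parts.append("END")
    return "\n".join(parts)
```

    As DemoCoder notes, naive concatenation like this wastes cycles; a smarter generator would share registers and interleave the fragments rather than just appending them.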
     
  13. DemoCoder

    Veteran

    Joined:
    Feb 9, 2002
    Messages:
    4,733
    Likes Received:
    81
    Location:
    California
    BTW, there is a simple test to find out if these vertex shaders are benchmark specific.

    Hex edit the drivers and turn them into NO-OPs. Then, run some fixed function T&L apps/games and see if you get artifacts/screwed up visuals. Then, run a T&L heavy benchmark like Viewperf and see if it is screwed or runs faster with nop'ed shaders.

    If hex editing screws up any and all games that use fixed function, then they are not cheats, but fixed function emulation optimizations.

    If they only appear to do anything on Viewperf or say, 3dMark VS benchmarks, then you know they are benchmark specific. However, if it's good for ATI to replace shaders with optimized functionally identical equivalents, why would it be bad for Nvidia?
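    A crude version of that hex-edit experiment could be scripted like this. Python sketch only: the byte buffer stands in for a real driver image, and blanking a program body merely approximates "turning it into a NO-OP", since the blanked program is no longer valid; a careful test would substitute a valid do-nothing program of identical length.

```python
import re

def nop_embedded_shaders(image: bytes) -> bytes:
    """Blank the body of every embedded !!VP1.0 ... END program.

    The file length is preserved, so offsets elsewhere in the binary
    stay valid -- the same constraint a manual hex edit would face.
    """
    out = bytearray(image)
    for m in re.finditer(rb"!!VP1\.0(.*?)END", image, re.DOTALL):
        start, end = m.span(1)                 # span of the body only
        out[start:end] = b" " * (end - start)  # overwrite, same length
    return bytes(out)
```

    Running the patched driver against fixed-function games versus shader-heavy benchmarks would then separate "fixed-function emulation" from "benchmark-specific replacement", as described above.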
     
  14. OpenGL guy

    Veteran

    Joined:
    Feb 6, 2002
    Messages:
    2,357
    Likes Received:
    28
    You want me to divulge our driver secrets? Give me a break.
    The API (D3D or OpenGL) does no such thing. How would the API even know it had to do this? If you export HW_VERTEX_PROCESSING in the D3D caps, then that means you support HW vertex processing. In other words, the driver/hardware has to handle everything if requested to by the application.
    I don't see any need to waste clock cycles at all.
     
  15. AndrewM

    Newcomer

    Joined:
    May 28, 2003
    Messages:
    219
    Likes Received:
    2
    Location:
    Brisbane, QLD, Australia
    BTW, all of those vertex programs are OpenGL vertex programs, specifically NV_vertex_program. The first one is nv_v_p2, though.

    They have nothing to do with 3DMark03 (as that is a D3D program).
     
  16. Popnfresh

    Newcomer

    Joined:
    Mar 8, 2003
    Messages:
    19
    Likes Received:
    0
    BTW: the first vertex shader in the text file seems to be missing some code.

    It doesn't have the !!VP1.0 MAIN: header and, more importantly, doesn't write to o[HPOS] (the output vertex position). It's not a valid shader.

    Also: since the (incomplete) first shader and the (simple pass-through) last shader are the only ones that DON'T do something with point sprites, I don't know why anyone is even considering that these are "replacement" shaders for benchmarks or such. NVidia's vertex shaders get compiled down to microcode (a single instruction becomes one or more microcode instructions), and if they were going to replace shaders, it's much more likely they'd replace them with optimized microcode versions.
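    The compile-to-microcode point can be illustrated with a toy translation table. The expansions below are invented for illustration; NVIDIA's actual microcode is not public.

```python
# Hypothetical expansion table: one source instruction becomes
# one or more microcode operations.
MICROCODE = {
    "MOV": ["mc_mov"],
    "DP4": ["mc_mul", "mc_mad", "mc_mad", "mc_mad"],  # 4-component dot
    "LIT": ["mc_max", "mc_log", "mc_mul", "mc_exp"],  # lighting coefficients
}

def assemble(instructions):
    """Flatten a list of source instructions into microcode ops."""
    ops = []
    for inst in instructions:
        ops.extend(MICROCODE[inst])
    return ops
```

    This is why, as Popnfresh argues, a replacement scheme would more plausibly patch the optimized microcode than carry around text-form shaders.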
     
  17. DemoCoder

    Veteran

    Joined:
    Feb 9, 2002
    Messages:
    4,733
    Likes Received:
    81
    Location:
    California
    Puh-lease, how about giving me a break. Anyone with two brain cells can enumerate the possible methods of achieving fixed function emulation, since it is no great trade secret. I doubt SIGGRAPH is going to be accepting any papers on your implementation. You could say "we do it dynamically" vs. statically. Oooh, that would be a huge leak of proprietary information that would surely give NVidia a lot of help.

    You could have simply said that you don't have anything to back up your comments in this thread. You accused Russ in the Quack thread of "not doing the legwork"; well, here you are making accusations about the purpose, or lack of purpose, of these NVidia vertex shader fragments. Why not do the legwork for us and figure out what they are meant for?

    I proposed that they are used somehow for the fixed function pipeline. Others proposed they are appended or prepended to existing shaders for some reason. I also proposed that perhaps they are substitutions used in Viewperf benches.

    But regardless, you have the source code now, so I would like to see the opinion of a guy who supposedly works on ATI's drivers about what these short instruction shaders (which appear to do almost nothing) are used for. Do some legwork. Retreating behind "I can't talk because I don't want to divulge secrets" removes you from the discussion.



    I am talking about the driver intercepting calls to draw with fixed function state and composing vertex programs on the fly and uploading them to the GPU to perform the needed fixed function processing. If you can't imagine how this is done, I won't go any further. It's a trade secret.


    Well then you are not thinking hard enough. The best example is the NV30 architecture: each additional register lowers performance. A naive code generator would not generate optimal code. Moreover, as the vertex shader hardware gets more complex and general-purpose, you have the additional overhead of parallel execution scheduling, resource hazards, and the superiority of hand-tweaked algorithms.

    If you suppose that a vertex code generator always generates optimal code, then #1 you violate the "full employment theorem for compiler writers" and #2 your HW is probably very simple with respect to parallelism, scheduling, and resource usage.
     
  18. Hellbinder

    Banned

    Joined:
    Feb 8, 2002
    Messages:
    1,444
    Likes Received:
    12
    Yeah, I was told by someone earlier today that they were all OpenGL.

    I guess we might want to start checking all those OpenGL games used as benchmarks now eh? :wink:
     
  19. OpenGL guy

    Veteran

    Joined:
    Feb 6, 2002
    Messages:
    2,357
    Likes Received:
    28
    Puh-lease read my NDA. Puh-lease look around for what information ATI has divulged about the architecture of the R300 (and derivatives). Puh-lease take your sarcasm elsewhere.
    Without access to the driver source code or feedback from nvidia we are all speculating. What makes my speculation any less valid?
    Obviously, they don't look anything like "stubs" for fixed function vertex shader code to me.
    So which one is it? I doubt you'd append them to shaders because that would just make them longer and slower and would also change the end result in many cases. Shader replacements sounds more reasonable, especially given the presence of the "VP1.0" and "END" tokens.
    You asked me about how I would implement something. You did not ask me what I thought the code was for. Again, take your barbs elsewhere.
    Then say driver and not API because they are not the same thing.
    You're right, I have no imagination.
    Then don't make a naive code generator for cryin' out loud! nvidia has plenty of resources to make smart code, right? Good grief.

    Also, this problem should have occurred to people long before the product came out the door.
    There are several steps to code compilation. For example, there's conversion to machine code and then optimization. Why can't the driver optimize the code? And if you are building shaders from scratch, as you mentioned above, then you have two opportunities for optimization.

    You're right, it may not always give the best results, but it sure better give the majority of the performance of hand-tuned code.

    And if your HW was not so sensitive to resource usage, then I would say it's more complex, not less.
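    The kind of local cleanup a "peephole optimizer" (mentioned earlier in the thread) would perform can be sketched over a toy three-operand instruction list. The ISA is invented for illustration, and a real pass would also have to prove the folded temporary isn't reused later; this sketch just assumes it.

```python
def peephole(prog):
    """One pass of two local rewrites over (op, dst, src...) tuples:
    drop self-moves, and fold MUL t,a,b ; ADD d,t,c into MAD d,a,b,c.
    Assumes (for this toy) the MUL temporary is not reused later.
    """
    out = []
    i = 0
    while i < len(prog):
        inst = prog[i]
        if inst[0] == "MOV" and inst[1] == inst[2]:    # MOV r, r: no-op
            i += 1
            continue
        if (inst[0] == "MUL" and i + 1 < len(prog)
                and prog[i + 1][0] == "ADD"
                and prog[i + 1][2] == inst[1]):        # ADD's 1st src = MUL dst
            add = prog[i + 1]
            out.append(("MAD", add[1], inst[2], inst[3], add[3]))
            i += 2
            continue
        out.append(inst)
        i += 1
    return out
```

    Applying local rewrites like these to generated code is one way a driver could recover much of the performance of hand-tuned shaders, which is the crux of the disagreement above.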
     
  20. RussSchultz

    RussSchultz Professional Malcontent
    Veteran

    Joined:
    Feb 6, 2002
    Messages:
    2,855
    Likes Received:
    55
    Location:
    HTTP 404
    Well, the Itanium folks would probably get all in a hissy about that.

    Beyond that, what are your opinions on what these shader snippets are?
     
  • About Us

    Beyond3D has been around for over a decade and prides itself on being the best place on the web for in-depth, technically-driven discussion and analysis of 3D graphics hardware. If you love pixels and transistors, you've come to the right place!

    Beyond3D is proudly published by GPU Tools Ltd.