Shading Languages for the New APIs

Discussion in '3D Hardware, Software & Output Devices' started by Tim Foley, Sep 2, 2014.

  1. darksylinc

    Newcomer

    Joined:
    Sep 6, 2014
    Messages:
    1
    Likes Received:
    0
    I came here because I just hit an issue with GLSL that I believe has no solution under the current standard, and remembered this thread:

    GLSL supports neither references nor pointers.

    I found myself doing something like this (note that I'm not finished, so this code may not even compile or be valid):
    Now, I may end up writing "material.m[materialId]" too many times, and I have no intention of referencing any other index in material.m. This clutters my code and hurts readability.
    If this were C, I would simply do:
    Now, I'm reluctant to write Material mat = material.m[materialId]; because I fear some compilers may end up doing a full copy in the registers. And considering GCN has trouble with register pressure, well, it could get nasty.

    By adding a reference:
    Material &mat = material.m[materialId];
    compilers can decide best what to do depending on the architecture:

    * Some archs will recalculate the address every time.
    * Some archs will behave like in C: hold a pointer to the address (&material.m + materialId) (current HW can do this).
    * Some archs will do a full copy.

    But right now we have nothing anywhere near that; so I'm at a point where every solution I consider seems suboptimal, and I'm even considering writing the verbose "material.m[materialId]" every time, or using a macro like "#define mat material.m[materialId]".

    My 2 cents.

    Cheers
    Matias Goldberg
     
  2. Groovounet

    Newcomer

    Joined:
    Dec 12, 2013
    Messages:
    9
    Likes Received:
    0
    There are actually many cases where references would be useful. I'd like references on opaque types for example.

    void my_sampling(in sampler2D sampler); // This is fine.

    But

    void main()
    {
    sampler2D Sampler = DiffuseSampler; // That's not fine
    }

    It is logically the same, and it's the same for the hardware.

    Same for functions such as interpolateAtOffset: they accept only shader input variables, so they can't be used inside a function. All we really need is a reference to pass the shader input variable directly into the body of a function without copying the value.

    Shader storage buffers have the same sort of reference issue.


    Speaking of feature requests: reinterpret_cast, static_cast, typedef, sizeof, unions; GLSL doesn't even have this basic stuff! Oo
     
  3. sebbbi

    Veteran

    Joined:
    Nov 14, 2007
    Messages:
    2,924
    Likes Received:
    5,296
    Location:
    Helsinki, Finland
    I am Sebastian Aaltonen from Ubisoft / RedLynx. Our latest game was released for four platforms: Xbox 360, PS4, Xbox One and PC. For our most recent released project we wrote all our shaders with HLSL + custom extensions. We use macros and function calls (with #ifdefs inside them) to separate platform specific differences, such as texture sampling, buffer loading, system value semantics, memexport / streamout / UAV write, constants (CBs vs DX9 style free constants on registers, CB packing differences), etc. We have a few common header files that all our shaders include and these headers include all the macros and the functions we need to make the same shader code compile on all the platforms.

    This approach is doable on the platforms that we currently support, because the basic syntax of all the languages we need to support is quite similar. The most common uses of macros are texture sampling (and passing sampler + texture pairs to functions), system value semantics and constant loading. Otherwise our shaders look pretty much like standard HLSL. This approach has been working quite well for us so far. However, going forward I see two big problems with it. The lesser problem is compute shaders (all the synchronization primitives, append buffers, LDS, etc. need more preprocessor hackery) and the greater problem is porting to mobile platforms (OpenGL ES 3.1 and Metal). These shading languages have syntax further away from HLSL, and would require an extensive amount of preprocessor hackery to run the same code as on the other APIs that have more HLSL-like shader syntax.

    The second major issue with current graphics APIs is the lack of an efficient resource binding approach. Using register indices to bind stuff to the GPU is hard to maintain, and using strings at runtime is even worse (a performance problem).

    The third issue is the lack of generic programming functionality. For example, I want to support uint32, uint16, float32 and float16 keys in my GPU radix sorter (for float keys the sorter also flips the highest bit) and I want to support user-definable value types (type T). I can't do this without preprocessor hackery, as HLSL doesn't support templates. Bigger problems arise with user-defined containers, since a single GPU program might want to use both LinkedList<A> and LinkedList<B>, where A and B are different types. This gets really ugly with preprocessor hacks.

    My preferred approach to solve both of these problems is similar to C++AMP. You would write all your shader stages as C++11 lambdas (or function objects) and capture into the lambda all the buffers and resources (textures, samplers, vertex buffers, etc.) that you need inside your parallel processing function. If you need system value semantics (such as the vertex index), you add these as parameters to the lambda function. My preferred way to load vertex data from a vertex buffer would be to index the captured vertex buffer with the system value vertex index (auto myVertex = vbuf[index]).

    A fully integrated C++11 lambda-based solution would solve the generic programming problems, because C++ fully supports templates. With templates you can also solve the permutation problem: if you don't want runtime branches in your shaders, use templates to make the decision once at the call site (compile time). The same is true for all aspects of ubershaders. With templates (+ variadic templates) you can build shader bodies quite elegantly at compile time (only include the blending and discard operations if your shader needs them).

    Now the GPU compiler just needs to go through the C++ code, separate the lambda bodies, and resolve the compile-time templates to see what kind of shader permutation needs to be compiled at each call site. The programmer can use the captured C++ structures (such as vectors) directly in their shader code. There is no need to think about GPU resource registers or things like that anymore. If you need to support discrete GPUs, you can add "Buffer<T>" and "Texture<T>" classes that can be captured to the lambda instead of direct shared data structures. On the CPU side you could obviously access and modify the data inside these classes with the standard bracket [] operators.

    GPU pipeline objects would have "dispatch" method to execute the lambda functions on the GPU (rendering pipeline would also have "draw" and "drawIndirect", etc). The developer could instantiate multiple pipeline objects (one of type GraphicsPipeline and N of type ComputePipeline to support multiple asynchronous compute pipelines running in parallel to the graphics pipeline).

    ---

    Example:

    Code:
    GraphicsPipeline graphicsPipeline;
    
    std::vector<MyVertex> myVertexBuffer;
    Texture<uint16_t, 4> myTexture;
    
    Matrix4x4 myMatrix;
    
    // TODO: Add code here to set matrix, VB and texture.
    
    auto vertexShader = [&](SV_VertexId idx)
    {
        OutStructVS output;
        auto myVertex = myVertexBuffer[idx];
        output.position = myMatrix.transform(myVertex.position);
        output.UV = myVertex.UV;
        return output;
    };
    
    auto pixelShader = [&](OutStructVS &input)
    {
        return myTexture.sample(input.UV);
    };
    
    PipelineObject myPipelineObject = PipelineVS_PS(vertexShader, pixelShader);
    
    graphicsPipeline.drawVertices(myPipelineObject, 100);
     
  4. keldor

    Newcomer

    Joined:
    Dec 22, 2011
    Messages:
    75
    Likes Received:
    113
    The vast majority of work in developing a compiler is going to come from writing the optimizer. Writing a simple parser + codegen to a clean IL like LLVM (no register allocation!) takes a single developer a couple of months. Writing a good optimizer takes a team years of ongoing development. With that in mind, it would be a big win to be able to share as much of the optimization code as possible, not only between languages but between different architectures. Some types of optimization are highly architecture agnostic, and some aren't. Separate these into generic optimizations, which can be shared as much as possible, and hardware-specific ones, which are performed in the target-specific optimization passes.

    DirectX is badly in need of a new IL, if only because the old one is still locked to 4-wide SIMD, which has been more or less abandoned since the early days of DX9. Even architectures still using explicit SIMD tend to have multiple vector widths for different word sizes (half vs float vs int8), which doesn't map to 4-wide SIMD either. Nor does VLIW. Another problem is that DX enforces 128-bit alignment for members (AFAIK), which is grossly inefficient for, say, scalar floats. Switching to something simple like scalar (let the IHVs do vector optimizations - they generally have to tear down the SIMD4 and repack things anyway) would probably go a long way toward reducing compiler bugs, and would even improve performance in cases where the compiler can't tell if you're going to use, say, channel w in some future pass.

    Something derived from LLVM would be the logical choice for a universal IL simply because much of the industry is already using it! Just have to get Microsoft on board and everyone's life will be much easier.
     
  5. sebbbi

    Veteran

    Joined:
    Nov 14, 2007
    Messages:
    2,924
    Likes Received:
    5,296
    Location:
    Helsinki, Finland
    Just wanted to clarify one of the most important things about my suggestion. Because the shader code is tightly integrated with the C++ CPU side code, the compiler can optimize each call site's shader permutation separately. If some parameters are const or constexpr (known or calculated at compile time), the shader body can be optimized accordingly. This way many values that would normally be pushed to a constant buffer and read by the shader from memory can be directly inlined (saving constant buffer creation, data movement and constant loading costs). The compiler can also optimize the shader code by moving math on constant-buffer values to the CPU side (similarly to how a CPU-only compiler moves repeated math outside of inner loops).
     
  6. jbarcz1

    Newcomer

    Joined:
    Jan 4, 2005
    Messages:
    7
    Likes Received:
    0
    Location:
    Baltimore, MD
    I don't expect it to happen in the near term, but it should be the long-term goal. I care less about a common IL than a common language.

    The problem today is that there are only two kinds of shader language:
    1. Single-platform, single syntax, high-quality compiler, offline and online compilation
    2. Cross-platform, numerous syntaxes, varying compiler quality, online compilation only

    The common IR means there will eventually be a path that will lead to the holy grail, via the treacherous swamps of IR translation:
    3. Cross-platform, single syntax, high-quality compiler, offline and online compilation


    The reason for the griping is that the only way to build a source-to-source translator is to basically implement an entire frontend. We need to build a full AST, and in order to not explode on erroneous input, we'd need at least some syntax/semantic checking.

    There's an implicit assumption in all this excitement that the new, open IR will also bring a new, open compiler that will emit the IR. If we have that, and if its output can easily be converted to somebody else's input, then most of the work has been done. It will plug a gigantic hole in the current ecosystem.

    This depends on the IR. There's considerable room to improve on what D3D bytecode offers. The key problem with D3D IR is that you need an MS compiler on an MS platform to build your IR in the first place. Add to that the fact that the new bytecode is harder to translate (compared to SM3), and you have the source of our frustration. If it weren't for this, there wouldn't be so much complaining.

    I think the answer to the requirements question is "everything that LLVM can do". It needs to be strongly typed, and needs to be open enough that we can do whatever we need to with it (Disassemble it, generate it, transform it, concatenate it, translate it, reflect it). There needs to be code available to manipulate the IR to let us do these things. The IR itself needs to be compact, and amenable to fast translation into whatever form the driver compiler uses. Something like LLVM/SPIR is an obvious choice, but I'm fine with something different as long as the tools are there.
     
  7. rpg.314

    Veteran

    Joined:
    Jul 21, 2008
    Messages:
    4,298
    Likes Received:
    0
    Location:
    /
    My thoughts on shader programming, exactly.
    We need system level programmability for this stuff.
     
  8. sebbbi

    Veteran

    Joined:
    Nov 14, 2007
    Messages:
    2,924
    Likes Received:
    5,296
    Location:
    Helsinki, Finland
    Good to hear that other studios also have like-minded developers.

    I have discussed this with other graphics programmers, and many are currently using LLVM/clang and other tools to parse their C++ graphics code and automatically generate headers for the HLSL side (for bindings), and also the other way around. Unfortunately there's no official HLSL grammar available, so our internal parser (in our new tech), for example, doesn't yet support some of the less often used features of the language. I don't see a good reason why every game studio should have to hire people with the skills to build language translators. We happen to have people with this kind of software engineering background, but that's not common (at least among smaller developers).

    A fully C++11-standard-compliant shader language is important because it allows us to use existing tools to develop our shaders. Modern IDEs have lots of features that boost programmer productivity, such as autocomplete, interface/comment peeking, jump to definition (the IDE finds definitions in the includes too and opens those files automatically at the right line), mouse-over to see variable types, and red underlining for syntax errors / missing identifiers. Writing code without these features is PAINFUL and slow.

    It just feels bad that we shader programmers still need to do "notepad programming (TM)" instead of having good development tools. This is not acceptable anymore, as our code bases are getting bigger and more complex. Compute shaders have allowed us to move most of our graphics engine to the GPU (scene setup, resource management, culling, etc). Shaders are no longer small pieces of code that calculate a pixel color. We have big libraries of includes and code, and we need IDE / tool support to have all the same productivity boosting features that normal C++ programmers have. This is the main reason why I want to write my shaders in standard-compliant C++ instead of some custom language that no existing IDE or tools support properly. A C++ based shader language would have very good IDE and tool support out of the box. Immediately.

    Being able to call your C++ functions (and use your C++ templates) from shader code is also important. Nowadays we need to write the same (math/helper) functions in both C++ and custom-extended HLSL. This kind of copy-paste coding is not elegant and is not easy to maintain. Interoperability with C++ (CPU) code would also be much better (I already gave examples of how much easier it is to set constants and buffers if both languages are compatible and can be inlined inside each other).

    Last but not least are the code refactoring and analysis tools. We have plenty of good tools for C++. Visual Assist is a very popular productivity booster among Visual Studio programmers. ReSharper C++ is even better for code refactoring. If we had a C++ based shader language we could also use tools like these to boost our productivity and maintainability. C++ analysis tools such as Coverity are also important for code maintenance and monitoring purposes. Good tools are important, and this is why I would prefer a commonly used language (such as C++) over a custom language with much less industry backing.
     
    #28 sebbbi, Sep 7, 2014
    Last edited by a moderator: Sep 7, 2014
  9. Rodéric

    Rodéric a.k.a. Ingenu
    Moderator Veteran

    Joined:
    Feb 6, 2002
    Messages:
    4,080
    Likes Received:
    997
    Location:
    Planet Earth.
    That's pretty much what we are already doing for Tasks when we use Cooperative Multi-Tasking (either with Cilk or conceptually). I didn't think to apply it to the GPU, but with integrated GPUs taking more and more of the market it makes a lot of sense, although discrete GPUs will always be my favorite given their high performance.

    That would spare us the glue setup code binding the buffers and textures to the pipeline...

    I'm scared the code might turn awfully wrong quickly though.


    P.S.
    It's pretty much what I was talking about with Chapel (except I express myself poorly because I'm very busy and working on way too many things at once.)
    [Also I didn't think it would be feasible for Mantle/D3D12/OGLNext, so a C (with generics) as the shading language and binding pointers/ids directly would be a good improvement over what we already have and seems to be a better match for what those API seems to be.]

    P.P.S.
    If anyone could explain to me how/why they get that hundreds/thousands of permutations I'd be quite interested, I worked on AAA titles too, and never needed a lot of permutations...
     
  10. sebbbi

    Veteran

    Joined:
    Nov 14, 2007
    Messages:
    2,924
    Likes Received:
    5,296
    Location:
    Helsinki, Finland
    My approach would also work perfectly fine for discrete GPUs. The compiler would see the parameters captured in the vertex and pixel shader lambdas. Actually, even current C++ compilers need to pack the captured parameters into a function object. This already collects the captured "constants" into a struct that could easily be sent to the GPU, similarly to how constant buffers are currently handled.

    For persistent GPU data (like textures, vertex buffers and structured buffers), there would of course be C++ side container wrappers, such as Texture<T>, VertexBuffer<T> and StructuredBuffer<T> that have updateRegion() and map() methods. Direct indexing with brackets [] could also be supported (but of course it would be really slow without shared coherent memory).
    They must be using forward rendering :)

    But that isn't a problem for the system I proposed either, since you can just initialize an array of lambdas for all the permutations you need, and dynamically index to this array when you select which permutation to use.
     
  11. Tim Foley

    Newcomer

    Joined:
    Sep 2, 2014
    Messages:
    8
    Likes Received:
    0
    Location:
    The Abyss
    Completely agreed. I just want to make sure everybody is on the same page about what they think will be provided by the IHVs vs. the ISVs vs. other parties (e.g., Khronos, Microsoft, ...).

    From what I can see, there are really three key components here:

    1) Every new API should provide a shader IL. Ideally these would all be the same, but just being relatively similar would be a step up. At least one (next-gen OpenGL) will have a public spec, so it might be possible to rally around that one.

    2) Ideally, these ILs should either support a C/C++ compiler, or have an LLVM back-end (contributed to the mainline LLVM project) so that we can just use clang C/C++. A C/C++ subset on par with what Metal has is reasonable to expect from all APIs and IHVs.

    3) Higher-level languages, at the level of HLSL/GLSL/etc. should probably be left up to the ISVs, since they already have to build tools at this level anyway, and it is unlikely that a design-by-committee language coming from, e.g., Khronos is going to be a perfect fit for anybody.

    Is this view of the world reasonable?
     
  12. Rodéric

    Rodéric a.k.a. Ingenu
    Moderator Veteran

    Joined:
    Feb 6, 2002
    Messages:
    4,080
    Likes Received:
    997
    Location:
    Planet Earth.
    1) I'm not so sure we need an IL. It sounds more like we don't expect to be able to have a common language across APIs, so we already concede defeat and ask for an IL instead...

    2) We definitely need tools for the language; compile, test, debug... with minimum hassle.

    3) IMO, that's only true if we assume we can't get a common shading language across API...


    All in all we'd be fine with a single, simple, elegant and fast API that runs across all OS/HW along with its shading language. Unfortunately, unless OpenGL Next is spectacular (very unlikely given the OpenGL 2.0 Pure and OpenGL 3.0 Longs Peak failures), we aren't gonna get that, so we go for the second best option, a common shading language; but we anticipate that failing too, so we only ask for a common IL instead... that's really sad... :(

    In the future we want something like Chapel (yes, I find it tremendously better than C++**), or the C++11 based solution Sebbbi proposed, and a lightweight API...
     
  13. Dominik D

    Regular

    Joined:
    Mar 23, 2007
    Messages:
    782
    Likes Received:
    22
    Location:
    Wroclaw, Poland
    I'd like to see IL representation that's common to both GL and DX. Something like the binary representation in DX, but:
    - scalar instead of vector
    - with contextual information carried over from the higher level language (e.g. this op is done on channel Y of an input vertex) for better optimizations
    - data fetch clearly separated from computations
    - preshader recognized as a separate part (it'd be cool, I think, to have a mechanism to do the common part in CPU instead of GPU)
    - introspection into the shader so that developers can debug the thing
    - min/max/exact precision for operations (potentially with the ability to state "denorms are zero" for smallest mobile devices)

    With that I'd like to see a reference compiler to IL for GLSL (or whatever) and probably another (closed source?) from MS. On this level I'd like to see:
    - simple C-like syntax
    - composition/substitution mechanism, kinda like C macros but with type safety guarantees
    - sampler state and texture data decoupled (this needs support from IL as well)

    My 2c FWIW.
     
  14. Tim Foley

    Newcomer

    Joined:
    Sep 2, 2014
    Messages:
    8
    Likes Received:
    0
    Location:
    The Abyss
    A very loaded question...

    A lot of the feedback in this thread has been about wanting a "common" format, whether high-level language or IL, that could be used across platforms. I think this is a good goal, but also a difficult one for those of us working on these various APIs to make progress on.

    I'd like to ask a much more direct and actionable question, if people don't mind:

    Suppose that IHVs and ISVs in Khronos were going to provide an open-source shader compiler front-end as part of "next-generation OpenGL," but they only had the resources to develop one. Would you rather have:

    1) A clang/LLVM-based C/C++ front-end with meta-data syntax similar to the Metal shading language's.

    2) A new revision of the GLSL shading language, with (backwards-incompatible) changes and additions as required to reflect whatever the resource and binding model of the new API is.

    And note that when I say "you only get one" I mean that Khronos would (in this hypothetical) only ever provide this one front-end. So you can pick GLSL because it makes your porting work easier in the near term, but then in the long term, if you want something else, you would need to be ready to build that new front-end compiler yourself.
     
  15. Rodéric

    Rodéric a.k.a. Ingenu
    Moderator Veteran

    Joined:
    Feb 6, 2002
    Messages:
    4,080
    Likes Received:
    997
    Location:
    Planet Earth.
    1)

    Break backwards compatibility; that's the thing that keeps us in the past. We need to move forward and embrace what newer GPUs (GCN, Kepler, ...) can do.
     
  16. jbarcz1

    Newcomer

    Joined:
    Jan 4, 2005
    Messages:
    7
    Likes Received:
    0
    Location:
    Baltimore, MD
    C++ as it stands today is inadequate as a shader language. It would be less a subset and more a dialect. We need all of the additional syntax for vector types and swizzles (even though the IRs are scalar the vector syntax is immensely useful). We also need all the additional types for textures/samplers. So, probably you could use LLVM but you'd still need a different language.

    Also, I don't understand why you'd call GLSL high level but not C++. The only difference is the lack of pointers, and IMHO this is a good thing, because it avoids lots of aliasing problems for the compiler.
     
  17. jbarcz1

    Newcomer

    Joined:
    Jan 4, 2005
    Messages:
    7
    Likes Received:
    0
    Location:
    Baltimore, MD
    I vote for 2. Trying to build an entirely new language, in addition to all the other work that needs to be done, seems very risky to me. I think the IR should be designed with an eye towards 1, but the high level language should be 2.

    With option 2, we're assured of having something robust in the near term, and we still leave the door open for other people to innovate on the language at a more gradual pace. I question the assumption that if we pick 2, we never get 1. Khronos could always form a parallel initiative to produce a new language in addition to maintaining the old.
     
  18. sebbbi

    Veteran

    Joined:
    Nov 14, 2007
    Messages:
    2,924
    Likes Received:
    5,296
    Location:
    Helsinki, Finland
    I think C++AMP has quite nice short vector types with swizzles (and so does our internal C++ vector library). With C++ templates it's easy to add new features with good interfaces (and with good runtime performance). You don't necessarily need direct language support for all the features, if the language allows you to build them on top of it.

    C++AMP style short vector types: http://blogs.msdn.com/b/nativeconcurrency/archive/2012/01/25/concurrency-graphics-in-c-amp.aspx

    And some texture sampling examples: http://blogs.msdn.com/b/nativeconcurrency/archive/2013/07/18/texture-sampling-in-c-amp.aspx
     
  19. AlexV

    AlexV Heteroscedasticitate
    Moderator Veteran

    Joined:
    Mar 15, 2005
    Messages:
    2,535
    Likes Received:
    144
    Why do you need those to be built-in types though? That's not exactly a scalable approach (the OpenCL thing with a billion vecN types is an abomination IMHO, but it's "harder" to build ADTs in C).
     
  20. Andrew Lauritzen

    Andrew Lauritzen Moderator
    Moderator Veteran

    Joined:
    May 21, 2004
    Messages:
    2,629
    Likes Received:
    1,227
    Location:
    British Columbia, Canada
    I take it you haven't looked at Metal? The things you note are easily resolved there. See for instance:
    https://developer.apple.com/library...b.html#//apple_ref/doc/uid/TP40014364-CH5-SW1
    I agree with Alex that there is no reason for these things to be "built-in" types and "intrinsics" per se.

    I don't think pointer aliasing is a problem in shader-like kernels either (see for instance the restrictions Metal has on pointers) - all of the information is available to trivially resolve it, as this is what current shading languages already do by definition. In fact current languages are perhaps even less able to resolve aliasing in certain cases (i.e. they generally assume that any write through a given pointer/resource may alias any other write through the same one, even if types/structures are different).

    I'm definitely one who thinks that the existence of "restrict" and strict aliasing rules are an indication of poor language design, but having pointers as an address representation does not necessarily force you down that road. And let's be clear, once we move to bindless the situation is no better than pointers for the compiler anyways... a bindless handle is a pointer and in fact there is an additional level of totally opaque aliasing that can occur (indirection in the descriptor itself).
     