Speed: assembly vs. HLSL shaders

A magazine I write for just did an interview with the devs behind STALKER, and one of the answers we got back was very surprising:

7. Have your development team experimented with writing shaders in HLSL? How much development time can be cut by using HLSL in the long run? What is the average performance penalty for using HLSL vs. assembly shaders?

Yes, all DX9 render shaders are written with HLSL.
HLSL gave better results in terms of code optimization.
Compared to assembler it is way faster. However, there were occasional mistakes in compilation.

I was under the impression that assembly would be much faster. How could this be?

And also this may interest you:
1. How is STALKER’s shadow system implemented? Is it unified? How does it compare to Doom 3?

With the DirectX 9 renderer we implement fully real-time dynamic lighting, soft (unlike Doom3), physically correct shadows (again unlike Doom3), cast by every object onto every object, along with true per-pixel lighting (Doom3 uses per-texel), 1-3 million polygons per frame, etc.

I don't understand why Doom3's shadows wouldn't be 'physically correct'. What could he be referring to?

Thanks. :)
 
HLSL code compiles down to assembly shader code, just like C compiles down to assembly. As such they are potentially the same speed. I say "potentially" because it all depends on the quality of the compiler versus the skill of the assembler coder. I've heard cases where HLSL was actually able to produce better code than manually written assembler.
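For example - and this is just a hypothetical sketch, not anybody's actual shader - take a trivial per-pixel diffuse term in HLSL:

sampler2D normalMap;   // tangent-space normal map (assumed binding)
float3    lightDir;    // light direction, set by the application (assumed constant)

float4 main(float2 uv : TEXCOORD0) : COLOR
{
    // Unpack the normal and compute a clamped N.L diffuse factor.
    float3 n = normalize(tex2D(normalMap, uv).rgb * 2.0 - 1.0);
    float  d = saturate(dot(n, normalize(lightDir)));
    return float4(d.xxx, 1.0);
}

Compile that with fxc for ps_2_0 and you get roughly the texld / mad / dp3 / rsq / mul / max sequence a human would write by hand, because both are targeting exactly the same instruction set; the compiler just never gets tired of shuffling registers.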
 
It just proves that hardcore asm coders are a dying species. Probably people just go with straightforward routines, without thinking too much about the penalties or the chip architecture. Or maybe compilers "optimize" the code too much, cut the precision or something. I'm not an expert, but the results of shader code are not really that important - it's not a database, so you don't have to be exact or keep all the framebuffer information as it should be...
 
JF_Aidan_Pryde said:
I don't understand why Doom3's shadows wouldn't be 'physically correct'. What could he be referring to?
Doom 3 uses stencil shadowing with shadow volumes. From a certain point of view, that's not a correct way to model it, because only light is a physical thing - shadow is just the absence of light. You also need several tricks to make it work correctly with multiple lights, attenuation, intersecting shadows, the camera in the shadow, etc. Although Carmack solved all of these problems, it's not an elegant way to do it. The only reason to go through all this trouble is that it's relatively fast for simple geometry on current hardware and you don't have precision issues.

A much more physically correct way of representing light and shadow is a shadow map. It stores how far the light rays reach from the light source. So, again from the right point of view, it models the volume where there is light. A single transformation and texture lookup tells you whether a point is in light or not. But shadow mapping has its problems too. The time it takes to generate the shadow maps is proportional to the scene complexity rather than the model complexity, so it generally takes longer. Omnidirectional lights are also a problem, because there's no really efficient way to map a sphere to a 2D rectangle. And last but not least, you get artifacts because of limited shadow map precision.

So, theoretically shadow mapping is superior, but hardware limitations make stencil shadowing more practical on the current and previous generation of hardware. Good shadow mapping requires DirectX 9 support, so it's only for a limited range of users.
 
Nick said:
The time it takes to generate the shadow maps is proportional to the scene complexity rather than the model complexity, so it generally takes longer.

I've never done any of this in practice, so excuse my naive questions, but
- shadow map is a kind of a texture?
- it is generated during content creation, not in real-time?
- about how much time it takes to generate a shadow map for "Stalker-type" scene (as seen from screenshots)? Minutes? Hours?
 
I'm with PrzemKo. Nobody can program ASM any more - if they could, there would be at worst a small delta between ASM and HLSL.

However, there are other reasons too. The single-basic-block type of code that shaders currently implement is the most amenable to mechanical optimisation, and it's entirely possible that even good CPU assembly programmers make mistakes that inhibit low-level optimisation in shaders.
 
Another issue with hand-written shader assembly is that optimizations at that level are mostly based on assumptions you make about the hardware. I.e. while hand-written / optimized assembly might spare an instruction slot or two, this doesn't imply that it executes faster every time (or at all), especially as graphics hardware is sparsely documented these days. Another large advantage is that HLSL code can get even faster with each driver (compiler) release and can be specifically optimized for the user's architecture.
 
SvP said:
- shadow map is a kind of a texture?
- it is generated during content creation, not in real-time?
- about how much time it takes to generate a shadow map for "Stalker-type" scene (as seen from screenshots)? Minutes? Hours?
Very good questions!

- Yes, a shadow map is a kind of texture. Imagine a rectangular spot light. Now render the scene from the spot light's position into the rectangle, but only the z values. What you get is an image of what the light 'sees', or better, how far every light ray reaches. This defines a volume which the light illuminates. Then you render the scene as normal from the camera's point of view, and for every pixel you determine whether it's in the spotlight's light volume. This is a relatively simple transformation that compares the distance from the pixel to the spot light with the corresponding z-value in the shadow map. If it's less, the pixel is in the light volume and is lit. But this 'less/greater than' comparison is what causes the precision issues. (There's a small sketch of the lookup after this list.)
- For static lights it can be generated during content creation. So unlike shadow volumes you don't waste fillrate on it; you just have to do a transformation and a texture lookup. For dynamic lights you have to render the scene from the light's position every frame. But only the z-values are required and shadow maps can be relatively low-res, whereas stencil shadowing works at the screen resolution. Unfortunately, for omnidirectional lights you have to render a shadow map for every direction (six for a cube map, two for a dual-paraboloid map).
- I haven't seen STALKER screenshots yet, but technically, generating shadow maps takes much less time than rendering a frame (lower resolution, no texture mapping, no multipass, early-z/z-pyramid). So it's not minutes or hours but milliseconds.
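To make the first answer concrete, here's a rough sketch of the lookup in HLSL (names are made up, and I'm skipping details like the half-texel offset and the y-flip):

sampler2D shadowMap;   // z-values rendered from the spot light's point of view (assumed)

float4 main(float4 lightSpacePos : TEXCOORD0,   // pixel position projected into light space
            float3 diffuse       : COLOR0) : COLOR
{
    // Perspective divide, then remap clip space to texture coordinates.
    float2 uv    = lightSpacePos.xy / lightSpacePos.w * 0.5 + 0.5;
    float  depth = lightSpacePos.z  / lightSpacePos.w;

    // The stored z says how far that light ray reached. If our pixel is
    // farther away, something blocked the light, so it's in shadow.
    float stored = tex2D(shadowMap, uv).r;
    float bias   = 0.002;   // small offset to hide the 'less/greater than' precision issues
    float lit    = (depth - bias <= stored) ? 1.0 : 0.0;

    return float4(diffuse * lit, 1.0);
}

That's the whole trick: one transformation (done mostly per vertex in practice) and one texture lookup per pixel.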

So, besides the precision issues and the problems with omnidirectional lights, shadow mapping is really the 'physically correct' way of doing shadows. A worst case scenario for shadow volumes, prison bars, is not an issue for shadow mapping.
 
HLSL uses the full instruction set. That's why manually written assembly doesn't have much of an advantage. And if you manually schedule instructions, you would use the same algorithms as the compiler. But like PiNkY pointed out, architecture documentation is very sparse, which puts the assembly programmer at a big disadvantage. HLSL is also compiled at run-time, so it can adapt to the specific architecture it runs on, while assembly is static. Although the driver could probably optimize the assembly further too, I guess they just don't want to invest much time in it any more because HLSL is clearly preferred by game developers.
 
Nick said:
Doom 3 uses stencil shadowing with shadow volumes. From a certain point of view, that's not a correct way to model it, because only light is a physical thing - shadow is just the absence of light. You also need several tricks to make it work correctly with multiple lights, attenuation, intersecting shadows, the camera in the shadow, etc. Although Carmack solved all of these problems, it's not an elegant way to do it. The only reason to go through all this trouble is that it's relatively fast for simple geometry on current hardware and you don't have precision issues.
From what I've read, Carmack solved the "camera in shadow" problem simply by modifying the algorithm so that it wasn't a problem (i.e. "camera in shadow" isn't a special case: it's handled quite elegantly). And I really do not see why stencil shadows have any of the other problems you've highlighted. The only major problem I am aware of with stencil shadows is performance.

Here's a link to a description of how Carmack developed his shadowing technique for DOOM 3:
http://developer.nvidia.com/docs/IO/2585/ATT/CarmackOnShadowVolumes.txt
 
Nick said:
So, besides the precision issues and the problems with omnidirectional lights, shadow mapping is really the 'physically correct' way of doing shadows. A worst case scenario for shadow volumes, prison bars, is not an issue for shadow mapping.
I'm not sure why shadow maps should have problems with omnidirectional lights. Wouldn't you just render to a cube map instead of a normal texture? (you do mean, for example, a light that illuminates all sides of an interior room, right?)

But why would shadow maps be more "physically correct" than shadow volumes? Both are simply ways of telling the hardware whether a surface is, for a particular source, in light or in shadow. All of the actual lighting is separate from the shadowing technique used.
 
PrzemKo said:
Or maybe compilers "optimize" the code too much, cut the precision or something
I doubt it. The HLSL compiler is maintained by Microsoft. Microsoft has included a partial precision hint in the assembly, and there are data types in HLSL that will cause the partial precision hint to be used. It would be counterproductive for Microsoft to make their own compiler not produce what the source code requested.
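For example, in D3D9 HLSL the half type is how the source code asks for reduced precision (just a sketch):

half4 main(half4 color : COLOR0) : COLOR
{
    // fxc emits the _pp (partial precision) modifier on the generated
    // instructions, e.g. mul_pp, only because the source used half here.
    return color * half4(0.5, 0.5, 0.5, 1.0);
}

So the compiler only 'cuts precision' where the shader author explicitly asked for it.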
 
Nick said:
HLSL is also compiled at run-time, so it can adapt to the specific architecture it runs on, while assembly is static.
I don't believe that is a correct statement.
 
RussSchultz said:
Nick said:
HLSL is also compiled at run-time, so it can adapt to the specific architecture it runs on, while assembly is static.
I don't believe that is a correct statement.

Yes, only the GL shading language is compiled by the driver itself; the others use outside components to compile to an assembly-level language first.

The D3D HLSL compiler is in a static library (D3DX), so you can't update it without recompiling. Considering the low number of supported profiles at the moment, I'd think most people would include the assembly versions in the binary instead of the HLSL compiler, as you can check that they are OK.

The Cg compiler is available in both dynamic and static libraries, so it is possible in theory to update just the DLL, but considering how different the language profiles of Cg are, staying forward compatible seems to be pretty hard. So the same comments as for D3D HLSL apply here too.
 
Nick said:
Doom 3 uses stencil shadowing with shadow volumes. From a certain point of view, that's not a correct way to model it, because only light is a physical thing - shadow is just the absence of light. You also need several tricks to make it work correctly with multiple lights, attenuation, intersecting shadows, the camera in the shadow, etc. Although Carmack solved all of these problems, it's not an elegant way to do it. The only reason to go through all this trouble is that it's relatively fast for simple geometry on current hardware and you don't have precision issues.

/bolding on my part/

actually that's quite a significant advantage, IMHO :)

apropos, i believe we can divide the shadow-volumes algorithm into two rather distinct parts, which can be considered independently of each other:

* shadow volume generation
* shadow "casting" from the shadow volume

whereas the shadow volume generation is a rather esoteric, thus highly tricky problem, the shadow casting part of the business is very nice, properly precise, and mighty cool, i may add.
 
Chalnoth said:
From what I've read, Carmack solved the "camera in shadow" problem simply by modifying the algorithm so that it wasn't a problem (i.e. "camera in shadow" isn't a special case: it's handled quite elegantly). And I really do not see why stencil shadows have any of the other problems you've highlighted. The only major problem I am aware of with stencil shadows is performance.
Yes, like I said, Carmack solved all the problems. The camera-in-shadow solution is even called 'Carmack's Reverse'. But it requires 'capping' of the shadow volume so that it is closed. So it all works perfectly, but all this extra work isn't exactly efficient or elegant. The principle of shadow maps is just much more fundamental and has none of these issues.

Multiple lights require multiple passes because there is only one stencil buffer. But this also means that many parameters have to be recalculated per pass, like attenuation. With shadow mapping and advanced shaders you can do it in one pass, without doing the same calculations twice. And the distance to the light that you have to calculate is also used for the shadow mapping, so you even reuse calculations. That's what I call elegant.
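As a rough sketch (hypothetical names, ps_2_0-ish HLSL, shadow maps assumed to store the distance from the light): two shadowed lights in a single pass, where one light vector feeds N.L, attenuation and the shadow test.

sampler2D shadowMap0;   // distance-to-light rendered from light 0's view (assumed)
sampler2D shadowMap1;   // same for light 1
float3    lightPos0, lightPos1;
float3    lightColor0, lightColor1;

float3 ShadeLight(float3 worldPos, float3 n, float3 lightPos, float3 lightColor,
                  sampler2D shadowMap, float4 lightSpacePos)
{
    // One light vector gives us everything: direction for N.L,
    // distance for attenuation AND for the shadow compare.
    float3 toLight = lightPos - worldPos;
    float  dist    = length(toLight);
    float  nDotL   = saturate(dot(n, toLight / dist));
    float  atten   = 1.0 / (dist * dist);

    float2 uv     = lightSpacePos.xy / lightSpacePos.w * 0.5 + 0.5;
    float  stored = tex2D(shadowMap, uv).r;      // how far this light ray reached
    float  lit    = (dist - 0.05 <= stored) ? 1.0 : 0.0;

    return lightColor * nDotL * atten * lit;
}

float4 main(float3 worldPos    : TEXCOORD0,
            float3 normal      : TEXCOORD1,
            float4 lightSpace0 : TEXCOORD2,
            float4 lightSpace1 : TEXCOORD3) : COLOR
{
    float3 n = normalize(normal);

    // Both lights, both shadows, one pass - no per-light stencil passes,
    // and nothing gets computed twice.
    float3 result = ShadeLight(worldPos, n, lightPos0, lightColor0, shadowMap0, lightSpace0)
                  + ShadeLight(worldPos, n, lightPos1, lightColor1, shadowMap1, lightSpace1);

    return float4(result, 1.0);
}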
 
Chalnoth said:
I'm not sure why shadow maps should have problems with omnidirectional lights. Wouldn't you just render to a cube map instead of a normal texture? (you do mean, for example, a light that illuminates all sides of an interior room, right?)
Omnidirectional lights are just a lot less efficient. Cube map sampling is, as far as I know, less efficient than sampling a simple texture. And for dynamic lights you would have to render the scene six times, once for every side of the cube. Fillrate is no problem because it only needs z-values, but it's a lot of extra transformations, clipping and triangle setup. Of course shadow volumes also send extra polygons through the pipeline, but not the whole scene. So you'd better have good occlusion algorithms. But, like I said before, the complexity of this is linear with scene complexity, so performance drops by a constant factor. With shadow volumes, the worst case with grid-like structures creates some serious overdraw, which is not linear with scene complexity and depends a lot on the position and viewing direction, so it's not constant.
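Just to illustrate the cube map route (a sketch with hypothetical names): if you store the distance from the light in all six faces, the lookup side is simple; the expensive part is filling those six faces every frame for a dynamic light.

samplerCUBE shadowCube;   // distance-to-light rendered into the six faces (assumed)
float3      lightPos;

float4 main(float3 worldPos : TEXCOORD0, float3 diffuse : COLOR0) : COLOR
{
    // The vector from the light to the pixel selects the cube face and texel...
    float3 toPixel = worldPos - lightPos;
    float  dist    = length(toPixel);

    // ...and the stored value is how far the light reached in that direction,
    // so the compare needs no projection at all, just the lookup.
    float stored = texCUBE(shadowCube, toPixel).r;
    float lit    = (dist - 0.05 <= stored) ? 1.0 : 0.0;

    return float4(diffuse * lit, 1.0);
}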
Chalnoth said:
But why would shadow maps be more "physically correct" than shadow volumes? Both are simply ways of telling the hardware whether a surface is, for a particular source, in light or in shadow. All of the actual lighting is separate from the shadowing technique used.
Well, you're taking a huge detour with stenciling. You even have to explicitly say which objects are shadow casters. There is no API command to just switch on shadow volumes and have everything done efficiently by the driver. For shadow mapping this could be possible, if the driver drew every polygon into every shadow map. With deferred rendering I think that could work. Wouldn't it be nice if, when you enabled lights in OpenGL and DirectX, they automatically cast shadows?

Actually I hate saying "casting shadows". Shadow is just 'not light'. It should be natural that when you switch on a light in your scene, only the parts visible from the light's point of view get illuminated (apart from radiosity). That's exactly how shadow mapping does it. I'd actually propose calling them light volumes, because they can be interpreted as the volume where there is light. At every point of that volume you can actually 'measure' the light. With stencil shadowing you take a huge detour to determine if a point is in light.

I hope you agree that shadow volumes actually model the shadow, which is a bit of a wrong approach because shadow isn't something physical like light. That doesn't take away from the fact that the output is correct. There are so many things that are 'physically incorrect' yet produce exactly what we want, so don't take it as something that is necessarily negative.
 
RussSchultz said:
I don't believe that is a correct statement.
Well, then the people in another thread on this forum have been lying to me. ;) I have been told that shader assembly corresponds directly to the instructions used by the GPU. So if you write an assembly program, that's exactly what the GPU will execute. Thus it's "static", even though the API and/or the driver still have to parse the code and translate it to machine code at run-time. With HLSL on the other hand, the API and/or driver can still do optimizations specific to the architecture it runs on, because it's compiled at run-time.

But if I'm wrong then please enlighten me!
 
As JPAANA stated, the only high-level shader languages compiled at run-time are OpenGL's glslang and Cg. The DX9 HLSL compiler, while called at runtime, is statically linked into the application, so the output is essentially static. The IHV has no ability to insert their own compiler, and the driver has no ability to optimize based on the HLSL; it's required to work off of the DX9 shader assembly.
 