Yes, but will it run Crysis?
EDIT: It will!
This is also my current mobile lighting pipeline design. During my career I have implemented pretty much every single variation of deferred rendering. I suppose it would be faster to discard (or branch out) pixels outside the light's influence in the light pixel shader (before doing the lighting math) instead of using stencil to mark pixels that are inside the light volume (stencil-mark pixels that fail the depth test of the light front faces, then render the light back faces with depth fail)? Without stencil you can pick only one culling direction (render light back faces with depth fail, or light front faces with depth pass). Both have their good and bad cases.

Sebbi, I've done some analysis of this recently, and I'm quite confident that no matter what clever tricks anyone comes up with, tiled deferred lighting is almost certainly NOT the optimal solution for either PowerVR or ARM. It is much better for us to use lightly tessellated geometry (~50 vertices/triangles per sphere or cone maximum) with Pixel Local Storage (or simply discarding the outputs at the end of the pass in Metal).
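For readers who haven't implemented the stencil variant described in the quote, here is a minimal sketch of the two-pass state setup in OpenGL ES, assuming a GL context is current and the depth buffer already holds the G-buffer pass. `drawLightVolume()` and the two program handles are hypothetical placeholders, and this is just one possible reading of the pass order described above, not production code.

```cpp
// Stencil-marked light volume sketch (OpenGL ES host code).
// Assumes the G-buffer/depth pass has already run; drawLightVolume() and the
// program handles are hypothetical placeholders for this sketch.
#include <GLES2/gl2.h>

void drawLightVolume();            // hypothetical: issues the sphere/cone draw call
extern GLuint gStencilOnlyProgram; // hypothetical: depth/stencil-only program
extern GLuint gLightingProgram;    // hypothetical: deferred lighting program

void renderStencilTestedLight()
{
    glEnable(GL_DEPTH_TEST);
    glEnable(GL_STENCIL_TEST);
    glEnable(GL_CULL_FACE);
    glDepthMask(GL_FALSE);                       // light volumes never write depth
    glClear(GL_STENCIL_BUFFER_BIT);

    // Pass 1: front faces. Where the front face FAILS the depth test, the scene
    // surface lies in front of the whole volume, so mark it as "not lit".
    glColorMask(GL_FALSE, GL_FALSE, GL_FALSE, GL_FALSE);
    glCullFace(GL_BACK);                         // draw front faces only
    glDepthFunc(GL_LEQUAL);
    glStencilFunc(GL_ALWAYS, 1, 0xFF);
    glStencilOp(GL_KEEP, GL_REPLACE, GL_KEEP);   // sfail, dpfail, dppass
    glUseProgram(gStencilOnlyProgram);
    drawLightVolume();

    // Pass 2: back faces with an inverted depth test (the "depth fail" above),
    // restricted to pixels NOT marked in pass 1. Surviving pixels lie between
    // the front and back faces, i.e. inside the light volume.
    glColorMask(GL_TRUE, GL_TRUE, GL_TRUE, GL_TRUE);
    glCullFace(GL_FRONT);                        // draw back faces only
    glDepthFunc(GL_GEQUAL);
    glStencilFunc(GL_EQUAL, 0, 0xFF);
    glStencilOp(GL_KEEP, GL_KEEP, GL_KEEP);
    glEnable(GL_BLEND);
    glBlendFunc(GL_ONE, GL_ONE);                 // additive light accumulation
    glUseProgram(gLightingProgram);
    drawLightVolume();

    glDisable(GL_BLEND);
    glDisable(GL_STENCIL_TEST);
    glDepthMask(GL_TRUE);
}
```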
That is an interesting question. GFXBenchmark 3.x / Manhattan does per-pixel discard to optimise the lighting equations, so every major mobile GPU designer has had to optimise for this case in the last year (or was lucky to already be optimised for it). So it's a safe bet that it's not going to hit an ultra-slow path on any modern mobile architecture, and it is likely to only get faster in the future. Certainly for some architectures it was significantly faster to branch ~2 years ago, but this has meant that discard should be just as fast now (note that multiple discards per shader are handled differently by different architectures and/or compilers, and you may or may not be saving the work in between the first and last discard).
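As a point of comparison, here is roughly what the discard-based alternative looks like on the shader side: a GLSL ES sketch embedded as a C++ string constant. The G-buffer layout (world-space position, normal and albedo textures) and all uniform names are assumptions made for illustration, not anyone's actual engine code.

```cpp
// Discard-based point light fragment shader, as discussed above: reject the
// pixel before any lighting math if it lies outside the light's radius.
// G-buffer layout and uniform names are assumptions for this sketch; the
// string would be passed to glShaderSource at runtime.
static const char* kPointLightFragmentSrc = R"GLSL(#version 300 es
precision highp float;

uniform sampler2D uGBufferPosition;  // world-space position (assumed layout)
uniform sampler2D uGBufferNormal;    // encoded normal (assumed layout)
uniform sampler2D uGBufferAlbedo;    // diffuse albedo (assumed layout)
uniform vec3  uLightPosition;
uniform vec3  uLightColor;
uniform float uLightRadius;
uniform vec2  uInvScreenSize;

out vec4 oColor;

void main()
{
    vec2 uv       = gl_FragCoord.xy * uInvScreenSize;
    vec3 worldPos = texture(uGBufferPosition, uv).xyz;

    vec3  toLight = uLightPosition - worldPos;
    float distSq  = dot(toLight, toLight);

    // Early out before any lighting math for pixels outside the light volume.
    if (distSq > uLightRadius * uLightRadius)
        discard;

    vec3  albedo = texture(uGBufferAlbedo, uv).rgb;
    vec3  N      = normalize(texture(uGBufferNormal, uv).xyz * 2.0 - 1.0);
    vec3  L      = toLight * inversesqrt(distSq);
    float ndotl  = max(dot(N, L), 0.0);
    float atten  = 1.0 - sqrt(distSq) / uLightRadius;  // simple falloff for the sketch

    oColor = vec4(albedo * uLightColor * ndotl * atten, 1.0);
}
)GLSL";
```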
My intuition is that double-sided stencil is probably faster than discard/branch on every shipping PowerVR GPU. However, I suppose this is slightly dependent on the position and especially the size of the lights in your scene; if most lights are very small and positioned close to walls, one-sided depth tests are likely to be nearly identical to two-sided tests, and it may then turn out that not doing either stencil or discard/branch is slightly faster. Alternatively, you could do a hybrid approach where e.g. larger lights use stencil tests and others don't.
It would definitely be a good idea to switch the approach based on the light's screen-space radius (and other factors). For lights that overlap the camera position, stencil is useless (and wastes a lot of fill rate); just rendering the light back faces with depth fail is optimal in this case. Tessellation would also be beneficial for light rendering. You could generate the light mesh with the tessellator (saving bandwidth) and you could reduce the triangle count of the light geometry based on camera distance. This would give both better quad efficiency (for distant lights) and less wasted area (for near lights).
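A rough sketch of the kind of per-light path selection being discussed follows. All thresholds, struct fields and helper functions here are invented for illustration; real values would have to come from profiling each GPU.

```cpp
// Per-light choice between stencil-tested volumes, plain back-face rendering
// and a cheap front-face path, roughly as discussed above. All thresholds,
// struct fields and helpers are hypothetical placeholders.
#include <cmath>

struct Light {
    float position[3];
    float radius;
};

struct Camera {
    float position[3];
    float nearPlane;
    float fovY;            // vertical field of view in radians
    float viewportHeight;  // in pixels
};

enum class LightPath {
    BackFacesDepthGreater,   // camera inside the volume: no stencil, inverted depth test
    StencilTwoPass,          // large on-screen lights: worth the stencil-marking cost
    FrontFacesDepthLess      // small distant lights: a single cheap pass is enough
};

static float distance3(const float a[3], const float b[3])
{
    float dx = a[0] - b[0], dy = a[1] - b[1], dz = a[2] - b[2];
    return std::sqrt(dx * dx + dy * dy + dz * dz);
}

// Crude pinhole projection of the bounding sphere radius to pixels.
static float estimateScreenSpaceRadius(const Light& light, const Camera& camera)
{
    float d = distance3(light.position, camera.position);
    if (d < 1e-3f)
        return 1e9f;  // camera essentially at the light centre
    float pixelsPerUnit = (camera.viewportHeight * 0.5f) / std::tan(camera.fovY * 0.5f);
    return (light.radius / d) * pixelsPerUnit;
}

LightPath chooseLightPath(const Light& light, const Camera& camera)
{
    // If the camera (padded by the near plane) is inside the volume, the front
    // faces get clipped away and stencil marking is useless: just draw the back
    // faces with an inverted depth test.
    if (distance3(light.position, camera.position) < light.radius + camera.nearPlane)
        return LightPath::BackFacesDepthGreater;

    // Large lights cover many pixels, so the extra stencil pass tends to pay
    // for itself; the 200-pixel threshold is an arbitrary placeholder.
    if (estimateScreenSpaceRadius(light, camera) > 200.0f)
        return LightPath::StencilTwoPass;

    // Small/distant lights: one pass over the front faces with a normal depth
    // test is usually cheap enough.
    return LightPath::FrontFacesDepthLess;
}
```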
Honestly, it feels like this is the kind of thing that GPU designers themselves should analyse, and provide easy-to-understand SDK samples *with* performance data for different scene types, so that developers know what high-level algorithms are recommended for their architecture without having to waste weeks (or even months) testing out different alternatives while lacking the necessary low-level knowledge to know exactly how each technique should be implemented for optimal performance, thus making for an unfair comparison. Alternatively, they should provide the optimised paths for common engines like Unity and Unreal themselves, since these engines already do GPU detection to some extent. Of course, this is effectively what NVIDIA already does by just giving free engineers to AAA games in the TWIMTBP program, but only they can afford this and it only benefits a small part of the developer community for a single GPU architecture...

More information is always better, especially for PC and mobile platforms. Console generations last so long that developers have the incentive and time to find the optimal solution for their needs. But developers can't afford to spend as much time on each different PC and mobile GPU (unfortunately).
Nothing to write home about; I just ran across this dwarf-a-licious screenshot and assume it's been crafted for Rogue GPUs. Tada:
https://twitter.com/TobskiHectov/status/629315223686631425/photo/1
Predictably, ~~Mantle 2.0~~ Vulkan seems to be doing wonders on smartphone SoCs.
My gut feeling tells me that Vulkan's added efficiency won't be limited to just ULP mobile SoCs.
Of course not, but low-power (and consequently lower-performing) multi-core CPUs will benefit more. On the x86 desktop, the gaming industry evolved towards smaller numbers of very high-clocked CPU cores with much higher IPC. DX12 and Vulkan will probably bring a huge boost to AMD 4-8 core solutions, but not much to the latest Core i5 and i7.

So I wonder who will be able to afford Vulkan development. Big AAA titles on consoles will probably benefit (but the APIs on consoles were already low level). Big AAA titles on PC probably won't benefit, as you mention. So that leaves mobile devices, where Vulkan could have the largest impact. However, budgets for mobile game development are small, and Vulkan code is more complex, since many things must be done manually that were done by the driver. I'm wondering how many titles will actually use Vulkan. It seems like it's in a difficult place due to market forces.

(In my opinion) Vulkan code is much more readable and less complex than OpenGL. The Vulkan API is super clean. OpenGL on mobile also has an unbelievable amount of CPU overhead, making Vulkan even more attractive. If (when) Vulkan supports enough existing hardware configurations, it will become very popular.
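To make the multi-core argument above concrete, here is a minimal sketch of the pattern that gives Vulkan its CPU-side scaling: one command pool per worker thread, with command buffers recorded in parallel and no driver lock in the middle. The device and queue-family handles are assumed to have been created elsewhere, and no real draw calls are recorded; it only illustrates the threading structure, not a complete renderer.

```cpp
// Minimal illustration of why explicit APIs scale across CPU cores:
// each worker thread records into its own VkCommandPool. Assumes 'device'
// and 'queueFamilyIndex' come from setup code not shown here, and records
// no actual rendering commands.
#include <vulkan/vulkan.h>
#include <thread>
#include <vector>

struct PerThreadRecorder {
    VkCommandPool   pool = VK_NULL_HANDLE;
    VkCommandBuffer cmd  = VK_NULL_HANDLE;
};

std::vector<PerThreadRecorder> recordInParallel(VkDevice device,
                                                uint32_t queueFamilyIndex,
                                                unsigned workerCount)
{
    std::vector<PerThreadRecorder> recorders(workerCount);
    std::vector<std::thread> workers;

    for (unsigned i = 0; i < workerCount; ++i) {
        workers.emplace_back([&, i] {
            PerThreadRecorder& r = recorders[i];

            // Command pools are externally synchronised in Vulkan, so each
            // thread owns its own pool and never touches anyone else's.
            VkCommandPoolCreateInfo poolInfo{};
            poolInfo.sType            = VK_STRUCTURE_TYPE_COMMAND_POOL_CREATE_INFO;
            poolInfo.queueFamilyIndex = queueFamilyIndex;
            vkCreateCommandPool(device, &poolInfo, nullptr, &r.pool);

            VkCommandBufferAllocateInfo allocInfo{};
            allocInfo.sType              = VK_STRUCTURE_TYPE_COMMAND_BUFFER_ALLOCATE_INFO;
            allocInfo.commandPool        = r.pool;
            allocInfo.level              = VK_COMMAND_BUFFER_LEVEL_PRIMARY;
            allocInfo.commandBufferCount = 1;
            vkAllocateCommandBuffers(device, &allocInfo, &r.cmd);

            VkCommandBufferBeginInfo beginInfo{};
            beginInfo.sType = VK_STRUCTURE_TYPE_COMMAND_BUFFER_BEGIN_INFO;
            beginInfo.flags = VK_COMMAND_BUFFER_USAGE_ONE_TIME_SUBMIT_BIT;
            vkBeginCommandBuffer(r.cmd, &beginInfo);

            // ... record this thread's slice of the frame here ...

            vkEndCommandBuffer(r.cmd);
        });
    }

    for (auto& t : workers)
        t.join();

    // The recorded command buffers can then be submitted together with a
    // single vkQueueSubmit from one thread.
    return recorders;
}
```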