Forward+

And with thousands of lights, a single tile can get pretty expensive, especially given the parallelism needed to feed modern GPUs. With 3D clusters there are no pathological viewpoints, and work size is closely related to light density.
Sure, although I think you're overstating it when you say there are "no pathological viewpoints" with clustering. Any conservative culling system will still be primarily bound by the pixel that has the most lights. Unless you actually start approximating illumination with the clusters, or are willing to reschedule the entire workload (work stealing might work here, but it's somewhat limited by GPU hardware), you still have to evaluate every light that affects that pixel serially.

But yes, I agree in general that clustering is more robust to worst case performance. That said, I'm not totally convinced that it would matter a lot in practice compared to something simpler.
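(For anyone following along: by "clusters" we mean subdividing the view frustum in depth as well as in screen space, so each cluster's light list tracks local light density. Here's a minimal GLSL sketch of the fragment-to-cluster mapping, with made-up uniform names and a simple logarithmic depth slicing scheme; actual implementations differ in the details.)

```glsl
#version 430

// Hypothetical uniforms -- names and the exact slicing scheme are illustrative.
uniform float u_near;         // near plane distance
uniform float u_far;          // far plane distance
uniform uvec3 u_gridDim;      // cluster counts in x, y, z (e.g. 16 x 16 x 32)
uniform uvec2 u_tileSizePx;   // screen-space tile size in pixels

uint clusterIndex(vec2 fragCoordPx, float viewZ)
{
    // 2D tile from the pixel position.
    uvec2 tile = uvec2(fragCoordPx) / u_tileSizePx;

    // Logarithmic depth slice: equal depth ratios rather than equal distances,
    // so clusters stay roughly cube-shaped (clamping omitted for brevity).
    uint slice = uint(log(viewZ / u_near) / log(u_far / u_near) * float(u_gridDim.z));

    // Flatten (x, y, slice) into a single cluster index.
    return (slice * u_gridDim.y + tile.y) * u_gridDim.x + tile.x;
}
```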

I'm not aware, however, that you can control branching at that level in GLSL or CUDA (as you can in HLSL), so results might vary in practice.
Oh, a shader compiler is always still free to just flatten the branch if it wants, so there are no guarantees, but in practice I've found it to be fairly reliable, in HLSL at least.
 
That game seems terribly optimized. A GTX 680 can barely average 60 fps at 1080p with 4xMSAA.

I know it's AMD sponsored, but the 7970 doesn't exactly blow the roof off either. Can anyone tell me if the game looks that good?

The latest update should improve the performance for the rendering modes tested by PCGH, though they have now also added a Global Illumination option for further IQ goodies.
 
The latest update should improve the performance for the rendering modes tested by PCGH, though they have now also added a Global Illumination option for further IQ goodies.

It does - the game now runs very smoothly on a higher-end Radeon, even with GI enabled, which previously killed performance somewhat. It's not free, though.
 
Doesn't want to work on this 5770. :p (Had to move house.obj & .mtl into the x64 folder first btw)
 
Doesn't want to work on this 5770. :p (Had to move house.obj & .mtl into the x64 folder first btw)

Ah, yes, as I said, there are problems with the Radeons, unless you've got one old enough. I myself have an integrated 3200 series, on which the software fallback works. It can be forced by setting 'g_enableGpuClustering' to false in the source and rebuilding, but then the clustering is done on the CPU.

You should not need to copy any files; just run using the 'run.cmd' file, as per the ReadMe.txt ;)

Thanks for the feedback. If you (or anyone else) happen to figure out why it doesn't want to run on AMD hardware, that would be great. I have no high-end Radeon to debug on myself.
 
Neat demos Dave, thanks for posting! It's also refreshing to see them be standard DX11 without me having to go hack out DeviceID or similar checks from the source :)
 
Anyone have a Forward+ description for dummies?

It's similar to tile-based deferred rendering; however, instead of computing the tile bounds from a G-buffer, you use a regular depth prepass. Then a second scene pass does the forward lighting inside the pixel shaders.
There are some advantages and disadvantages. Andrew Lauritzen had a great presentation at Siggraph last year comparing both methods - http://bps12.idav.ucdavis.edu/talks/03_lauritzenIntersectingLights_bps2012.pdf
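If it helps, here's a rough sketch of what that second pass can look like in a GLSL fragment shader, assuming a separate per-tile light culling pass has already written a flat light index list with an (offset, count) pair per tile. Buffer layouts, names, and the lighting model are made up for illustration, not taken from any particular implementation:

```glsl
#version 430

struct Light { vec4 posRadius; vec4 color; };  // xyz = view-space position, w = radius

// Hypothetical buffers filled by the per-tile light culling pass.
layout(std430, binding = 0) buffer Lights       { Light u_lights[]; };
layout(std430, binding = 1) buffer LightIndices { uint  u_lightIndexList[]; };
layout(std430, binding = 2) buffer TileRanges   { uvec2 u_tileRange[]; };  // (offset, count) per tile

uniform uvec2 u_tileSizePx;
uniform uint  u_numTilesX;

in  vec3 v_positionVS;  // view-space position from the vertex shader
in  vec3 v_normalVS;
out vec4 o_color;

void main()
{
    // Find this fragment's tile and its slice of the light index list.
    uvec2 tile  = uvec2(gl_FragCoord.xy) / u_tileSizePx;
    uvec2 range = u_tileRange[tile.y * u_numTilesX + tile.x];

    // Accumulate only the lights the culling pass kept for this tile.
    vec3 result = vec3(0.0);
    for (uint i = 0u; i < range.y; ++i)
    {
        Light l       = u_lights[u_lightIndexList[range.x + i]];
        vec3  toLight = l.posRadius.xyz - v_positionVS;
        float dist    = length(toLight);
        float atten   = max(0.0, 1.0 - dist / l.posRadius.w);  // simple linear falloff
        result += l.color.rgb * atten * max(0.0, dot(normalize(v_normalVS), toLight / dist));
    }
    o_color = vec4(result, 1.0);
}
```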
 
Since this thread appears still alive, I'll jump in and mention that I too have put up a demo, for Clustered Forward Shading (which supports transparency). And it's OpenGL, which helps if you are not on Windows Vista or up, though there are issues on AMD hardware for some reason (apologies).

http://www.cse.chalmers.se/~olaolss...s=publication&id=tiled_clustered_forward_talk

Works ok on my 680 but for some reason the frame rate takes a big hit if I turn on the profiler info display. Strange.
 
Works ok on my 680 but for some reason the frame rate takes a big hit if I turn on the profiler info display. Strange.

The drop can be explained by 'glutBitmapCharacter', which is used to print all that text (I verified this by commenting out the line and measuring the frame rate with FRAPS). So yes, you cannot use the wall-to-wall frame rate while looking at the measurements. However, the frame time in ms is shown in the profiler output.

I'd also like to point out that the wall-to-wall frame rate is not a great measure in this implementation anyway, because my performance timers force synchronization all the time. I've got support for asynchronous timer queries in there, but they always come out funny when mixing OpenGL and CUDA.

So, for example, while I was checking just now, the frame time was ~6 ms = ~166 fps, while FRAPS reported ~70 fps. Lots of stalling, presumably. There are also other sync points in the various implementation paths, e.g. when depth data is copied back from the GPU.

Anyway, it's just a demo, not exactly ready for production. The detailed measurements are there to give you an idea of the relative costs of the different steps.

Btw, the demo also supports Tiled Forward Shading, with transparency. This code path ought to work on all hardware... (Tiled Forward Shading is the same thing as what AMD folks call Forward+, just to avoid any confusion on that account).

Cheers
 
Looks to run on my 7970s... or am I missing something? (12.11 beta, all settings in CCC left at 'by application'.)


Edit: Hmm, something strange happened when I cycled the view after changing some settings (colours all over the place), but not all the time.
 
Looks to run on my 7970s... or am I missing something? (12.11 beta, all settings in CCC left at 'by application'.)
...

Edit: Hmm, something strange happened when I cycled the view after changing some settings (colours all over the place), but not all the time.

For some it has worked for the first frame only; that is, if you move the camera around it gets increasingly wrong. Is this the case for you? Also, the reported behaviour seems to differ quite a bit between driver versions.

Anyway, I'm not sure it is polite to turn this into a support forum for my demo. However, please do email me directly (email on my home page) with any questions / reports.
 
Anyway, I'm not sure it is polite to turn this into a support forum for my demo.

It's more than fine; I'm interested to know the technical reason why it doesn't work with AMD GPUs (I assume we'll never know other than "a driver bug", but I'm still curious).
 
For some it has worked for the first frame only; that is, if you move the camera around it gets increasingly wrong. Is this the case for you? Also, the reported behaviour seems to differ quite a bit between driver versions.

Anyway, I'm not sure it is polite to turn this into a support forum for my demo. However, please do email me directly (email on my home page) with any questions / reports.

No problem when I move the camera or switch scenes. No problems now; it was when I had been playing around for a long time with some settings.

Maybe some screens with the grid could help (they don't look like the ones in your presentation). (8xMSAA was maybe forced in Catalyst; still 12.11 beta 11, 2x HD 7970, though since the demo is windowed only one is working, of course.)

Anyway, one of the "bubbles" looks a bit strange.
 
No problem when I move the camera or switch scenes. No problems now; it was when I had been playing around for a long time with some settings.

Maybe some screens with the grid could help (they don't look like the ones in your presentation). (8xMSAA was maybe forced in Catalyst; still 12.11 beta 11, 2x HD 7970, though since the demo is windowed only one is working, of course.)

Anyway, one of the "bubbles" looks a bit strange.

Sorry for the slow reply. Well, the clusters look OK, much like what I get. You need to also move the camera a bit sideways, or something, to see the clusters in all their 3D glory (between your first and second shot).

Unfortunately, I cannot work out what is going on with the shading on those bubbles. I've seen this before, also from a 7000-series AMD card. The thing is that if the clustering were at fault I'd expect more blockiness.

The problem is fairly isolated, since everything appears to work fine if GPU clustering is turned off. This is the case on my AMD Radeon HD 3200 at home, where GLEW_EXT_shader_image_load_store is not present.

In any case, I went through the very minimal changes to move to ARB instead of EXT, which is the same as core OpenGL 4.2. Maybe this will change things? I also found two unrelated bugs when testing the change. If anyone wants to give this new version a spin, that would be most welcome.
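For reference, the sort of thing the GPU clustering path needs image load/store for looks roughly like this in the ARB/core form; the buffer name and what exactly gets counted are my guesses for illustration, not the demo's actual shader:

```glsl
#version 420  // image load/store is core here; older contexts need GL_ARB_shader_image_load_store

// Hypothetical per-cluster counter buffer, one uint per cluster.
layout(binding = 0, r32ui) uniform uimageBuffer u_clusterLightCounts;

void countLightInCluster(uint clusterIdx)
{
    // Atomically bump the light count for this cluster.
    imageAtomicAdd(u_clusterLightCounts, int(clusterIdx), 1u);
}
```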
 