Hardware gaussian blur?

Discussion in 'Rendering Technology and APIs' started by Shifty Geezer, Mar 8, 2017.

  1. Shifty Geezer

    Shifty Geezer uber-Troll!
    Moderator Legend

    Joined:
    Dec 7, 2004
    Messages:
    40,606
    Likes Received:
    11,033
    Location:
    Under my bridge
Given the ubiquity of blur in various image operations, and the fact that it doesn't map fabulously well to GPUs (even crazily optimised blurs need multiple taps), especially on mobile, is there any merit in, or means of, implementing some alternative sampling hardware to facilitate random image sampling? I suppose passing the results to shaders could be problematic, so perhaps it would need to function as a stand-alone buffer processor. Or are GPUs as good as it's going to get, with no room for improvement?
     
  2. Simon F

    Simon F Tea maker
    Moderator Veteran

    Joined:
    Feb 8, 2002
    Messages:
    4,560
    Likes Received:
    157
    Location:
    In the Island of Sodor, where the steam trains lie
    As soon as I saw the words "random image sampling" the adjective "slow" came to mind.

It's not dedicated HW, and unfortunately I can't remember where I saw it, but I think I read about methods for constructing approximate Gaussian filtering using either a few box filters or perhaps IIR filters. Might be worth a search.
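    For reference, the box-filter approach works because repeated box filters converge to a Gaussian (central limit theorem), and each box pass costs O(n) with a running sum regardless of radius. A minimal 1D sketch in Python (function names are mine, not from any particular source):

```python
def box_blur_1d(signal, radius):
    """One box-filter pass with clamped edges, O(n) via a sliding window sum."""
    n = len(signal)
    width = 2 * radius + 1
    out = []
    # initial window sum over the clamped neighbourhood of index 0
    window = sum(signal[min(max(i, 0), n - 1)] for i in range(-radius, radius + 1))
    for i in range(n):
        out.append(window / width)
        # slide the window one texel to the right (clamped indexing)
        window -= signal[min(max(i - radius, 0), n - 1)]
        window += signal[min(max(i + radius + 1, 0), n - 1)]
    return out

def approx_gaussian_blur(signal, radius, passes=3):
    """Approximate a Gaussian blur by iterating box blurs; three passes
    already give a visually close approximation."""
    for _ in range(passes):
        signal = box_blur_1d(signal, radius)
    return signal
```

    Three passes of radius r cost 3 reads/writes per texel, independent of r, which is the appeal versus a wide direct kernel.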

As for 'new' filtering hardware, one recent addition I can think of is bicubic Catmull-Rom filtering in PVR GPUs.
     
  3. Shifty Geezer

    Shifty Geezer uber-Troll!
    Moderator Legend

    Joined:
    Dec 7, 2004
    Messages:
    40,606
    Likes Received:
    11,033
    Location:
    Under my bridge
Yeah, I've used the very clever 'optimal' solution: exploit the hardware's bilinear texture sampling by placing texture coordinates between texel centres, so a single tap returns a weighted combination of two samples. http://rastergrid.com/blog/2010/09/efficient-gaussian-blur-with-linear-sampling/
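    The weight/offset maths behind that trick can be sketched as follows: two adjacent discrete Gaussian taps (w1 at offset o, w2 at offset o+1) merge into one bilinear fetch with weight w1+w2 placed at offset (o·w1 + (o+1)·w2)/(w1+w2). A Python sketch (`linear_sample_taps` is a made-up name):

```python
def linear_sample_taps(discrete_weights):
    """discrete_weights: one-sided Gaussian weights [w0, w1, w2, ...]
    where w0 is the centre texel. Returns (weights, offsets) for the
    reduced set of bilinear taps; offsets are in texels from centre."""
    weights = [discrete_weights[0]]
    offsets = [0.0]
    # merge taps pairwise: (1,2), (3,4), ...
    for i in range(1, len(discrete_weights), 2):
        w1 = discrete_weights[i]
        w2 = discrete_weights[i + 1] if i + 1 < len(discrete_weights) else 0.0
        w = w1 + w2
        weights.append(w)
        # place the fetch between the two texels, weighted by their shares
        offsets.append((i * w1 + (i + 1) * w2) / w)
    return weights, offsets
```

    For a 9-tap kernel this cuts one side from 4 fetches to 2, so the full separable pass drops from 9 taps to 5.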

Problem is, it's still not fast enough! For good-sized blurs you need multiple passes even at quarter res and the like. At first glance there's not a lot that can be done about that, because you need to sample the whole area of the blur. However, I'm sure something clever can be done. Just off the top of my head, how about creating a cascade of reduced-size buffers (mipmaps) and using some clever sampling between them? I expect the caching of these operations is already very efficient, so there wouldn't be massive gains from having a scratchpad (eDRAM) large enough to hold the whole buffer.
     
  4. fuboi

    Newcomer

    Joined:
    Aug 6, 2011
    Messages:
    90
    Likes Received:
    45
    Reduce read latency by inverting the flow: instead of reading multiple samples per pixel, read the input once and write it weighted to all output samples. To reduce the massive write amplification use tiled atomic add on LDS/GDS. Would this work? Who knows.
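    A rough 1D model of that gather-to-scatter inversion, in plain Python with wrap-around indexing standing in for the GPU's tiled atomic adds (no claim this maps efficiently to real hardware):

```python
def gather_blur(signal, kernel):
    """Conventional gather: each output reads every tap in its neighbourhood."""
    r = len(kernel) // 2
    n = len(signal)
    return [sum(w * signal[(i + k - r) % n] for k, w in enumerate(kernel))
            for i in range(n)]

def scatter_blur(signal, kernel):
    """Inverted flow: each input is read once and written, weighted, to
    every output it influences (each += would be an atomic add on a GPU)."""
    r = len(kernel) // 2
    n = len(signal)
    out = [0.0] * n
    for i, v in enumerate(signal):           # one read per input texel
        for k, w in enumerate(kernel):       # weighted writes to neighbours
            out[(i - (k - r)) % n] += w * v
    return out
```

    The two produce identical results; the open question is whether the write amplification plus atomics ever beats cached gather reads.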
     
  5. L. Scofield

    Veteran

    Joined:
    Mar 28, 2007
    Messages:
    2,559
    Likes Received:
    323
    Enter Kawase's Bloom (slide 44):

    http://www.daionet.gr.jp/~masa/archives/GDC2004/GDC2004_PIoHDRR_EN.ppt

Unity's open-source bloom works like that:

    https://github.com/keijiro/KinoBloom

You could also spread the blur across several frames, like SotC's / ZOE2's bloom.
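    Roughly, the Kawase idea is many cheap passes whose sample offsets grow each pass, so a wide Gaussian-like kernel accumulates from tiny per-pass kernels. A loose 1D analogue in Python (the real technique works on 2D half-resolution buffers with bilinear taps at the four diagonals):

```python
def kawase_like_blur_1d(signal, passes=4):
    """Iterated small-kernel blur with growing offsets; each pass reads
    only 3 samples, yet the composed kernel widens with every pass.
    Wrap-around indexing keeps the sketch edge-case free."""
    n = len(signal)
    for p in range(passes):
        d = p + 1                            # sample offset grows per pass
        signal = [(signal[(i - d) % n] + 2.0 * signal[i] + signal[(i + d) % n]) / 4.0
                  for i in range(n)]
    return signal
```

    After 4 passes the effective support spans ±10 texels while each pass touched only 3 samples per pixel, which is why this style suits bloom on bandwidth-starved hardware.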
     
  6. swaaye

    swaaye Entirely Suboptimal
    Legend

    Joined:
    Mar 15, 2003
    Messages:
    8,456
    Likes Received:
    578
    Location:
    WI, USA
    Nvidia's DSR has that optional blur and it is a Gaussian filter.
     
  7. sebbbi

    Veteran

    Joined:
    Nov 14, 2007
    Messages:
    2,924
    Likes Received:
    5,288
    Location:
    Helsinki, Finland
On modern GPUs, you should program blur kernels as compute shaders. Compute shaders have access to groupshared memory, a fast on-chip memory per compute unit (64 KB per CU on AMD GPUs). With groupshared memory, you don't need to load/sample the blur neighborhood again and again for each pixel. Instead, you first load the neighborhood into groupshared memory and then read the data from there for each pixel. Separate X/Y as usual.

You should also do reductions directly in groupshared memory if you want Gaussian filters of multiple different radii. Doing multiple downsample-and-combine pixel shader passes is slow, because the GPU stalls between passes (there's always a dependency on the previous pass's output). This is another important advantage of compute shader blurs over pixel shader blurs.
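    A CPU-side model of the groupshared-memory scheme for one 1D pass (pure Python; the TILE/RADIUS values are arbitrary, and the per-tile loop stands in for one threadgroup's cooperative load):

```python
RADIUS = 3
TILE = 8
WEIGHTS = [1, 6, 15, 20, 15, 6, 1]           # binomial, ~ Gaussian
NORM = float(sum(WEIGHTS))

def blur_tiled(signal):
    """Each 'threadgroup' loads its tile plus an apron of RADIUS texels
    into a local buffer once (the LDS load); every output in the tile
    then reads only that buffer instead of re-fetching its neighbourhood."""
    n = len(signal)
    out = [0.0] * n
    for start in range(0, n, TILE):           # one iteration per threadgroup
        # cooperative load: tile + apron, with clamped edge sampling
        lds = [signal[min(max(i, 0), n - 1)]
               for i in range(start - RADIUS, start + TILE + RADIUS)]
        for t in range(min(TILE, n - start)):
            acc = sum(w * lds[t + k] for k, w in enumerate(WEIGHTS))
            out[start + t] = acc / NORM
    return out
```

    Per tile, each texel is fetched from memory roughly once (plus the apron), versus 2·RADIUS+1 fetches per pixel in the naive gather.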
     
  8. Shifty Geezer

    Shifty Geezer uber-Troll!
    Moderator Legend

    Joined:
    Dec 7, 2004
    Messages:
    40,606
    Likes Received:
    11,033
    Location:
    Under my bridge
    Cool. When we get compute on mobile, we can get some decent effects! So compute really does replace the need for pretty much any specialist hardware?
     
  9. sebbbi

    Veteran

    Joined:
    Nov 14, 2007
    Messages:
    2,924
    Likes Received:
    5,288
    Location:
    Helsinki, Finland
Compute shaders are still limited; the programming model could be more flexible. Also, groupshared memory is really only good for regular (known) access patterns. If you do random sampling, your neighbors don't share much data. Dedicated texture filtering hardware isn't going away anytime soon. Compute shaders are good for post-processing (a 2D rect, splittable into tiles, with a known neighborhood), but you need to be able to sample (and filter) a texture from any UV coordinate when you are rendering polygon meshes.
     
  10. Shifty Geezer

    Shifty Geezer uber-Troll!
    Moderator Legend

    Joined:
    Dec 7, 2004
    Messages:
    40,606
    Likes Received:
    11,033
    Location:
    Under my bridge
That is, I suppose, the broader issue. The main problem is memory access patterns. If we had unlimited bandwidth with no read/write latencies, we could do whatever we wanted. There's no real solution for that, so we're stuck with caches and cache-friendly access patterns.
     