NVIDIA Working on Tile-based Multi-GPU Rendering Technique Called CFR - Checkered Frame Rendering

Discussion in 'Architecture and Products' started by pharma, Nov 21, 2019.

  1. pharma

    Veteran Regular

    Joined:
    Mar 29, 2004
    Messages:
    3,003
    Likes Received:
    1,687
    NVIDIA Working on Tile-based Multi-GPU Rendering Technique Called CFR - Checkered Frame Rendering
    November 21, 2019
    "Forum user Blair at 3DCenter had a sharp eye and noticed an entry added to the drivers for multi-GPU rendering. The technique is called CFR and basically slices a frame into many small pieces so that the GPUs can render them in parallel.

    You could also refer to the technique as checkerboard rendering, where you split everything up into smaller tiles and have the GPUs render them based on an algorithm or simply FIFO (first in, first out). This could increase scaling performance, and it also helps with things like micro-stuttering, as frames and their output pacing are processed in a far more stable manner. The basis is, of course, an existing technique applied in many solutions. NVIDIA, however, wants to use it for multi-GPU rendering.
    ...
    CFR currently has to be activated with the help of extra tools and/or some manual tweaking. The results and the driver entries suggest that NVIDIA is actively working on this methodology. The new technique would be DirectX compatible only and, it seems, limited to Turing and upcoming GPUs, as it will require NVLink."

    [​IMG]
    https://www.guru3d.com/news-story/n...que-called-cfr-checkered-frame-rendering.html
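
The article does not say how tiles are handed to the GPUs. As a minimal sketch, assuming a fixed checkerboard pattern, two GPUs, and a hypothetical tile size of 120 pixels (none of which is confirmed for NVIDIA's driver), the split described above could look like this:

```python
def assign_tiles(width, height, tile, num_gpus=2):
    """Map each tile coordinate (tx, ty) to a GPU index in a checkerboard pattern."""
    tiles_x = -(-width // tile)   # ceiling division
    tiles_y = -(-height // tile)
    return {
        (tx, ty): (tx + ty) % num_gpus
        for ty in range(tiles_y)
        for tx in range(tiles_x)
    }


def tiles_per_gpu(assignment, num_gpus=2):
    """Count how many tiles each GPU was given."""
    counts = [0] * num_gpus
    for gpu in assignment.values():
        counts[gpu] += 1
    return counts


# Hypothetical example: a 4K frame cut into 120x120 tiles (32 x 18 tiles).
assignment = assign_tiles(3840, 2160, 120)
print(tiles_per_gpu(assignment))
```

A static pattern like this balances the tile count but not necessarily the shading cost per tile; the FIFO variant mentioned in the quote would instead let each GPU pull the next free tile from a queue.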
     
  2. BRiT

    BRiT (╯°□°)╯
    Moderator Legend Alpha

    Joined:
    Feb 7, 2002
    Messages:
    12,824
    Likes Received:
    9,177
    Location:
    Cleveland
    Copied post to a new thread for follow-up discussions.
     
    pharma likes this.
  3. Frenetic Pony

    Regular Newcomer

    Joined:
    Nov 12, 2011
    Messages:
    347
    Likes Received:
    95
    Won't work for realtime rendering. Pixel N needs access to pixel Y in pass Z because it's screenspace tracing, and oops, it's on another GPU, so better stall the frame while the data copies over. Or worse, as is suspected, wait while data is synced every bloody pass. How long is that going to take? GPUs are already highly latency sensitive.

    Nvidia has gotten the "too in the lead for too long" syndrome, where they try things because they have the money to do so rather than because it's a good idea. There's a reason multi-GPU support was dropped already.
     
  4. DavidGraham

    Veteran

    Joined:
    Dec 22, 2009
    Messages:
    2,837
    Likes Received:
    2,670
    It already works, some users have enabled it in games with no multi GPU support.
     
    pharma and PSman1700 like this.
  5. yuri

    Newcomer

    Joined:
    Jun 2, 2010
    Messages:
    184
    Likes Received:
    152
    Kaotik likes this.
  6. Kaotik

    Kaotik Drunk Member
    Legend

    Joined:
    Apr 16, 2003
    Messages:
    8,257
    Likes Received:
    1,947
    Location:
    Finland
    Works and "works" are two different things though. We'd need a thorough review of how it works, whether there are artifacts because of it, performance anomalies and whatnot before drawing any conclusions on whether it really works or not.
     
  7. Rootax

    Veteran Newcomer

    Joined:
    Jan 2, 2006
    Messages:
    1,204
    Likes Received:
    597
    Location:
    France
    Well, they can innovate and make it work. I'm not saying it's working, but just because the concept was problematic in the past doesn't mean some clever dudes can't find solutions.
    We'll see.
     
    pharma likes this.
  8. trinibwoy

    trinibwoy Meh
    Legend

    Joined:
    Mar 17, 2004
    Messages:
    10,436
    Likes Received:
    443
    Location:
    New York
    Yeah seems like it would require a ton of inter GPU bandwidth and latency will be a problem. Also scaling will be limited due to redundant geometry processing.

    Maybe it’s just a proof of concept for a future MCM implementation. Can’t hate them for trying even if it’s because they have R&D dollars to burn.
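
To put a rough number on the redundant-geometry point, here is an Amdahl-style sketch. It assumes a hypothetical `replicated_fraction` of the single-GPU frame time (geometry, frame setup) that every GPU has to repeat, with only the rest split across GPUs; the fractions below are illustrative, not measurements.

```python
def cfr_speedup(replicated_fraction, num_gpus):
    """Speedup over one GPU when `replicated_fraction` of the frame time
    is repeated on every GPU and only the remainder is divided up."""
    if not 0.0 <= replicated_fraction <= 1.0:
        raise ValueError("replicated_fraction must be between 0 and 1")
    if num_gpus < 1:
        raise ValueError("num_gpus must be at least 1")
    split_fraction = 1.0 - replicated_fraction
    return 1.0 / (replicated_fraction + split_fraction / num_gpus)


# Illustrative fractions only (assumptions, not measured data).
for fraction in (0.0, 0.2, 0.4):
    print(fraction, round(cfr_speedup(fraction, 2), 2))
```

Under this simple model, two GPUs reach the ideal 2x only if nothing is replicated, and the gain shrinks quickly as the replicated share grows.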
     
  9. iroboto

    iroboto Daft Funk
    Legend Regular Subscriber

    Joined:
    Mar 6, 2014
    Messages:
    8,142
    Likes Received:
    6,406
    How much latency are we really looking at?
    DX12 supports explicit multi-adapter, which also means it knows how to pool memory. We've seen mGPU operate very well on titles (Tomb Raider series) optimized in this way. Why is mGPU successful in those cases, while this CFR style would suffer all sorts of bottlenecks?
     
  10. tEd

    tEd Casual Member
    Veteran

    Joined:
    Feb 6, 2002
    Messages:
    2,095
    Likes Received:
    62
    Location:
    switzerland
    Games where it currently works show a 40-50% fps increase using 2x 2080 Ti at 4K.

    https://www.forum-3dcenter.org/vbulletin/showpost.php?p=12144578&postcount=3586
     
    #10 tEd, Nov 22, 2019
    Last edited: Nov 22, 2019
  11. willardjuice

    willardjuice super willyjuice
    Moderator Veteran Alpha

    Joined:
    May 14, 2005
    Messages:
    1,376
    Likes Received:
    249
    Location:
    NY
    Yes! Death to AFR!
     
    Lightman likes this.
  12. Frenetic Pony

    Regular Newcomer

    Joined:
    Nov 12, 2011
    Messages:
    347
    Likes Received:
    95
    Neat! Glad to be proven wrong. But there's the bandwidth and latency copying problems already. Do they force sync after each pass? It'll be interesting to see more details.

    I'm also surprised, and skeptical, that it'd work at all under DX12/Vulkan. Specifically, the list says Metro Exodus works under DX12, but unless you build drivers for each specific game it seems unlikely to work (maybe they did so for Exodus?). Still, looking forward to details on how they handled the bandwidth/latency problem.
     
    #12 Frenetic Pony, Nov 22, 2019
    Last edited: Nov 22, 2019
  13. iroboto

    iroboto Daft Funk
    Legend Regular Subscriber

    Joined:
    Mar 6, 2014
    Messages:
    8,142
    Likes Received:
    6,406
    Just keep a copy of the memory on both GPUs. Why the need to copy back and forth?
     
  14. Frenetic Pony

    Regular Newcomer

    Joined:
    Nov 12, 2011
    Messages:
    347
    Likes Received:
    95
    Because then each GPU would have to do the exact same work for both copies to match up perfectly, making it pointless? That's not what they're doing. Half the frame is rendered on one GPU, half on the other, apparently in some sort of tiled manner. Does SSR just not work? Are there obvious SSAO lines from missing info? How would you handle non-graphics-related work?

    The way this is described couldn't work for anything other than primary visibility without a lot of cross GPU data syncing, even then there'd be obvious artefacts. So either it's useless or they've figured out some frame sync calling to copy all data. Which in my head is screaming stalls, but the performance numbers look good. I'm really trying to figure what it is they're doing, the graphic they have doesn't seem related at all, that's just checkerboard temporal reconstruction. Guess it'll have to wait for a paper or some other explanation.
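
A quick way to gauge how much cross-GPU data a screen-space effect would need: count the pixels whose sampling neighbourhood leaves their own tile. This is a sketch under stated assumptions: a two-GPU checkerboard (so every edge-adjacent tile belongs to the other GPU), square tiles, and a square kernel of hypothetical `radius` pixels standing in for an SSAO or SSR footprint.

```python
def cross_gpu_fraction(tile, radius):
    """Fraction of pixels in one tile whose (2*radius+1)^2 neighbourhood
    reaches outside the tile, i.e. into a tile owned by the other GPU."""
    crossing = 0
    for y in range(tile):
        for x in range(tile):
            leaves_tile = (
                x - radius < 0 or x + radius >= tile or
                y - radius < 0 or y + radius >= tile
            )
            if leaves_tile:
                crossing += 1
    return crossing / (tile * tile)


# Hypothetical tile sizes with an 8-pixel kernel radius.
for tile in (64, 256):
    print(tile, cross_gpu_fraction(tile, 8))
```

Larger tiles reduce the share of pixels that need remote data, but they also make the load split between the GPUs coarser.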
     
  15. Frenetic Pony

    Regular Newcomer

    Joined:
    Nov 12, 2011
    Messages:
    347
    Likes Received:
    95
    Ok, might've figured it out. The graphic might be correct, you could just dual checkerboard render on two GPUs, thus the "tiling" thing could be kind of misleading, as it's not totally important. One renders the "Frame N" pixels, the other "Frame N+1" pixels.
    [​IMG]
    You then only have to sync and resolve final frame output. Clever really, I feel dumb now. Things like SSAO and SSR will indeed be a bit glitchy, but with a decently high resolution it shouldn't be that noticeable. And the more GPUs you use the less you scale, same with the lower you set the resolution, as frame setup will start dominating for both. Still, overall a good solution for most anyone that would buy two GPUs to begin with.
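
A toy version of that idea, to make the resolve step concrete. This is only a sketch of the guess above, not NVIDIA's confirmed method: each "GPU" shades the pixels of one checkerboard parity, and the final frame is stitched together from the two halves. The `shade` function is a hypothetical stand-in for real pixel shading.

```python
def render_half(width, height, parity, shade):
    """Shade only pixels where (x + y) % 2 == parity; leave the rest as None."""
    return [
        [shade(x, y) if (x + y) % 2 == parity else None for x in range(width)]
        for y in range(height)
    ]


def resolve(half_a, half_b):
    """Merge two complementary checkerboard halves into one full frame."""
    return [
        [a if a is not None else b for a, b in zip(row_a, row_b)]
        for row_a, row_b in zip(half_a, half_b)
    ]


def shade(x, y):
    # Hypothetical shading: encode the pixel position as a value.
    return x + 10 * y


gpu0 = render_half(4, 2, 0, shade)   # "Frame N" pixels
gpu1 = render_half(4, 2, 1, shade)   # "Frame N+1" pixels
full = resolve(gpu0, gpu1)
print(full)
```

In this sketch the only point where the two halves meet is `resolve`, which matches the claim that only the final output needs syncing; any pass that reads neighbouring pixels before that point would see holes.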

    Definitely something AMD and Intel can replicate with some effort, as well as on APIs like DX12/Vulkan if the developer supports it. EG highly likely to show up in UE4/Unity, as the first already sells to pre-viz VFX and the second one wants to. And hey if that's not what Nvidia's doing, from an initial impression it could work anyway.
     
  16. Ext3h

    Regular Newcomer

    Joined:
    Sep 4, 2015
    Messages:
    354
    Likes Received:
    304
    Why should cooperative work in large tiles on the same frame be implausible? It's not as if NVLink didn't at least provide the necessary bandwidth to compete eye to eye with a local memory access. Well, at least it's only a factor of 2 behind, but full duplex in return.

    Well, yes, as this was published with a graphic, it does appear possible that the paired GPUs are actually performing driver side TAA.

    Even so, it's not quite clear which framerate was actually measured then. +50% in terms of performed present calls (and then cut in half by driver-side TAA recombination before display)? The other possible number, a +125% boost per GPU just from effectively halving the shading rate, doesn't appear plausible. Well, for the first option it's not necessarily cut in half, as frame N+1 can be combined with both N and N+2, so it's not as limited as a classic interlaced video stream.

    Not sure what they are doing internally. Hijacking multisampling with a programmable pattern and effectively lower res targets, or clever use of variable rate shading to avoid interfering with data layout?
     
    Frenetic Pony likes this.
  17. Frenetic Pony

    Regular Newcomer

    Joined:
    Nov 12, 2011
    Messages:
    347
    Likes Received:
    95
    My concern wasn't bandwidth necessarily, but latency. Anything over a link like NVLink tends to be far slower than local access, as in microsecond versus nanosecond access. That's just a lot of time to wait for whatever stalls crop up. Though I suppose if Nvidia carefully built a driver profile for each and every enabled title, those could be somewhat minimized.
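
For a sense of scale, here is a back-of-the-envelope sketch of how much of the frame budget forced per-pass syncs would eat. Every input is a hypothetical assumption (pass count, per-sync latency), and it models each sync as a full stall with no overlap, which is the pessimistic case.

```python
def sync_overhead(fps, passes_per_frame, sync_latency_us):
    """Return (frame budget in ms, total sync time in ms, share of budget),
    assuming one fully stalling sync per pass."""
    frame_ms = 1000.0 / fps
    sync_ms = passes_per_frame * sync_latency_us / 1000.0
    return frame_ms, sync_ms, sync_ms / frame_ms


# Hypothetical numbers: 60 fps target, 30 passes, 50 microseconds per sync.
frame_ms, sync_ms, share = sync_overhead(60, 30, 50)
print(round(frame_ms, 2), sync_ms, round(share, 3))
```

The real cost depends on how much data moves per sync and whether the driver can overlap transfers with other work, neither of which is known here.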

    Yeah I don't know what the performance metrics are. Maybe the framerate was just "125%" of normal?
     