Nvidia Turing Speculation thread [2018]

Discussion in 'Architecture and Products' started by Voxilla, Apr 22, 2018.

Tags:
Thread Status:
Not open for further replies.
  1. ShaidarHaran

    ShaidarHaran hardware monkey Veteran

    My hope for NVlink is that NV will offer an operational mode that pools VRAM without attempting to split the workload. Essentially, one GPU would perform all the calculations of a given workload with the other sitting idle or perhaps only assisting trivially. As I said, I don't need more performance necessarily (though I won't say no to it of course), I need about the level of performance which is already available via say GP102 but with more VRAM.
     
  2. silent_guy

    silent_guy Veteran Subscriber

    The reason you need BW between GPUs to work on the same frame is to give GPU B access to data that was generated by GPU A and vice versa.

    You don’t need it for other data, such as non-rendered textures, Z buffer etc.

    It’s clear that these shared resources are only a fraction of the total memory BW. I don’t know how much, but it seems very unlikely that they are 50%. So that number seems very wrong to me.

Are they less than 8% (50 GB/s / ~600 GB/s)? Probably not. But even if they are more than that, you could still have meaningful performance increases.

    It’s also very game dependent, of course.

That said: if NVLink and tensor cores work in the GeForce GPUs the way they work on Tesla-class GPUs, then deep learning people will try to gobble them all up anyway, so it’s all academic. :)
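The fraction argument above can be checked with quick arithmetic (using the 50 GB/s link figure and ~600 GB/s memory bandwidth from the post):

```python
# Rough check of the shared-resource bandwidth fraction from the post:
# a 50 GB/s inter-GPU link against ~600 GB/s of local memory bandwidth.
nvlink_bw_gbps = 50      # GB/s, inter-GPU link figure from the post
memory_bw_gbps = 600     # GB/s, approximate local memory bandwidth

fraction = nvlink_bw_gbps / memory_bw_gbps
print(f"NVLink covers about {fraction:.1%} of local memory bandwidth")
# If the shared intermediate data is under that fraction of total traffic,
# the link is not the bottleneck; well above it, scaling suffers.
```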
     
  3. silent_guy

    silent_guy Veteran Subscriber

    For that kind of use case, the NVLink BW is almost certainly way too low. So don’t count on it.
     
  4. ShaidarHaran

    ShaidarHaran hardware monkey Veteran

    Not necessarily. My particular use case requires a large texture cache for textures that would not be loaded/unloaded frequently.
     
  5. Communism

    Communism Newcomer

    That would be true if you are doing old school rendering.

    If you are doing deferred rendering, you have to synchronize all the intermediate steps, which means an absolute crapton of bandwidth needed.
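A rough illustration of why the sync cost adds up (the resolution, target count, and formats below are illustrative assumptions, not numbers from the post):

```python
# Illustrative estimate of per-frame G-buffer traffic in a deferred renderer.
# All numbers here are assumptions for the sketch, not measured values.
width, height = 3840, 2160          # 4K resolution
render_targets = 4                  # e.g. albedo, normals, material, depth
bytes_per_pixel = 8                 # assume 16-bit RGBA per target
fps = 60

bytes_per_frame = width * height * render_targets * bytes_per_pixel
gb_per_second = bytes_per_frame * fps / 1e9
print(f"~{gb_per_second:.1f} GB/s just to mirror the G-buffer each frame")
```

That alone would eat a large chunk of a 50 GB/s link, before any other shared data.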

I'm sure there aren't enough deep learning students to make an appreciable dent in RTX 2080 Ti supplies anyway, and serious deep learning people will need/want the extra VRAM the Quadros have.

But yeah, there's every reason to think Nvidia would lock down/artificially slow down anything to do with deep learning / HPC outside of pointless epeen benchmarks and directly gaming-related stuff in their GeForce drivers.
     
  6. CSI PC

    CSI PC Veteran

Remember though, NVLink2 has a more cohesive cache design that brings the two cards closer to behaving as one GPU.
So it is a bit limited at 100 GB/s (both links can be used to feed one GPU), but it gains slightly better latency than SLI and, importantly, better communication/integration between the GPUs for developers/engines. It also offers better BW than SLI or even PCIe x16 Gen4 when both Nvidia bricks are used in a dual-GPU setup. NVLink is designed to be flexible, so it would be fine, but yes, it still has limitations.
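For scale, the 100 GB/s figure can be set against PCIe 4.0 x16 (computed from the published 16 GT/s per-lane rate and 128b/130b encoding):

```python
# Comparing the link bandwidths mentioned: dual-brick NVLink vs PCIe 4.0 x16.
# PCIe 4.0: 16 GT/s per lane, 128b/130b encoding, 16 lanes, per direction.
pcie4_x16_gbps = 16e9 * (128 / 130) * 16 / 8 / 1e9   # ~31.5 GB/s
nvlink_gbps = 100                                     # both bricks to one GPU, per the post

print(f"PCIe 4.0 x16: ~{pcie4_x16_gbps:.1f} GB/s; NVLink: {nvlink_gbps} GB/s")
```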

    Edit:
I would need to check, but I thought the real-world application comparison between NVLink and SLI showed about 25% gains; big caveat: that was not gaming, and I'm not sure it was the latest NVLink iteration.
     
    Last edited: Aug 18, 2018
  7. Communism

    Communism Newcomer

I'm not sure you understand the gigantic gulf in bandwidth requirements between 2-way AFR with 2+ frames of latency and working on a single frame with no artificially added latency.

One of these requires little more than enough bandwidth to send over the finished frame in time, while the other requires both GPUs to synchronize every single intermediate render state.
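The finished-frame side of that comparison is easy to put a number on (resolution and frame rate are illustrative assumptions):

```python
# Contrast case: with AFR, the link mostly needs to carry the finished frame.
# Assumed numbers for illustration only.
width, height = 3840, 2160   # 4K output
bytes_per_pixel = 4          # 8-bit RGBA final frame
fps = 60

finished_frame_gbps = width * height * bytes_per_pixel * fps / 1e9
print(f"~{finished_frame_gbps:.1f} GB/s to ship finished 4K/60 frames")
# versus many times that to mirror every intermediate render target,
# which is the gulf being pointed at.
```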
     
  8. Rootax

    Rootax Veteran


Of course I don't know your precise need, but, in that particular case, why not a Vega FE with 16 GB of RAM or even a Vega Pro SSG?
     
  9. CSI PC

    CSI PC Veteran

Comes down to how well engines/APIs (more so DX12/Vulkan) can integrate with NVLink and its unified, cohesive cache in a multi-GPU setup, and that could look different from SLI or what we see with AFR.
TBH I need to see how UE4 has integrated NVLink.

    Edit:
While not exactly the same, it's worth noting that several of the UE4 demos have used NVLink when driven by multiple GPUs for their more impressive real-time demonstrations.
The real-time Star Wars ray tracing demo split work between four V100 GPUs (each doing different tasks) using NVLink, but that is just one example.
     
    Last edited: Aug 18, 2018
  10. Communism

    Communism Newcomer

4 frames of latency obviously doesn't matter when you are watching a video render (aka what that showcase is).

    In real games every single frame of latency matters.

Let's just agree that you don't entirely understand this particular scenario I was discussing with silent_guy, so we don't have to talk in circles, OK?
     
  11. CSI PC

    CSI PC Veteran

I know, which is why I said it was just one example, but the point is that it is a different presentation compared to SLI/AFR.
Does it have to be AFR / work like SLI?
The answer is no, but it comes down to how well the API/engines can utilise NVLink and split tasks without it being traditional AFR.

    Edit:
    Just to clarify I am not talking about supporting older games but a few current and future games.
     
    Last edited: Aug 18, 2018
  12. ShaidarHaran

    ShaidarHaran hardware monkey Veteran

    Radeon SSG was a very intriguing product when I first heard of it. Unfortunately, the price is too high and the performance isn't quite where I need it to be.

The particular use case I have in mind involves lots of triangles and many textures (up to 4K in size) spread out over a view distance of perhaps 100 miles or more. Much of the detail is lost to LOD, but there are still quality and performance gains to be had by keeping all the textures in VRAM and not needing to fetch from system RAM or disk. I don't know if it's the drivers, the architecture, or some of both, but AMD's products have not historically excelled in this workload. The workload is Lockheed Martin Prepar3D, for anyone interested. Now that flight simulators have moved to 64-bit, the potential to display the highest-quality textures "as far as the eye can see" basically fulfills my lifelong desire for visualization in this class of software.

    This generation may be too soon though, at least for the price I'm willing to pay. I could see maybe spending up to $2000 for a couple graphics cards if I knew they would get the job done and allow me to stop upgrading every cycle, but $6000 for a Quadro RTX 6000 is more than I'm willing to pay.
     
  13. silent_guy

    silent_guy Veteran Subscriber

Feel free to say why you’d need 50% for that. I think it’s not even close.
     
  14. BRiT

    BRiT (>• •)>⌐■-■ (⌐■-■) Moderator Legend Alpha

    Please keep the personal bickering in check.
     
  15. CSI PC

    CSI PC Veteran

Have any sites mentioned yet the improved interoperability between CUDA and the gaming APIs, specifically DX12 and Vulkan, with Turing?
It is part of the CUDA 10 platform (seems to be 'SM_80' onwards function compatibility), and something that should be promising.

    Edit:
Also of note for Turing is further-optimized mixed-precision GEMM performance in CUDA 10.
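Mixed-precision GEMM in the tensor-core sense (FP16 inputs, FP32 accumulation) can be sketched in NumPy; this is an illustration of the numeric scheme only, not CUDA code:

```python
import numpy as np

# Sketch of mixed-precision GEMM: FP16 inputs, FP32 accumulation.
# Tensor cores do this in hardware; here we emulate it for illustration.
rng = np.random.default_rng(0)
a = rng.standard_normal((64, 64)).astype(np.float16)
b = rng.standard_normal((64, 64)).astype(np.float16)

# Accumulate in FP32 to avoid the precision loss of a pure-FP16 product.
c = a.astype(np.float32) @ b.astype(np.float32)
print(c.dtype)  # float32
```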
     
    Last edited: Aug 19, 2018
  16. Geeforcer

    Geeforcer Harmlessly Evil Veteran

Does anyone know if any reviewers have received the cards, or is Monday just going to be an architecture overview?
     
  17. tEd

    tEd Casual Member Veteran

Probably the latter. Maybe some benches from Nvidia. I think reviewers receive the cards at the event.
     
  18. pharma

    pharma Veteran


    Rumor ...
    NVIDIA RTX 2070 Specs Leaked – 2304 Cores, 8GB GDDR6 at ~$400
    https://wccftech.com/nvidia-rtx-2070-specs-leaked-2304-cores-8gb-gddr6-at-400/
     
  19. Markus

    Markus Newcomer

Will real-time ray tracing have interesting non-graphics uses in games? 3D positional audio has not recovered since the Aureal A3D days. I know about AMD TrueAudio Next and I wish more games used it. Would something like that benefit from being able to cast vastly more rays, or is the bottleneck still the time-varying convolution kernels, etc.?
     
  20. OCASM

    OCASM Regular

    I don't know about that. OTOY has integrated Octane into Unity already and they've just officially announced an integration into UE4.

    Yes:

    https://www.techpowerup.com/246820/nvidia-does-a-trueaudio-rt-cores-also-compute-sound-ray-tracing
     