Nvidia Turing Speculation thread [2018]

Discussion in 'Architecture and Products' started by Voxilla, Apr 22, 2018.

Thread Status:
Not open for further replies.
  1. ShaidarHaran

    ShaidarHaran hardware monkey
    Veteran

    Joined:
    Mar 31, 2007
    Messages:
    4,027
    Likes Received:
    90
My hope for NVLink is that NV will offer an operational mode that pools VRAM without attempting to split the workload. Essentially, one GPU would perform all the calculations of a given workload with the other sitting idle, or perhaps assisting only trivially. As I said, I don't necessarily need more performance (though I won't say no to it, of course); I need roughly the level of performance already available via, say, GP102, but with more VRAM.
     
  2. silent_guy

    Veteran Subscriber

    Joined:
    Mar 7, 2006
    Messages:
    3,754
    Likes Received:
    1,382
    The reason you need BW between GPUs to work on the same frame is to give GPU B access to data that was generated by GPU A and vice versa.

    You don’t need it for other data, such as non-rendered textures, Z buffer etc.

    It’s clear that these shared resources are only a fraction of the total memory BW. I don’t know how much, but it seems very unlikely that they are 50%. So that number seems very wrong to me.

Are they less than 8% (50 GB/s / ~600 GB/s)? Probably not. But even if they are more than that, you could still see meaningful performance increases.
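To make that arithmetic explicit, here's a quick sketch (assumed round figures for illustration, not official specs: ~50 GB/s of NVLink bandwidth against ~600 GB/s of local memory bandwidth):

```python
# Back-of-envelope: what fraction of a GPU's local memory bandwidth
# could NVLink cover if shared resources had to cross the link?
# Assumed round figures (not official specs):
nvlink_bw_gbps = 50.0    # inter-GPU link bandwidth
memory_bw_gbps = 600.0   # local memory bandwidth per GPU

fraction = nvlink_bw_gbps / memory_bw_gbps
print(f"NVLink covers ~{fraction:.1%} of one GPU's memory bandwidth")
# -> NVLink covers ~8.3% of one GPU's memory bandwidth
```

So as long as the shared working set stays under roughly that fraction of total traffic, the link itself shouldn't be the bottleneck.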

    It’s also very game dependent, of course.

That said: if NVLink and tensor cores work in the GeForce GPUs the way they work on Tesla-class GPUs, then deep learning people will try to gobble them all up anyway, so it’s all academic. :)
     
  3. silent_guy

    Veteran Subscriber

    Joined:
    Mar 7, 2006
    Messages:
    3,754
    Likes Received:
    1,382
    For that kind of use case, the NVLink BW is almost certainly way too low. So don’t count on it.
     
  4. ShaidarHaran

    ShaidarHaran hardware monkey
    Veteran

    Joined:
    Mar 31, 2007
    Messages:
    4,027
    Likes Received:
    90
    Not necessarily. My particular use case requires a large texture cache for textures that would not be loaded/unloaded frequently.
     
  5. Communism

    Newcomer

    Joined:
    Feb 1, 2014
    Messages:
    15
    Likes Received:
    7
    That would be true if you are doing old school rendering.

If you are doing deferred rendering, you have to synchronize all the intermediate steps, which means an absolute crapton of bandwidth is needed.

I'm sure there aren't enough deep learning students to make an appreciable dent in RTX 2080 Ti supplies anyway, and serious deep learning people will need/want the extra VRAM that the Quadros have.

But yeah, there's every reason to think Nvidia would lock down / artificially slow down anything to do with deep learning / HPC in their GeForce drivers, outside of pointless e-peen benchmarks and directly gaming-related stuff.
     
  6. CSI PC

    Veteran

    Joined:
    Sep 2, 2015
    Messages:
    2,050
    Likes Received:
    844
Remember, though, that NVLink2 has a more cohesive cache design, which brings the GPUs closer to behaving like a single GPU.
So it is a bit limited at 100 GB/s (both links can be used to one GPU), but it gains from slightly better latency than SLI and, importantly, better communication/integration between the GPUs for developers/engines. It also has better BW than SLI, or even PCIe x16 Gen4, when both Nvidia bricks are used in a dual-GPU setup. NVLink is designed to be flexible, so it would be fine, but it still has limitations.
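For rough context, a sketch comparing approximate peak per-direction bandwidths of the links mentioned here (all figures are rounded assumptions for illustration, not official specs):

```python
# Approximate peak per-direction bandwidths of GPU-to-GPU links.
# All figures are rounded assumptions for illustration only.
links_gbps = {
    "SLI HB bridge": 4,        # assumed ~4 GB/s
    "PCIe 3.0 x16": 16,        # ~15.75 GB/s usable
    "PCIe 4.0 x16": 32,
    "2x NVLink bricks": 100,   # both links feeding one GPU
}
# Print from slowest to fastest link
for name, bw in sorted(links_gbps.items(), key=lambda kv: kv[1]):
    print(f"{name:>17}: ~{bw:>3} GB/s")
```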

    Edit:
I would need to check, but I thought a real-world application comparison between NVLink and SLI showed about 25% gains; big caveat: that was not gaming, and I'm not sure it was the latest NVLink iteration.
     
    #346 CSI PC, Aug 18, 2018
    Last edited: Aug 18, 2018
  7. Communism

    Newcomer

    Joined:
    Feb 1, 2014
    Messages:
    15
    Likes Received:
    7
I'm not sure you understand the gigantic gulf in bandwidth requirements between 2-GPU AFR with 2+ frames of latency and working on a single frame with no artificially added latency.

One literally requires little more than enough bandwidth to send over the finished frame in time, while the other requires both GPUs to synchronize every single intermediate render state.
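A back-of-envelope sketch of that gap, with assumed numbers (a 4K RGBA8 back buffer at 60 fps versus a hypothetical ~20 bytes/pixel deferred G-buffer exchanged once per frame):

```python
# Compare the inter-GPU traffic AFR needs (ship the finished frame)
# with split-frame rendering (share intermediate render targets).
# All figures below are illustrative assumptions.
width, height = 3840, 2160          # 4K
pixels = width * height
fps = 60

final_frame_bytes = pixels * 4      # one RGBA8 back buffer
gbuffer_bytes = pixels * 20         # hypothetical deferred G-buffer

afr_gbps = final_frame_bytes * fps / 1e9
sync_gbps = gbuffer_bytes * fps / 1e9
print(f"AFR: ~{afr_gbps:.1f} GB/s, intermediate sync: ~{sync_gbps:.1f} GB/s")
# -> AFR: ~2.0 GB/s, intermediate sync: ~10.0 GB/s
```

And that assumes each buffer crosses the link only once per frame; passes that repeatedly read each other's intermediates would multiply the sync figure.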
     
  8. Rootax

    Veteran

    Joined:
    Jan 2, 2006
    Messages:
    2,400
    Likes Received:
    1,845
    Location:
    France

Of course I don't know your precise needs, but in that particular case, why not a Vega FE with 16 GB of RAM, or even a Vega Pro SSG?
     
  9. CSI PC

    Veteran

    Joined:
    Sep 2, 2015
    Messages:
    2,050
    Likes Received:
    844
It comes down to how well engines/APIs (more so DX12/Vulkan) can integrate with NVLink and the unified/cohesive cache in a multi-GPU setup, which could be different from SLI or what we see with AFR.
TBH I need to see how UE4 has integrated NVLink.

    Edit:
While not exactly the same, it's worth noting that several of the UE4 demos have used NVLink when driven by multiple GPUs for their more impressive real-time demonstrations.
    The real-time Star Wars Ray Tracing demo was splitting work between 4 V100 GPUs (each doing different tasks) using NVLink, but that is just one example.
     
    #349 CSI PC, Aug 18, 2018
    Last edited: Aug 18, 2018
  10. Communism

    Newcomer

    Joined:
    Feb 1, 2014
    Messages:
    15
    Likes Received:
    7
4 frames of latency obviously doesn't matter when you are watching a video render (aka what that showcase is).

In real games, every single frame of latency matters.

Let's just agree that you don't entirely understand the situation in this particular scenario I was talking to silent_guy about, so we don't have to talk in circles, OK?
     
  11. CSI PC

    Veteran

    Joined:
    Sep 2, 2015
    Messages:
    2,050
    Likes Received:
    844
I know, which is why I said it was one example, but the point is that it is a different presentation compared to SLI/AFR.
Does it have to be AFR / work like SLI?
The answer is no, but it comes down to how well the API/engines can utilise NVLink and split tasks without it being traditional AFR.

    Edit:
    Just to clarify I am not talking about supporting older games but a few current and future games.
     
    #351 CSI PC, Aug 18, 2018
    Last edited: Aug 18, 2018
  12. ShaidarHaran

    ShaidarHaran hardware monkey
    Veteran

    Joined:
    Mar 31, 2007
    Messages:
    4,027
    Likes Received:
    90
    Radeon SSG was a very intriguing product when I first heard of it. Unfortunately, the price is too high and the performance isn't quite where I need it to be.

The particular use case I have in mind involves lots of triangles and many textures (up to 4K in size) spread over a view distance of perhaps 100 miles or more. Much of the detail is lost to LOD, but there are still quality and performance gains to be had by keeping all the textures in VRAM rather than fetching from system RAM or disk. I don't know if it's the drivers, the architecture, or some of both, but AMD's products have not historically excelled at this workload. The workload is Lockheed Martin Prepar3d, for anyone interested. Now that flight simulators have moved to 64-bit, the potential to display the highest-quality textures "as far as the eye can see" basically fulfills my lifelong desire for visualization in this class of software.

    This generation may be too soon though, at least for the price I'm willing to pay. I could see maybe spending up to $2000 for a couple graphics cards if I knew they would get the job done and allow me to stop upgrading every cycle, but $6000 for a Quadro RTX 6000 is more than I'm willing to pay.
     
  13. silent_guy

    Veteran Subscriber

    Joined:
    Mar 7, 2006
    Messages:
    3,754
    Likes Received:
    1,382
Feel free to say why you’d need 50% for that. I think it’s not even close.
     
  14. BRiT

    BRiT (>• •)>⌐■-■ (⌐■-■)
    Moderator Legend Alpha

    Joined:
    Feb 7, 2002
    Messages:
    20,502
    Likes Received:
    24,399
    Please keep the personal bickering in check.
     
  15. CSI PC

    Veteran

    Joined:
    Sep 2, 2015
    Messages:
    2,050
    Likes Received:
    844
Have any sites mentioned yet the improved interoperability between CUDA and gaming APIs, specifically DX12 and Vulkan, with Turing?
It is part of the CUDA 10 platform (seemingly 'SM_80' onwards function compatibility), and something that should be promising.

    Edit:
Also of note for Turing is further optimized mixed-precision GEMM performance in CUDA 10.
     
    #355 CSI PC, Aug 19, 2018
    Last edited: Aug 19, 2018
    nnunn and pharma like this.
  16. Geeforcer

    Geeforcer Harmlessly Evil
    Veteran

    Joined:
    Feb 6, 2002
    Messages:
    2,320
    Likes Received:
    525
Does anyone know if any reviewers have received the cards, or is Monday just going to be an architecture overview?
     
  17. tEd

    tEd Casual Member
    Veteran

    Joined:
    Feb 6, 2002
    Messages:
    2,105
    Likes Received:
    70
    Location:
    switzerland
Probably the latter. Maybe some benches from Nvidia. I think reviewers receive the cards at the event.
     
  18. pharma

    Veteran

    Joined:
    Mar 29, 2004
    Messages:
    4,887
    Likes Received:
    4,534

    Rumor ...
    NVIDIA RTX 2070 Specs Leaked – 2304 Cores, 8GB GDDR6 at ~$400
    https://wccftech.com/nvidia-rtx-2070-specs-leaked-2304-cores-8gb-gddr6-at-400/
     
  19. Markus

    Newcomer

    Joined:
    Jul 26, 2016
    Messages:
    12
    Likes Received:
    6
Will real-time raytracing have interesting non-graphics uses in games? 3D positional audio has never recovered since the Aureal A3D days. I know about AMD TrueAudio Next, and I wish more games used it. Would something like that benefit from being able to cast grotesquely many more rays, or is the bottleneck still the time-varying convolution kernels, etc.?
     
  20. OCASM

    Regular

    Joined:
    Nov 12, 2016
    Messages:
    921
    Likes Received:
    874
I don't know about that. OTOY has already integrated Octane into Unity, and they've just officially announced an integration into UE4.

    Yes:

    https://www.techpowerup.com/246820/nvidia-does-a-trueaudio-rt-cores-also-compute-sound-ray-tracing
     
    mrcorbo likes this.
  • About Us

    Beyond3D has been around for over a decade and prides itself on being the best place on the web for in-depth, technically-driven discussion and analysis of 3D graphics hardware. If you love pixels and transistors, you've come to the right place!

    Beyond3D is proudly published by GPU Tools Ltd.