Nvidia Turing Architecture [2018]

Discussion in 'Architecture and Products' started by pharma, Sep 13, 2018.

  1. pixeljetstream

    Newcomer

    Joined:
    Dec 7, 2013
    Messages:
    30
    Likes Received:
    60
    Thought we had solved that on the other forum ;) For several generations now there have been no dedicated "shaders" in hw; all vendors use unified designs. The hw (SM/CU etc.) executes generic shader code from any shader stage.
     
  2. pixeljetstream

    Newcomer

    Joined:
    Dec 7, 2013
    Messages:
    30
    Likes Received:
    60
    Indeed. Imo the feature was very well received among developers, technical material and conference presentations exist, and we actively engage with developers. From that standpoint I am not worried.
    Marketing it to consumers is a question of having applications/titles that use the tech. It will take a while until it shows up in actual products (unlike DXR, there is no standard yet). I am used to different timelines, where it takes many years for new technologies to go from research to mass adoption, and some never get there. That is normal and just how it works.
     
    #242 pixeljetstream, Dec 27, 2018
    Last edited: Dec 27, 2018
  3. ToTTenTranz

    Legend Veteran Subscriber

    Joined:
    Jul 7, 2008
    Messages:
    9,727
    Likes Received:
    4,395
    TU102 has 50% more ROPs than TU104.

    The "Mesh Shader", like other shaders, is a "program", a piece of code that runs on the shader processors. The "dedicated hardware" that makes the mesh shader possible is mostly a matter of giving those shader processors access to the caches in the front end, as well as support for new, dedicated instructions.
    This is just a layman's explanation based on layman's knowledge, though. Others here are probably better suited to explain in more detail.
     
    pixeljetstream and pharma like this.
  4. pixeljetstream

    Newcomer

    Joined:
    Dec 7, 2013
    Messages:
    30
    Likes Received:
    60
    Yes. @ToTTenTranz that's a pretty good explanation. Just a clarification about "caches in the front end": there are new instructions to read/write more of the output/input data, which on NV architectures is typically stored in L1/SMEM, as shown in the presentations linked here:

    There is no need for "units" as such, just some hw state to disable the traditional index/vertex fetch (given that developers fetch everything themselves) and some new plumbing of data flow in the hw for spawning task/mesh warps etc.
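
    To make the "developers fetch everything themselves" point concrete, here is a minimal CPU-side sketch in Python of what a mesh-shader-style workgroup does: it is handed a meshlet, reads that meshlet's vertices and local indices from plain buffers on its own, and emits primitives. The buffer layout and the names `Meshlet` and `run_mesh_workgroup` are hypothetical illustrations, not a real API and not a description of the actual hardware.

```python
from dataclasses import dataclass


@dataclass
class Meshlet:
    vertex_offset: int  # first entry in the meshlet's vertex-index list
    vertex_count: int
    prim_offset: int    # first triangle in the local index list
    prim_count: int


def run_mesh_workgroup(meshlet, vertex_indices, local_indices, positions):
    """Emulate one mesh-shader workgroup: fetch and emit everything manually."""
    # "Vertex fetch" is done by the program itself, not by fixed-function hw.
    out_vertices = [
        positions[vertex_indices[meshlet.vertex_offset + i]]
        for i in range(meshlet.vertex_count)
    ]
    # "Index fetch": small local indices into out_vertices, three per triangle.
    out_triangles = []
    for p in range(meshlet.prim_count):
        base = 3 * (meshlet.prim_offset + p)
        out_triangles.append(tuple(local_indices[base:base + 3]))
    return out_vertices, out_triangles


positions = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
vertex_indices = [0, 1, 2, 3]       # meshlet-local slot -> global vertex id
local_indices = [0, 1, 2, 0, 2, 3]  # two triangles forming a quad
quad = Meshlet(vertex_offset=0, vertex_count=4, prim_offset=0, prim_count=2)

verts, tris = run_mesh_workgroup(quad, vertex_indices, local_indices, positions)
```

    In this toy case the workgroup emits the four quad vertices and two triangles; nothing resembling an input assembler is involved, which is the part of the traditional pipeline that gets switched off.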
     
    #244 pixeljetstream, Dec 27, 2018
    Last edited: Dec 27, 2018
    pharma likes this.
  5. pharma

    Veteran Regular

    Joined:
    Mar 29, 2004
    Messages:
    2,832
    Likes Received:
    1,541
    Ray Tracing Gems - Available Mid March
    February 6, 2019
    To help developers navigate this new technology, a wide-ranging book on the topic is being published early this year: Ray Tracing Gems.

    We have some great news: Readers of NVIDIA’s Dev News Center get early access to the text, at no cost! NVIDIA will be distributing the book for free in its entirety, as a series of seven PDFs. Every few days, a new section of the book will be made available.

    Ray Tracing Gems is the work of more than 60 contributors, all experts in the field of ray tracing. Their articles cover techniques that are not often discussed in general texts, but are important for high quality results.

    You can find the first installment and future segments on the NVIDIA Developer Zone.

    Ray Tracing Gems is not meant as a survey of the field of ray tracing. There are already fine books that provide a general education, many of them free; see this resource list, for example. Rather, this volume is more in the spirit of other gems books, such as GPU Gems, containing articles covering techniques that are often not discussed in general texts but that are important for high-quality results. The book also includes tutorials on newer technologies, along with guides that pull together best practices for solving specific problems. The second half of the book includes studies of larger systems focused on a variety of effects.
    https://news.developer.nvidia.com/the-authoritative-book-on-real-time-ray-tracing-has-arrived/
     
  6. Kaotik

    Kaotik Drunk Member
    Legend

    Joined:
    Apr 16, 2003
    Messages:
    8,108
    Likes Received:
    1,802
    Location:
    Finland
    I'm having a hard time wrapping my head around the TU116 FP16 units.
    Are they really just that, FP16 units, or could they be (software-limited?) tensor cores instead? It sounds a bit strange that NVIDIA would develop new FP16 CUDA cores when they've had 2xFP16/1xFP32 units before (Tegra X1 and GP100 at least).
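
    For readers unfamiliar with the "2xFP16/1xFP32" idea: two half-precision values fit in one 32-bit register, so a paired FP16 operation can process both lanes per instruction. Below is a minimal Python sketch of that packing using only the standard `struct` module. The helper names are hypothetical, and this only illustrates the data layout, not how any particular GPU ALU is built.

```python
import struct


def pack_half2(lo, hi):
    """Pack two FP16 values into one 32-bit word (lo in the low 16 bits)."""
    return struct.unpack("<I", struct.pack("<2e", lo, hi))[0]


def unpack_half2(word):
    """Split a 32-bit word back into its two FP16 lanes."""
    return struct.unpack("<2e", struct.pack("<I", word))


def half2_add(a, b):
    """Lane-wise add of two packed pairs; each sum is rounded back to FP16."""
    a_lo, a_hi = unpack_half2(a)
    b_lo, b_hi = unpack_half2(b)
    return pack_half2(a_lo + b_lo, a_hi + b_hi)


x = pack_half2(1.5, 2.0)
y = pack_half2(0.25, 4.0)
result = unpack_half2(half2_add(x, y))  # both lanes added in one "operation"
```

    Here `result` holds the two lane sums, 1.75 and 6.0, taken from a single 32-bit word.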
     
    Dr Evil likes this.
  7. fellix

    fellix Hey, You!
    Veteran

    Joined:
    Dec 4, 2004
    Messages:
    3,479
    Likes Received:
    384
    Location:
    Varna, Bulgaria
    I was examining the die shots of TU106 and TU116, and it seems to me that TU116's SM structure is no different from TU106's, so I think TU116 simply disables RTX and either does the same with the tensor cores or just throttles them to ordinary FP16 ops.
     
    Dr Evil likes this.
  8. Kaotik

    Kaotik Drunk Member
    Legend

    Joined:
    Apr 16, 2003
    Messages:
    8,108
    Likes Received:
    1,802
    Location:
    Finland
    Wait, there are actual die shots of the Turings somewhere? Artist impressions don't count.
     
  9. fellix

    fellix Hey, You!
    Veteran

    Joined:
    Dec 4, 2004
    Messages:
    3,479
    Likes Received:
    384
    Location:
    Varna, Bulgaria
    Yep, in the usual place: https://www.flickr.com/photos/130561288@N04

    Not classical micrographs though, but non-destructive IR scans. The downside is the laser-print obstruction, if present.
     
    Rufus, Newguy, pharma and 3 others like this.
  10. Ryan Smith

    Regular Subscriber

    Joined:
    Mar 26, 2010
    Messages:
    608
    Likes Received:
    1,034
    Location:
    PCIe x16_1
    For what it's worth, NVIDIA says they're new units, and not tensor cores. But truthfully, I fully expect they're leaving out some details in order to obfuscate parts of the architecture and maintain their technological advantage.
     
    Silent_Buddha and CaptainGinger like this.
  11. Arun

    Arun Unknown.
    Moderator Legend Veteran

    Joined:
    Aug 28, 2002
    Messages:
    5,023
    Likes Received:
    299
    Location:
    UK
    In general, you'd expect separate FP16 ALUs to take more area than merged FP16/FP32 ALUs *but* use less power. In theory, with merged ALUs you'd expect to pay a very small power efficiency cost not just on FP16 calculations but also on FP32... so it might make sense for NVIDIA to have separate FP16 ALUs; in fact, I don't see any clear evidence that they weren't already physically separate on Tegra X1 and GP100, even if they couldn't be used simultaneously due to register file/scheduling limitations?

    If their main reason was power, at first glance it doesn't feel like it'd make a lot of sense for TU116 since it's a lower-end chip which is more price sensitive... however, it's probably also aggressively targeting the laptop market, so it might make sense because of that. The alternative would have been to remove FP16 support completely as not many current games benefit from it, but those kinds of design decisions are done long in advance, and at that point it might have seemed risky to give AMD a potential advantage in FP16 throughput.

    Anyhow in the grand scheme of things it's a minor implementation detail; I very much doubt those FP16 units are taking very much area... It is interesting that TU106 vs TU116 aren't that different despite the lack of RTX and tensor cores though... I don't have the time to do a proper analysis of those die shots but I'd be very curious if anyone does!
     
  12. SlmDnk

    Regular

    Joined:
    Feb 9, 2002
    Messages:
    523
    Likes Received:
    66
  13. pharma

    Veteran Regular

    Joined:
    Mar 29, 2004
    Messages:
    2,832
    Likes Received:
    1,541
    Integrating Ray Tracing Into an Existing Engine
    March 12, 2019


    https://news.developer.nvidia.com/i...xisting-engine-three-things-you-need-to-know/

    GDC 2019:
    Title: A DEVTECH’S ESSENTIAL GUIDE TO RAY TRACING
    Location: Room 205, South Hall
    Date: Thursday, March 21
    Time: 11:30am – 12:30pm
    Pass Type: All Access, GDC Conference + Summits, GDC Conference, GDC Summits, Expo Plus, Audio Conference + Tutorial, Indie Games Summit
    Topic: Programming
    Format: Sponsored Session

    https://schedule.gdconf.com/session...ide-to-ray-tracing-presented-by-nvidia/865242
     
    jlippo, Heinrich4 and OCASM like this.
  14. DavidGraham

    Veteran

    Joined:
    Dec 22, 2009
    Messages:
    2,702
    Likes Received:
    2,430
    Heinrich4, pharma and Lightman like this.
  15. DavidGraham

    Veteran

    Joined:
    Dec 22, 2009
    Messages:
    2,702
    Likes Received:
    2,430
  16. pharma

    Veteran Regular

    Joined:
    Mar 29, 2004
    Messages:
    2,832
    Likes Received:
    1,541
    Tips and Tricks: Ray Tracing Best Practices
    March 20, 2019
    FAQ
    Q. What’s the relationship between number of primitives and cost (time) of acceleration structure build/updates?
    A. It’s mostly a linear relationship. Well, it starts getting linear beyond a certain primitive count, before that it’s bound by constant overhead. The exact numbers here are in flux and wouldn’t be reliable.

    Q. Assuming maximum occupancy, what’s the GPU throughput SOL for acceleration structure build/updates?

    A. An order-of-magnitude guideline is O(100 million) primitives/sec for full builds and O(1 billion) primitives/sec for updates.
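
    As a back-of-the-envelope reading of the two answers above (constant overhead plus a linear term), the guideline can be turned into a tiny estimator. This is only a sketch: the throughputs are the order-of-magnitude figures quoted in the FAQ, and `fixed_overhead_ms` is a hypothetical placeholder, since the FAQ says the exact numbers are in flux.

```python
BUILD_PRIMS_PER_SEC = 100e6   # order of magnitude quoted for full builds
UPDATE_PRIMS_PER_SEC = 1e9    # order of magnitude quoted for updates


def estimate_ms(num_primitives, prims_per_sec, fixed_overhead_ms=0.0):
    """Constant overhead plus a linear term; the overhead is a placeholder."""
    return fixed_overhead_ms + 1000.0 * num_primitives / prims_per_sec


prims = 1_000_000
build_ms = estimate_ms(prims, BUILD_PRIMS_PER_SEC)    # about 10 ms
update_ms = estimate_ms(prims, UPDATE_PRIMS_PER_SEC)  # about 1 ms
```

    Under these assumptions a one-million-primitive full build lands around 10 ms and an update around 1 ms, i.e. an update is roughly an order of magnitude cheaper than a full build.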

    Q. What’s the relationship between number of unique shaders and compilation cost (time) for RT PSOs?

    A. It is roughly linear.

    Q. What’s the typical cost of RT PSO compilation in games today?

    A. Anywhere from 20 ms to 300 ms per pipeline.

    Q. Is there guidance for how much alpha/transparency should be used? What’s the cost of anyhit vs closest hit?

    A. Any-hit is expensive and should be used minimally. Preferably mark geometry (or instances) as OPAQUE, which will allow ray traversal to happen in fixed-function hardware. When AH is needed (e.g. to evaluate transparency etc), keep it as simple as possible. Don’t evaluate huge shading networks just to execute what amounts to an alpha tex lookup and an if-statement.
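
    The effect of the OPAQUE flag can be illustrated with a toy traversal loop in Python that counts how often shader code would have to run. This is a conceptual sketch only: `trace_ray`, the candidate tuples and the alpha values are hypothetical, and real traversal order and hardware behaviour are not modelled.

```python
def trace_ray(candidates, any_hit):
    """Return (closest accepted hit distance or None, any-hit invocations).

    candidates: list of (t, is_opaque, alpha) intersections along one ray,
    in whatever order traversal happens to find them.
    """
    closest = None
    invocations = 0
    for t, is_opaque, alpha in candidates:
        if closest is not None and t >= closest:
            continue              # a nearer hit was already accepted
        if not is_opaque:
            invocations += 1      # traversal has to stop and run shader code
            if not any_hit(alpha):
                continue          # hit rejected, e.g. alpha-tested away
        closest = t
    return closest, invocations


def alpha_test(alpha, cutoff=0.5):
    """Keep any-hit work tiny: one looked-up value and one comparison."""
    return alpha >= cutoff


foliage = [(5.0, False, 0.1), (2.0, False, 0.9), (3.0, False, 0.2), (1.0, False, 0.0)]
wall = [(5.0, True, 1.0), (2.0, True, 1.0), (3.0, True, 1.0), (1.0, True, 1.0)]

foliage_result = trace_ray(foliage, alpha_test)
wall_result = trace_ray(wall, alpha_test)
```

    In this toy setup the non-opaque ray triggers three any-hit invocations before settling on the hit at t = 2.0, while the opaque ray finds its hit at t = 1.0 with none.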

    Q. How should the developer manage shading divergence?

    A. Start by shading in closest-hit shaders, in a straightforward implementation. Then analyze perf and decide how much of a problem divergence is and how it can be addressed. The solution may or may not include “manual scheduling”.

    Q. How can the developer query the stack memory allocation?

    A. The API has functionality to query per-thread stack requirements on pipelines/shaders. This is useful for tracking and analysis purposes, and an app should always strive to use as little shader stack as possible (one recommendation is to dump stack size histograms and flag outliers during development). Stack requirements are most directly influenced by live state across trace calls, which should be minimized (see Best Practices).
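
    The "dump stack size histograms and flag outliers" recommendation could look something like the following Python sketch on the tooling side. The shader names and byte counts are made up, and the outlier rule (more than twice the median) is an arbitrary assumption; the real sizes would come from whatever query the graphics API provides.

```python
from collections import Counter
from statistics import median


def stack_histogram(stack_bytes, bucket=64):
    """Count shaders per stack-size bucket (bucket start in bytes -> count)."""
    return Counter((size // bucket) * bucket for size in stack_bytes.values())


def flag_outliers(stack_bytes, factor=2.0):
    """Names whose stack size exceeds factor x median (arbitrary threshold)."""
    limit = factor * median(stack_bytes.values())
    return sorted(name for name, size in stack_bytes.items() if size > limit)


# Hypothetical per-shader stack sizes, as an app might log during development.
stack_bytes = {
    "shadow_miss": 0,
    "shadow_chs": 32,
    "reflection_chs": 96,
    "gi_chs": 112,
    "hair_chs": 512,
}

hist = stack_histogram(stack_bytes)
outliers = flag_outliers(stack_bytes)
```

    With these made-up numbers only `hair_chs` is flagged, which is the kind of shader worth inspecting for live state kept across trace calls.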

    Q. How much extra VRAM does a typical ray-tracing implementation consume?

    A. Today, games implementing ray-tracing are typically using around 1 to 2 GB extra memory. The main contributing factors are acceleration structure resources, ray tracing specific screen-sized buffers (extended g-buffer data), and driver-internal allocations (mainly the shader stack).
    https://devblogs.nvidia.com/rtx-best-practices/
     
    #256 pharma, Mar 20, 2019
    Last edited: Mar 20, 2019
    Alexko, Lightman and OCASM like this.
  17. DavidGraham

    Veteran

    Joined:
    Dec 22, 2009
    Messages:
    2,702
    Likes Received:
    2,430
    A rough estimate of the area taken by the RT cores and Tensor cores (about 8~10% of the die):

     
    tinokun, pharma and Lightman like this.
  18. BRiT

    BRiT (╯°□°)╯
    Moderator Legend Alpha Subscriber

    Joined:
    Feb 7, 2002
    Messages:
    12,155
    Likes Received:
    8,306
    Location:
    Cleveland
    If that's all it takes up, why make anything without them?
     
  19. DavidGraham

    Veteran

    Joined:
    Dec 22, 2009
    Messages:
    2,702
    Likes Received:
    2,430
    Ray tracing requires a minimum level of raster performance as well, so it would not make sense to offer it below a certain performance tier.
     
    BRiT likes this.
  20. DavidGraham

    Veteran

    Joined:
    Dec 22, 2009
    Messages:
    2,702
    Likes Received:
    2,430