Nvidia Ampere Discussion [2020-05-14]

Discussion in 'Architecture and Products' started by Man from Atlantis, May 14, 2020.

Tags:
  1. Voxilla

    Regular

    Joined:
    Jun 23, 2007
    Messages:
    832
    Likes Received:
    505
    When the memory system is based on HBM2, the PCB can be quite small.
    If this cooler is real, it's almost a given there will be HBM2.
    3 stacks would be enough for 24 GB, and lower variant with 2 stack for 16 GB.
    Just my 2 cents.
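The capacity arithmetic checks out under the assumption of 8 GB (8-Hi) HBM2 stacks; a trivial sketch (stack density is an assumption, since HBM2 also ships in 4 GB stacks):

```python
# Capacity check for the HBM2 guess: with assumed 8 GB (8-Hi) stacks,
# 2 stacks give 16 GB and 3 stacks give 24 GB, matching the post.
GB_PER_STACK = 8
capacities = {stacks: stacks * GB_PER_STACK for stacks in (2, 3)}
print(capacities)  # {2: 16, 3: 24}
```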
     
  2. Scott_Arm

    Legend

    Joined:
    Jun 16, 2004
    Messages:
    15,134
    Likes Received:
    7,679
    If they're being less conservative about clocks and that's where the power consumption is coming from, it'll be interesting. Means we'll have better performance out of the box, but less overclocking potential. Also means they may be more sensitive to ambient temperatures. Maybe under-volting to keep boost clocks will be a thing like with AMD.
     
  3. A1xLLcqAgt0qc2RyMz0y

    Veteran

    Joined:
    Feb 6, 2010
    Messages:
    1,589
    Likes Received:
    1,490
    NVIDIA’s Flagship RTX 3090 GPU Will Have A TGP Of 350W, Complete Breakdown Leaked

    https://wccftech.com/nvidia-rtx-3090-gpu-tgp-350w

    Estimated Power Consumption / Losses

    Total Graphics Power TGP 350 Watts
    24 GB GDDR6X Memory (GA_0180_P075_120X140, 2.5 Watts per Module) -60 Watts
    MOSFET, Inductor, Caps NVDD (GPU Voltage) -26 Watts
    MOSFET, Inductor, Caps FBVDDQ (Framebuffer Voltage) -6 Watts
    MOSFET, Inductor, Caps PEXVDD (PCIExpress Voltage) -2 Watts
    Other Voltages, Input Section (AUX) -4 Watts
    Fans, Other Power -7 Watts
    PCB Losses -15 Watts
    GPU Power approx. 230 Watts
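The leaked breakdown is at least internally consistent: the listed losses sum to 120 W, leaving the quoted ~230 W for the GPU. A quick sketch (all figures are from the leak, not official numbers):

```python
# Sanity check of the leaked RTX 3090 power breakdown: subtracting every
# listed loss from the 350 W TGP should leave the quoted ~230 W for the
# GPU itself. Figures come from the leak, not from NVIDIA.
tgp = 350  # Total Graphics Power, watts

losses = {
    "24x GDDR6X modules (2.5 W each)": 24 * 2.5,  # 60 W
    "NVDD VRM (GPU voltage)": 26,
    "FBVDDQ VRM (framebuffer voltage)": 6,
    "PEXVDD VRM (PCIe voltage)": 2,
    "Other voltages, input section (AUX)": 4,
    "Fans, other power": 7,
    "PCB losses": 15,
}

gpu_power = tgp - sum(losses.values())
print(f"GPU power: {gpu_power:.0f} W")  # GPU power: 230 W
```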
     
    Lightman and pharma like this.
  4. pharma

    Veteran

    Joined:
    Mar 29, 2004
    Messages:
    4,891
    Likes Received:
    4,539
    It's interesting all these rumors point to GDDR6X memory (GA_0180_P075_120X140, 2.5 Watts per Module).
     
  5. Digidi

    Regular

    Joined:
    Sep 1, 2015
    Messages:
    428
    Likes Received:
    239
  6. Scott_Arm

    Legend

    Joined:
    Jun 16, 2004
    Messages:
    15,134
    Likes Received:
    7,679
    10nm? I guess that would explain the power consumption. Why isn't Nvidia transitioning to 7nm? I would have thought it would be very mature right now. Limited capacity of 7nm fabs?
     
  7. Digidi

    Regular

    Joined:
    Sep 1, 2015
    Messages:
    428
    Likes Received:
    239
    The 10nm figure is a guess based on various earlier information; more precisely it's Samsung's 8nm process, which some people consider a refined 10nm process. This was not said by Igor. Sorry, my mistake.
     
  8. Bondrewd

    Veteran

    Joined:
    Sep 16, 2017
    Messages:
    1,682
    Likes Received:
    846
    Yeah, even with HiSilicon leaving.
     
  9. pharma

    Veteran

    Joined:
    Mar 29, 2004
    Messages:
    4,891
    Likes Received:
    4,539
    Samsung is purported to get a "small number of orders", with rumors pointing to professional dies. It would likely be using Samsung's 7nm EUV process.

    June 11, 2020
    https://www.pcgamer.com/nvidia-ampere-release-date-specs-performance/
     
    Lightman and PSman1700 like this.
  10. Voxilla

    Regular

    Joined:
    Jun 23, 2007
    Messages:
    832
    Likes Received:
    505
    Regarding Igor's speculation, how does he arrive at the strange 'space-saving' placement of 12 GDDR6 chips: 4 left, 4 right, 3 top, 1 bottom?
    Looking at the PCB of the RTX 2080 Ti, the placement is 4 top, 4 right, 4 bottom, which is definitely more space-saving, and that is on a full-sized PCB.
    Notice that all the power regulators need quite a bit of space too.
     
    #270 Voxilla, Jun 14, 2020
    Last edited: Jun 14, 2020
  11. JoeJ

    Veteran

    Joined:
    Apr 1, 2018
    Messages:
    1,523
    Likes Received:
    1,772
    pharma likes this.
  12. Kaotik

    Kaotik Drunk Member
    Legend

    Joined:
    Apr 16, 2003
    Messages:
    10,244
    Likes Received:
    4,465
    Location:
    Finland
    Erm... How exactly is the backside fan supposed to cool a chip that would be on the same level as the topside fans, buried under the backplate at the opposite end of the card?
     
    JoeJ, Krteq and DegustatoR like this.
  13. DegustatoR

    Veteran

    Joined:
    Mar 12, 2002
    Messages:
    3,242
    Likes Received:
    3,405
    These "separate RT chip" rumors are getting seriously old at this point.
     
    Lightman, szatkus, no-X and 5 others like this.
  14. Ext3h

    Regular

    Joined:
    Sep 4, 2015
    Messages:
    428
    Likes Received:
    497
    Utter BS. The entire programming model is tied to a single thread controlling a single ray at a time. With threads in flight, allocated registers etc., it makes no sense whatsoever to offload that from the corresponding SMM; it's tightly bound to scheduling on that SMM and (unless you're unlucky with cache misses) almost certainly latency sensitive, as latency between shader invocations directly translates into occupancy constraints. It does make sense, though, to encapsulate it as a dedicated unit within an SMM, just like any other "core" in there. Which is what the patent describes: fixed-function traversal, committing rays to the RT cores, and processing a warp full of hits as they are found.

    Off-chip coprocessor, and then not even on the same package, is pure fiction. If someone suggested that NVidia would break entire SMMs off into chiplets, I might even believe that person. But ripping function units, for anything interleaved with execution flow, out of an SMM? Not a chance.
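    The occupancy argument can be put in back-of-envelope numbers. To keep an SM busy, the scheduler needs roughly latency / issue-interval independent warps in flight; the figures below are illustrative assumptions, not measured values:

    ```python
    # Latency hiding sketch: an SM needs roughly latency / issue_interval
    # eligible warps in flight to avoid stalling. All numbers here are
    # illustrative assumptions, not measured hardware values.

    def warps_to_hide(latency_cycles: int, issue_interval_cycles: int = 4) -> float:
        """Warps needed so the scheduler always has eligible work."""
        return latency_cycles / issue_interval_cycles

    on_chip = warps_to_hide(200)    # hypothetical on-SM RT round trip
    off_chip = warps_to_hide(2000)  # hypothetical off-package round trip

    # An SM holds on the order of 32-64 resident warps, so ~50 is already
    # tight and ~500 is simply impossible to cover with occupancy.
    print(on_chip, off_chip)  # 50.0 500.0
    ```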
     
  15. T2098

    Newcomer

    Joined:
    Jun 15, 2020
    Messages:
    55
    Likes Received:
    115
    It's not necessarily *complete* BS. If you read the entire patent, Nvidia clearly expended an awful lot of effort to make the coprocessor asynchronous and decoupled from the main rendering path.
    I do agree that off-package is quite unlikely, and wccftech's talk of a separate ray tracing add-in card isn't going to happen.

    However, if you're already making a GPU with HBM, and you're already paying money for an interposer, breaking out the 'coprocessor' to its own chiplet doesn't seem like much of a stretch given the patent.
    Also, since HBM2 has its own fancy control die at the bottom of the stack, it wouldn't be too much of a stretch for Nvidia to get one of its partners to make a dual-ported HBM variant.

    As the patent mentions, the coprocessor may function with read-only access to the shared memory too. Sounds exactly like the dual-ported VRAM that was common back in the 80s.
     


    pharma likes this.
  16. Pinstripe

    Newcomer

    Joined:
    Feb 24, 2013
    Messages:
    153
    Likes Received:
    133
    Nvidia engineers last year spoke about chiplet design (can't find the interview) and were very specific that future consumer products would remain monolithic dies for cost and complexity reasons.
     
    pharma likes this.
  17. T2098

    Newcomer

    Joined:
    Jun 15, 2020
    Messages:
    55
    Likes Received:
    115
    I'd agree that it's probably highly unlikely in this case too, although not impossible.

    After reading through the patent a few times, one of the first things I thought was that the strange rumours about GDDR6X would make a little more sense if the ray tracing hardware were being at least partially decoupled from the rest of the GPU.
    Interestingly enough, one of GDDR6's generally unused features is its dual-channel-per-package design.

    It's not too much of a stretch to imagine tweaking it to add a third 16-bit channel (read-only or otherwise) while keeping identical signaling specs, giving the ray tracing 'coprocessor' its own half-width memory bus.
    Or alternatively, multiplexing the coprocessor's memory accesses onto one of the two existing 16-bit channels.

    I.e., most of the time the base GPU keeps its full memory bandwidth, but when the 'coprocessor' needs to access memory, it could lock the second channel on the GDDR6 for its own use and then release it when finished. At no point does the GPU completely stall or get locked out of memory, but from time to time it will drop to half memory bandwidth. If you aren't ray tracing, the base GPU gets the full memory bandwidth 100% of the time and the coprocessor sits idle.
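    A toy model of the arbitration, just to put numbers on the trade-off. The 384-bit bus width and 16 Gbps pin rate are assumptions for illustration; nothing here is a confirmed GDDR6X spec:

    ```python
    # Toy model of the proposed channel multiplexing: a 384-bit GDDR6 bus is
    # 12 packages x 2 x 16-bit channels. If a hypothetical RT coprocessor
    # locks one 16-bit channel per package, the main GPU drops to half width.
    # Bus width and pin rate are illustrative assumptions, not a known spec.

    PACKAGES = 12
    CHANNELS_PER_PKG = 2   # GDDR6's two independent 16-bit channels per package
    BITS_PER_CHANNEL = 16
    GBPS_PER_PIN = 16      # assumed data rate

    def gpu_bandwidth_gbs(coprocessor_active: bool) -> float:
        """Main-GPU bandwidth in GB/s under the channel-locking scheme."""
        channels = CHANNELS_PER_PKG - (1 if coprocessor_active else 0)
        return PACKAGES * channels * BITS_PER_CHANNEL * GBPS_PER_PIN / 8

    print(gpu_bandwidth_gbs(False))  # 768.0  (full bus, coprocessor idle)
    print(gpu_bandwidth_gbs(True))   # 384.0  (half bus while RT holds a channel)
    ```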

    The patent does make a lot more sense in the context of a hypothetical datacentre product for Google Stadia and the like, though.

    Take GA100, add an RTX 'coprocessor' to the interposer, sprinkle a little dual-ported HBM2 into the mix, and you have your halo graphics product without needing to make a separate die or compromise GA100's compute performance by removing functional units to make room for the RTX hardware. Given that GA100 is already right near the reticle limit, it's either add the ray tracing capability as a coprocessor on the package, or design a completely separate ~800+mm2 die for a maximum-effort RTX-capable graphics product.

    If this is the approach Nvidia's taking, it's pretty easy to see how the rumour mill may be right about the coprocessor/chiplet, just wrong about which product segment.
     


    pharma likes this.
  18. trinibwoy

    trinibwoy Meh
    Legend

    Joined:
    Mar 17, 2004
    Messages:
    12,058
    Likes Received:
    3,116
    Location:
    New York
    I don’t see anything in the patent that wouldn’t also apply to the existing on-chip co-processors (tessellators, TMUs). Certainly don’t see any evidence in the patent that nvidia is considering a separate chip for BVH intersection.

    As usual the correct answer is probably the simplest one. The rumors are fantasy and RT hardware will remain on-chip.

    Once we do get chiplets the work to render a frame will be allocated at pixel tile granularity across homogenous chiplets. Bet on it.
     
    #278 trinibwoy, Jun 16, 2020
    Last edited: Jun 17, 2020
  19. pharma

    Veteran

    Joined:
    Mar 29, 2004
    Messages:
    4,891
    Likes Received:
    4,539
    What would be required to access the dual channel per package feature? Revised or different memory controller?
     
  20. CarstenS

    Legend Subscriber

    Joined:
    May 31, 2002
    Messages:
    5,800
    Likes Received:
    3,920
    Location:
    Germany
    I completely agree. It would maybe have made sense in the early 90s, when everything that could not escape up a tree fast enough was co-processorized.

    Nothing I've read in the patent so far suggests that by co-processor they mean anything other than a semi-independent part of the IP, deeply integrated within the same physical chip as the … wait for it … streaming multiprocessors. For that reason, I could imagine that Gaming-Ampere retains the large(r) SM/L1 pool that GA100 has, maybe even hardcoded (BIOS-locked) in a triple split to reserve the additional memory over Turing for raytracing.
     
    Ext3h likes this.
  • About Us

    Beyond3D has been around for over a decade and prides itself on being the best place on the web for in-depth, technically-driven discussion and analysis of 3D graphics hardware. If you love pixels and transistors, you've come to the right place!

    Beyond3D is proudly published by GPU Tools Ltd.