Nvidia Ampere Discussion [2020-05-14]

Discussion in 'Architecture and Products' started by Man from Atlantis, May 14, 2020.

Tags:
  1. Voxilla

    Voxilla Regular

    When the memory system is based on HBM2, the PCB can be quite small.
    If this cooler is real, it's almost a given there will be HBM2.
    Three stacks would be enough for 24 GB, and a lower variant with 2 stacks for 16 GB.
    Just my 2 cents.
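    A quick sanity check of the stack math above, assuming the 8 GB per HBM2 stack (8-Hi stacks of 8 Gb dies) typical of contemporary parts:

    ```python
    # Assumption: 8 GB per HBM2 stack, as on contemporary 8-Hi parts.
    GB_PER_STACK = 8

    for stacks in (2, 3):
        print(f"{stacks} stacks -> {stacks * GB_PER_STACK} GB")
    # 2 stacks -> 16 GB, 3 stacks -> 24 GB, matching the post.
    ```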
     
  2. Scott_Arm

    Scott_Arm Legend

    If they're being less conservative about clocks and that's where the power consumption is coming from, it'll be interesting. Means we'll have better performance out of the box, but less overclocking potential. Also means they may be more sensitive to ambient temperatures. Maybe under-volting to keep boost clocks will be a thing like with AMD.
     
  3. NVIDIA’s Flagship RTX 3090 GPU Will Have A TGP Of 350W, Complete Breakdown Leaked

    https://wccftech.com/nvidia-rtx-3090-gpu-tgp-350w

    Estimated Power Consumption / Losses

    Total Graphics Power TGP 350 Watts
    24 GB GDDR6X Memory (GA_0180_P075_120X140, 2.5 Watts per Module) -60 Watts
    MOSFET, Inductor, Caps NVDD (GPU Voltage) -26 Watts
    MOSFET, Inductor, Caps FBVDDQ (Framebuffer Voltage) -6 Watts
    MOSFET, Inductor, Caps PEXVDD (PCIExpress Voltage) -2 Watts
    Other Voltages, Input Section (AUX) -4 Watts
    Fans, Other Power -7 Watts
    PCB Losses -15 Watts
    GPU Power approx. 230 Watts
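    The leaked breakdown is internally consistent: subtracting every non-GPU consumer from the 350 W TGP leaves the quoted ~230 W for the GPU itself. A minimal re-derivation (labels paraphrased from the leak):

    ```python
    # Re-deriving the leaked RTX 3090 power breakdown: TGP minus each
    # non-GPU consumer should leave the ~230 W quoted for the GPU alone.
    tgp_watts = 350
    losses = {
        "24x GDDR6X @ 2.5 W/module": 60,
        "NVDD VRM (GPU voltage)": 26,
        "FBVDDQ VRM (framebuffer voltage)": 6,
        "PEXVDD VRM (PCI Express voltage)": 2,
        "AUX / input section": 4,
        "fans, other power": 7,
        "PCB losses": 15,
    }
    gpu_watts = tgp_watts - sum(losses.values())
    print(gpu_watts)  # 230
    ```

    Note the 60 W memory figure also implies 24 modules (60 / 2.5), i.e. 1 GB chips on both sides of the PCB.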
     
    Lightman and pharma like this.
  4. pharma

    pharma Veteran

    It's interesting all these rumors point to GDDR6X memory (GA_0180_P075_120X140, 2.5 Watts per Module).
     
  5. Digidi

    Digidi Regular

  6. Scott_Arm

    Scott_Arm Legend

    10nm? I guess that would explain the power consumption. Why isn't Nvidia transitioning to 7nm? I would have thought it would be very mature right now. Limited capacity of 7nm fabs?
     
  7. Digidi

    Digidi Regular

    The 10nm figure is a guess based on various earlier information; better to say it's Samsung's 8nm process, which some people consider a refined 10nm process. This was not said by Igor. Sorry, my mistake.
     
  8. Bondrewd

    Bondrewd Veteran

    Yeah, even with HiSilicon leaving.
     
  9. pharma

    pharma Veteran

    Samsung is purported to get a "small number of orders", with rumors pointing to professional dies. It would likely be using Samsung's 7nm EUV process.

    June 11, 2020
    https://www.pcgamer.com/nvidia-ampere-release-date-specs-performance/
     
    Lightman and PSman1700 like this.
  10. Voxilla

    Voxilla Regular

    Regarding Igor's speculation: how does he arrive at the strange 'space-saving' placement of 12 GDDR6 chips, 4 left, 4 right, 3 top, 1 bottom?
    Looking at the PCB of the RTX 2080 Ti, the placement is 4 top, 4 right, 4 bottom, which is definitely more space-saving, and that is on a full-sized PCB.
    Notice that all the power regulators need quite a bit of space too.
     
    Last edited: Jun 14, 2020
  11. JoeJ

    JoeJ Veteran

    pharma likes this.
  12. Kaotik

    Kaotik Drunk Member Legend

    Erm... How exactly is the backside fan supposed to cool a chip that would be on the same level as the fans topside, buried under the backplate at the opposite end of the card?
     
    JoeJ, Krteq and DegustatoR like this.
  13. DegustatoR

    DegustatoR Veteran

    These "separate RT chip" rumors are getting seriously old at this point.
     
    Lightman, szatkus, no-X and 5 others like this.
  14. Ext3h

    Ext3h Regular

    Utter BS. The entire programming model is tied to a single thread controlling a single ray at a time. With threads in flight, allocated registers etc., it makes no sense whatsoever to offload that from the corresponding SMM; it's tightly bound with scheduling on that SMM and is, unless you are unlucky with cache misses, most certainly latency sensitive, as latency between shader invocations directly translates to occupancy constraints. It does make sense, though, to encapsulate it as a dedicated unit within an SMM, just like any other "core" in there. Which is what the patent describes: fixed-function traversal, committing rays to the RT cores, and processing a warp full of hits as they are found.

    Off-chip coprocessor, and then not even on the same package, is pure fiction. If someone suggested that NVidia would break entire SMMs off into chiplets, I might even believe that person. But ripping function units, for anything interleaved with execution flow, out of an SMM? Not a chance.
     
  15. T2098

    T2098 Newcomer

    It's not necessarily *complete* BS. If you read the entire patent, Nvidia clearly expended an awful lot of effort to make the coprocessor asynchronous and decoupled from the main rendering path.
    I do agree that off-package is quite unlikely, and wccftech's talk of a separate ray tracing add-in card isn't going to happen.

    However, if you're already making a GPU with HBM, and you're already paying money for an interposer, breaking out the 'coprocessor' to its own chiplet doesn't seem like much of a stretch given the patent.
    Also, since HBM2 also has its own fancy control die at the bottom of the stack, it wouldn't be too much of a stretch for Nvidia to try to get one of its partners to make a variant of HBM that's dual-ported.

    As the patent mentions, the coprocessor may function with read-only access to the shared memory too. Sounds exactly like the dual-ported VRAM that was common back in the 80s.
     

    pharma likes this.
  16. Pinstripe

    Pinstripe Newcomer

    Nvidia engineers last year spoke about chiplet design (can't find the interview) and were very specific about future consumer products to remain monolithic dies due to cost and complexity reasons.
     
    pharma likes this.
  17. T2098

    T2098 Newcomer

    I'd agree that it's probably highly unlikely in this case too, although not impossible.

    After reading through the patent a few times, one of the first things I thought was that the strange rumours about GDDR6X made a little more sense if the ray tracing hardware was being at least partially decoupled from the rest of the GPU.
    Interestingly enough, one of GDDR6's generally unused features is its dual-channel-per-package design.

    Not too much of a stretch to imagine tweaking it to either add a 3rd 16-bit channel (read-only or otherwise) while keeping identical signaling specs to give the ray tracing 'coprocessor' its own half-width memory bus.
    Or alternatively, multiplexing the coprocessor's memory access onto one of the two 16-bit channels.

    IE - Most of the time the basic GPU keeps its full memory bandwidth, but for the times when the 'coprocessor' needs to access memory, it could lock the 2nd channel on the GDDR6 for its own use and then release it when finished. At no point does the GPU completely stall or get locked out of memory, but from time to time will drop to half memory bandwidth. If you aren't ray tracing, then the base GPU gets the full memory bandwidth 100% of the time and the coprocessor is idle.
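    The sharing scheme described above is easy to model. A toy sketch (the bandwidth figure and lock fraction are my own illustrative numbers, not anything from the rumours): while the hypothetical coprocessor holds one of a package's two 16-bit channels, the GPU sees half bandwidth; the rest of the time it sees full bandwidth.

    ```python
    # Toy model of the channel-sharing idea: a hypothetical RT coprocessor
    # locks the second 16-bit GDDR6 channel for some fraction of the time,
    # halving the GPU's bandwidth only during those intervals.
    def effective_gpu_bandwidth(full_gbps: float, locked_fraction: float) -> float:
        """Average GPU bandwidth if the second channel is locked for
        `locked_fraction` of the time (0.0 = coprocessor idle)."""
        return full_gbps * (1.0 - locked_fraction) + (full_gbps / 2) * locked_fraction

    print(effective_gpu_bandwidth(768.0, 0.0))   # 768.0 (no ray tracing)
    print(effective_gpu_bandwidth(768.0, 0.25))  # 672.0 (channel locked 25% of the time)
    ```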

    The patent does make a lot more sense in the context of a hypothetical datacentre product for Google Stadia and the like, though.

    Take GA100, add an RTX 'coprocessor' to the interposer, sprinkle a little dual-ported HBM2 into the mix, and you have your halo graphics product without needing to make a separate die or compromise GA100's compute performance by removing functional units to make room for the RTX hardware. Given that GA100 is already right near the reticle limit, it's either add the ray tracing capability as a coprocessor on the package, or design a completely separate ~800+mm2 die for a maximum-effort RTX-capable graphics product.

    If this is the approach Nvidia's taking, it's pretty easy to see how the rumour mill may be right about the coprocessor/chiplet, just wrong about which product segment.
     


    pharma likes this.
  18. trinibwoy

    trinibwoy Meh Legend

    I don’t see anything in the patent that wouldn’t also apply to the existing on-chip co-processors (tessellators, TMUs). Certainly don’t see any evidence in the patent that nvidia is considering a separate chip for BVH intersection.

    As usual the correct answer is probably the simplest one. The rumors are fantasy and RT hardware will remain on-chip.

    Once we do get chiplets, the work to render a frame will be allocated at pixel-tile granularity across homogeneous chiplets. Bet on it.
     
    Last edited: Jun 17, 2020
  19. pharma

    pharma Veteran

    What would be required to access the dual channel per package feature? Revised or different memory controller?
     
  20. CarstenS

    CarstenS Legend Subscriber

    I completely agree. It might have made sense in the early 90s, when everything that could not escape up a tree fast enough was co-processorized.

    Nothing I've read in the patent so far suggests that by co-processor they mean anything other than a semi-independent part of the IP deeply integrated within the same physical chip as the … wait for it … streaming multiprocessors. For that reason, I could imagine that Gaming-Ampere retains the larger SM/L1 pool that GA100 has, maybe even hardcoded (BIOS-locked) in a triple split to reserve the additional memory over Turing for raytracing.
     
    Ext3h likes this.