Nvidia Post-Volta (Ampere?) Rumor and Speculation Thread

Status
Not open for further replies.
That quote mentions BVH creation, not BVH intersection. Nvidia does BVH creation on the CPU and intersection on dedicated GPU hardware.

It might be a while before we know what the Xbox is actually doing.
Assuming it's using the method described in AMD's RT patent, intersection is done in hardware (a new unit in the TMU complex) while shaders determine where to go next.
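To make that division of labor concrete, here's a minimal hypothetical sketch of the hybrid model: a fixed-function unit answers "does this ray hit this box?", while a programmable "shader" loop decides which BVH node to visit next. The node layout and function names are illustrative, not taken from the patent.

```python
# Hypothetical sketch of hybrid RT traversal: fixed-function box tests,
# shader-driven traversal. Names and data layout are made up for illustration.

def hw_box_test(origin, direction, box):
    """Stand-in for the fixed-function intersection unit (slab method)."""
    lo, hi = box
    t_near, t_far = 0.0, float("inf")
    for axis in range(3):
        if direction[axis] == 0.0:
            # Ray parallel to these slabs: hit only if origin lies between them.
            if not (lo[axis] <= origin[axis] <= hi[axis]):
                return False
            continue
        t0 = (lo[axis] - origin[axis]) / direction[axis]
        t1 = (hi[axis] - origin[axis]) / direction[axis]
        t_near = max(t_near, min(t0, t1))
        t_far = min(t_far, max(t0, t1))
    return t_near <= t_far

def shader_traverse(root, origin, direction):
    """The programmable side: walks the BVH, invoking the HW test per node."""
    stack, hits = [root], []
    while stack:
        node = stack.pop()
        if not hw_box_test(origin, direction, node["box"]):
            continue
        if "leaf" in node:
            hits.append(node["leaf"])       # leaf: record the primitive
        else:
            stack.extend(node["children"])  # interior: push children to visit
    return hits
```

The point of the split is that `shader_traverse` (the part the patent leaves to shaders) is free to reorder or cull nodes however it likes, while the per-node box test stays in fixed-function hardware.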
 
I think that's a pretty good assumption, considering the RT block names in the GitHub leak for Arden are exactly the same as the block names in the patent.
 
Next-Generation Nvidia GPUs to Use TSMC’s CoWoS Packaging
March 11, 2020
Nvidia’s next-generation GPUs will most likely tap into TSMC’s CoWoS packaging in 2020. A report from DigiTimes claims that Nvidia will be one of three major clients to take advantage of TSMC's CoWoS technology.

Joining Nvidia in implementing CoWoS (Chip-on-Wafer-on-Substrate) packaging are Xilinx and HiSilicon. CoWoS is a 2.5D packaging technology that places multiple chiplets on a single interposer, bringing several advantages: lower power consumption, a smaller footprint, and increased bandwidth.
Notably absent is Nvidia’s main rival, Advanced Micro Devices (AMD), despite its 7nm Vega 20 silicon already using CoWoS. This puts Nvidia at a competitive advantage, as TSMC will produce 6,000 to 8,000 wafers per month.

The report further states that Nvidia, Xilinx, and HiSilicon will take up most of the CoWoS production capacity. It’s unlikely that CoWoS technology is going to feature in Nvidia’s consumer-grade graphics cards due to its high cost; we may instead see it on other products emphasizing data systems and high-performance computing (HPC).
https://segmentnext.com/2020/03/11/next-generation-nvidia-gpus-to-use-tsmcs-cowos-packaging/
 
March 4, 2020
The new CoWoS platform will initially be used for a new processor from Broadcom for the HPC market, and will be made using TSMC's EUV-based 5 nm (N5) process technology. This system-in-package product features ‘multiple’ SoC dies as well as six HBM2 stacks with a total capacity of 96 GB. According to Broadcom's press release, the chip will have a total bandwidth of up to 2.7 TB/s, which is in line with what Samsung’s latest HBM2E chips can offer.
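As a quick sanity check on those figures (my arithmetic, not from the article): 2.7 TB/s spread across six stacks works out to 450 GB/s per stack, or roughly 3.5 Gb/s per pin on HBM's 1024-bit per-stack interface — which is indeed in the range quoted for Samsung's HBM2E.

```python
# My arithmetic, not from the article: what 2.7 TB/s over six HBM2E stacks
# implies per stack and per pin (each HBM stack has a 1024-bit interface).
total_gb_s = 2700  # 2.7 TB/s from the press release
stacks = 6

per_stack_gb_s = total_gb_s / stacks       # 450.0 GB/s per stack
pin_gbps = per_stack_gb_s * 8 / 1024       # ~3.5 Gb/s per pin
print(per_stack_gb_s, round(pin_gbps, 2))  # 450.0 3.52
```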

By doubling the size of SiPs using its mask stitching technology, TSMC and its partners can throw a significantly higher number of transistors at compute-intensive workloads. This is particularly important for HPC and AI applications, which are developing very fast these days. It is noteworthy that TSMC will continue refining its CoWoS technology, so expect SiPs larger than 1,700 mm2 going forward.
https://www.anandtech.com/show/1558...-mm2-cowos-interposer-2x-larger-than-reticles
 
Is overlapping masks supposed to be impressive? Aren't the bumps and pitches on the tens-of-micrometers scale? (Assuming they are using microbumps and not just solder balls, in which case the pitch is even larger.) That's nowhere near an nm-range process or alignment requirement, so making the interposer on EUV would be a bit of a waste.
 
The SoIC stuff sounds really impressive. Does it have big advantages vs CoWoS for high power HPC chips? For example, does the removal of microbumps improve thermal performance enough that stacking DRAM directly on top of a GPU chiplet becomes possible?
 
They are used for different things. SoIC allows direct die-to-die bonding (among other things), whereas CoWoS mainly places dies on an interposer, which is then bonded to a substrate. SoIC is not yet used in mass production for any chip AFAIK; it is TSMC's answer to Intel's Foveros, i.e. a true 3D packaging technology. CoWoS has been used for three or four years now to integrate HBM alongside the chip. The difference now is that the maximum interposer size is much bigger, which in Nvidia's case means they can integrate really big chips, up to 1,700 mm2, together with HBM.
CoWoS can also be combined with SoIC or with InFO (which is already in use in mobile chips).
 
https://www.notebookcheck.net/NVIDI...-GA102-40-up-on-the-RTX-2080-Ti.456402.0.html

https://www.notebookcheck.net/NVIDI...cores-and-12-GB-of-18-Gbps-VRAM.458939.0.html

https://www.tweaktown.com/news/7118...rtx-3080-ti-is-40-faster-than-2080/index.html

NVIDIA Ampere GPUs
GA102 - 84 SMs / 5376 CUDA cores / 12GB GDDR6 / 384-bit bus - 40% faster than RTX 2080 Ti
GA103 - 60 SMs / 3840 CUDA cores / 10GB GDDR6 / 320-bit bus - 10% faster than RTX 2080 Ti
GA104 - 48 SMs / 3072 CUDA cores / 8GB GDDR6 / 256-bit bus - 5% slower than RTX 2080 Ti

NVIDIA Ampere GA100 Specs
8192 CUDA cores @ 2GHz (2.2GHz boost)
1024 Tensor Cores
130 RT Cores
48GB of HBM2e memory @ 1.2GHz
300W TDP
TSMC 7nm+
36 TFLOPs peak output
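For what it's worth, the rumored TFLOPs figure is at least internally consistent: peak FP32 throughput is CUDA cores × 2 FLOPs per clock (fused multiply-add) × clock speed, and that lands on 36 at the boost clock.

```python
# Sanity check on the rumored GA100 figure: peak FP32 = CUDA cores x 2 FLOPs
# per clock (FMA) x clock speed. It only reaches 36 TFLOPs at the boost
# clock, not the 2 GHz base clock.
cuda_cores = 8192
base_hz, boost_hz = 2.0e9, 2.2e9

base_tflops = cuda_cores * 2 * base_hz / 1e12    # ~32.8 TFLOPs at base
boost_tflops = cuda_cores * 2 * boost_hz / 1e12  # ~36.0 TFLOPs at boost
print(round(base_tflops, 1), round(boost_tflops, 1))  # 32.8 36.0
```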

edit: I really don't know whether to believe this leak. The RTX 3060 with only 6GB of memory and a 192-bit bus? It would be a step back from the 2060 Super.

edit2: thanks for the move, I don't know how I missed this post.
 

Looks fake; the RAM just isn't enough versus the consoles. 10GB for mid-range wouldn't match high-end console VRAM requirements, so there's no way they could charge $500 for it. I'd also expect a 320-bit bus to be a cut-down bin of a full-fat 384-bit GPU, not its own separate tapeout.

You could just double the RAM, awkwardly, as GDDR6 is mostly sold in 8Gb chips. But the worst part is that the bandwidth doesn't add up with compute performance: GA103 to GA102 gets a 20% larger bus with 40% more compute area and a ~27% performance uplift... wait, what? Not to mention GA104 to GA103 adds up even less.
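Running the numbers from the rumored spec table above (my arithmetic), the mismatch looks like this:

```python
# Ratios between the rumored GA102 and GA103 specs from the leak above.
cores = {"GA102": 5376, "GA103": 3840}
bus_bits = {"GA102": 384, "GA103": 320}
perf_vs_2080ti = {"GA102": 1.40, "GA103": 1.10}  # claimed vs RTX 2080 Ti

compute_ratio = cores["GA102"] / cores["GA103"]      # 1.4x the CUDA cores
bus_ratio = bus_bits["GA102"] / bus_bits["GA103"]    # but only 1.2x the bus
perf_ratio = perf_vs_2080ti["GA102"] / perf_vs_2080ti["GA103"]  # ~1.27x perf
print(round(compute_ratio, 2), round(bus_ratio, 2), round(perf_ratio, 2))
# 1.4 1.2 1.27
```

So 40% more compute would have to live on only 20% more memory bandwidth, which is exactly the scaling that doesn't add up.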
 
Definitely fake.
https://www.tomshardware.com/news/nvidia-rtx-3080-ampere-all-we-know (and even that may be wrong)
 