NVidia Ada Speculation, Rumours and Discussion

Status: Not open for further replies.
Kopite hinted that Ada has a different SM architecture from Ampere. If the perf/SM is increased, that might explain these crazy power figures.

Ampere has an overabundance of flops for gaming, probably because the memory subsystem can't keep up. It wouldn't make sense to add even more flops to Ada. If SM performance increases significantly, it should be almost all for RT.
 
In that case we would see bigger differences, for example between the 3080 and the 3080 12 GB.
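The bandwidth gap between those two cards is easy to quantify. A minimal sketch, assuming the published specs (320-bit vs 384-bit bus, both running 19 Gbps GDDR6X):

```python
def gddr_bandwidth_gbps(bus_width_bits, data_rate_gbps):
    """Peak memory bandwidth in GB/s: bus width in bytes times per-pin data rate."""
    return bus_width_bits / 8 * data_rate_gbps

# Spec-sheet figures (assumed): both cards use 19 Gbps GDDR6X.
bw_3080_10gb = gddr_bandwidth_gbps(320, 19)  # 760 GB/s
bw_3080_12gb = gddr_bandwidth_gbps(384, 19)  # 912 GB/s
uplift = bw_3080_12gb / bw_3080_10gb - 1     # 20% more bandwidth
```

The 12 GB model also enables a couple more SMs, so the two cards don't isolate bandwidth perfectly, but the 20% memory delta is the dominant difference.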

https://chipsandcheese.com/2021/05/13/gpu-memory-latencys-impact-and-updated-test/
For brevity we’ll be focusing on the top stall reason – “Long Scoreboard”. “Long Scoreboard” means the warp is waiting for data from cache or VRAM (latency bound). By itself, long scoreboard could also mean we’re approaching bandwidth limits because running out of bandwidth means a sharp increase in latency. However, low cache and memory bandwidth utilization on both GPUs points to latency being more of a factor.

Ampere's small cache and VRAM latency are the main reasons performance doesn't scale well despite the increased CUDA core count. It was inevitable that they would increase the cache size in Ada.
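The latency test in the linked article works by pointer chasing: each load's address depends on the result of the previous load, so prefetching can't hide anything and the time per hop approximates memory latency at a given working-set size. A CPU-side Python sketch of the same idea (the real test runs on the GPU; all names here are illustrative):

```python
import random
import time

def make_chain(n, seed=0):
    """Build a random cyclic permutation: nxt[i] is the next index to load."""
    rng = random.Random(seed)
    order = list(range(1, n))
    rng.shuffle(order)
    nxt = [0] * n
    cur = 0
    for i in order:
        nxt[cur] = i
        cur = i
    nxt[cur] = 0  # close the cycle back to the start
    return nxt

def ns_per_hop(nxt, hops=100_000):
    """Chase the chain; every load depends on the last, defeating prefetchers."""
    i = 0
    t0 = time.perf_counter()
    for _ in range(hops):
        i = nxt[i]
    return (time.perf_counter() - t0) / hops * 1e9

# Sweeping n upward makes the measured latency jump at each cache-capacity boundary.
```

On a GPU the same pattern, run per-thread, is what produces the "Long Scoreboard" stalls the quote describes.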
 
Kicks like a mule to haters :D
Kopite has good info and a valid source, but Lovelace was never a "simple Ampere refresh". The changes that Lovelace carries could not be made halfway into the project. I guess Kopite's issue is that he only has a high-level overview of future products, which is what Nvidia shows to its partners. Something basic like the number of SMs is known a year in advance by people under NDA, but the deep SM architecture and new compute/logic functions are kept very close to Nvidia engineers' chests. For example, the Hopper transformer engine, async tensor thread scheduler and DPX instructions are a walled garden of very few inside Nvidia, impossible to leak before announcement.
 
Kopite has good info and a valid source but Lovelace was never a "simple Ampere refresh". […]

Await the launch of the new RTX hardware; that's when my quoted line will happen.
 
Kopite has good info and a valid source but Lovelace was never a "simple Ampere refresh". […]

So basically we don't know anything, even though a lot of confidential things have been leaked by hackers.
 
Kopite7kimi was spot on about the initial Ampere lineup's high-level configurations (though he missed the doubling of FP32), but I think his sources have gotten rather bad on everything that followed.
For Lovelace he's just guessing from some high-level numbers at the moment, IMO.
With chips supposedly going into tapeout any day now, there's exactly zero chance of them getting any major changes through the majority of 2021. In fact, it is highly likely that the Lovelace lineup was feature locked back in 2020.
 
Leakers have never been all that accurate; some things are a hit, some are a miss. It's not really leaking, it's guesswork. Remember the next-gen console speculation topic days?
 
Saving PAM4 Bus Energy with SMOREs: Sparse Multi-level Opportunistic Restricted Encodings | Research (nvidia.com)

Abstract — Pulse Amplitude Modulation (PAM) uses multiple voltage levels as different data symbols, transferring multiple bits of data simultaneously, thereby enabling higher communication bandwidth without increased operating frequencies. However, dividing the voltage into more symbols leads to a smaller voltage difference between adjacent symbols, making the interface more vulnerable to crosstalk and power noise. GDDR6X adopts four level symbols (PAM4) with Maximum Transition Avoidance (MTA) coding, which reduces the effects of crosstalk. However, current coding approaches can consume excess energy and produce excess power noise. This paper introduces novel energy reduction techniques for PAM interfaces, specifically demonstrating them for GDDR6X PAM4. Inspired by prior work on conventional single-ended I/O interfaces, we leverage the unused idle periods in DRAM channels between data transmissions to apply longer but more energy-efficient codes. To maximize the energy savings, we build multiple sparse encoding schemes to fit different sized gaps in the DRAM traffic. These sparse encodings can provide energy reductions of up to 52% when transferring 4-bit data using a 3-symbol sequence. We evaluate these coding techniques using an NVIDIA RTX 3090 baseline, a recent GPU which uses GDDR6X with PAM4 signaling. Our evaluation shows the opportunity for large energy savings at the DRAM I/O interface (28.2% on average) over many HPC/DL applications with minimal performance degradation.
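To make the abstract concrete: PAM4 packs two bits per symbol into one of four voltage levels, so a burst needs half as many symbols as NRZ, and coding schemes like MTA then restrict which level-to-level transitions may occur. A toy sketch; the Gray-style mapping and the "full-swing transition" metric below are illustrative assumptions, not the actual GDDR6X code:

```python
# Gray-style 2-bit -> 4-level mapping (an assumed, common choice;
# GDDR6X's real MTA encoding is considerably more involved).
BITS_TO_LEVEL = {(0, 0): 0, (0, 1): 1, (1, 1): 2, (1, 0): 3}
LEVEL_TO_BITS = {v: k for k, v in BITS_TO_LEVEL.items()}

def pam4_encode(bits):
    """Pack an even-length bit list into PAM4 symbols, two bits per symbol."""
    assert len(bits) % 2 == 0
    return [BITS_TO_LEVEL[(bits[i], bits[i + 1])] for i in range(0, len(bits), 2)]

def pam4_decode(levels):
    """Recover the original bit stream from PAM4 symbols."""
    out = []
    for lv in levels:
        out.extend(LEVEL_TO_BITS[lv])
    return out

def full_swing_transitions(levels):
    """Count 0<->3 transitions: the worst-case swings that MTA-style coding avoids."""
    return sum(1 for a, b in zip(levels, levels[1:]) if abs(a - b) == 3)
```

The paper's idea is then to spend idle bus time on longer codewords whose symbol sequences avoid the energy-expensive transitions entirely.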
 
I assume kopite is implying that AD102 has more than 18K FP32 CUDA “cores”. Maybe Nvidia doesn’t mind being memory latency bound in games as long as compute-bound workloads benefit from the excessive flops. GA102 is a solid 30% faster than Navi 21 in those.

What are the latest Navi 31 rumors saying? 5120 cores per GCD?
 
I assume kopite is implying that AD102 is more than 18K FP32 CUDA “cores”. Maybe Nvidia doesn’t mind being memory latency bound in games as long as compute bound workloads benefit from the excessive flops. […]
Are there any compute-bound games? What is the most common bottleneck for games, if that can be answered?
 
As with any generation so far, not many games use current-generation hardware capabilities.
I'm curious what the current bottlenecks are. Bandwidth is commonly mentioned, but Nvidia GPUs have consistently scaled better with core overclocks than with memory overclocks since at least Kepler. The 3090 has 55% more compute and 83% more bandwidth, while the 6900 has 30% more texel rate and 50% more pixel fill. Infinity Cache eats into the bandwidth advantage, but not completely.
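One way to frame the compute-vs-bandwidth question is a roofline-style machine balance: peak FLOPs divided by peak bandwidth gives the arithmetic intensity (flops per byte) at which a kernel stops being bandwidth bound. A sketch using published 3090 peak numbers (~35.6 TFLOPS FP32, 936 GB/s, assumed from spec sheets):

```python
def machine_balance(peak_flops, peak_bytes_per_s):
    """Flops per byte a kernel needs before compute, not bandwidth, is the limit."""
    return peak_flops / peak_bytes_per_s

# Spec-sheet peaks (assumed): ~35.6 TFLOPS FP32 and 936 GB/s for the 3090.
balance_3090 = machine_balance(35.6e12, 936e9)  # ~38 flops/byte

def bound_by(kernel_intensity, balance):
    """Classify a kernel: below the machine balance it is bandwidth bound."""
    return "compute" if kernel_intensity >= balance else "bandwidth"
```

Most shader work sits well below ~38 flops of reuse per byte fetched, which is consistent with games rarely being purely compute bound on GA102. This simple model ignores latency, which is exactly the factor the chipsandcheese article argues matters most.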
 
I assume kopite is implying that AD102 is more than 18K FP32 CUDA “cores”. […]

What are the latest Navi 31 rumors saying? 5120 cores per GCD?

That seems to be what the rumors say, with Navi 31 having 3 GCDs for up to 15,360 cores.

AMD's next-gen GPUs may deliver 130% performance jump | Digital Trends

That article appears to be using Moore's Law Is Dead as its source.

Regards,
SB
 