NVidia Ada Speculation, Rumours and Discussion

Status
Not open for further replies.
I'm curious about the bandwidth situation... I don't doubt they can increase raw power, but memory bandwidth for gamers?

An "nVinfinity" cache? A wider bus (too costly, imo)? Or "just" new innovations in compression, culling, tiling, data paths...?
 
I have no idea about Lovelace versus Hopper. So I'm wondering if Hopper is solely a data centre GPU and so Lovelace is consumer/prosumer?
RedGamingTech said he heard that Lovelace isn't consumer; overall, the near future could just be "Ampere Next" next year and then "Ampere Next Next".
I think an AD102/GA202 could have a 384-bit bus with a 192 MB cache. Not sure if the memory would be GDDR6X (hopefully not, given how hot and power-hungry the SKUs using it are) or GDDR7 (doable if Computex is the earliest launch), but at 24 Gbps it would be able to feed the 2-2.3x rumors we have about the top RTX 40 SKUs. Overall I think the RTX 40 lineup could look like this:


AD102/GA202 SKUs

RTX Titan II:

o 144 SMs.
o 384 Bit Bus.
o 48 GB of GDDR*
o 2000-2500 USD

RTX 4090:

o 140 SMs.
o 384 Bit Bus.
o 24 GB of GDDR
o 1200 USD

RTX 4080:

o 116 SMs.
o 384 Bit Bus.
o 20 GB of GDDR (slower than 4090/Titan 2)**
o 800 USD


AD104/GA204 SKUs

RTX 4070:

o 82 SMs.
o 256 Bit Bus.
o 16 GB of GDDR
o 600 USD

RTX 4060:

o 64 SMs.
o 224 Bit Bus.
o 14 GB of GDDR (slower than 4070)**
o 450 USD

AD106/GA206 SKUs

RTX 4050:

o 46 SMs.
o 192 Bit Bus.
o 12 GB of GDDR
o 350 USD

RTX 4040:

o 34 SMs.
o 160 Bit Bus.
o 10 GB of GDDR (slower than 4050)**
o 250 USD

*RTX Titan II would be using double-sided memory à la 3090, but I think Nvidia will want to bring back the Titan as the prosumer card so they can jack up the cost to whatever they want.

**4080/4060/4040 would have slower memory, as it makes it easier for Nvidia to segment the lineup in terms of performance and space out the SKUs. No Tis, as Nvidia originally wanted for RTX 30, but AMD forced their hand.
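
For reference, the raw numbers behind this kind of guess can be sketched quickly. This is only back-of-the-envelope arithmetic, not anything from Nvidia: the function names are made up for illustration, the 19.5 Gbps figure is the RTX 3090's GDDR6X speed, and the capacity helper assumes 2 GB (16 Gb) chips with one chip per 32-bit channel, doubled for a double-sided (clamshell) layout.

```python
def peak_bandwidth_gbs(bus_width_bits: int, data_rate_gbps: float) -> float:
    """Peak DRAM bandwidth in GB/s: bus width times per-pin data rate, 8 bits per byte."""
    return bus_width_bits * data_rate_gbps / 8


def capacity_gb(bus_width_bits: int, chip_gb: int = 2, clamshell: bool = False) -> int:
    """Capacity in GB, ASSUMING one chip per 32-bit channel (two per channel if clamshell)."""
    chips = bus_width_bits // 32
    return chips * chip_gb * (2 if clamshell else 1)


# RTX 3090 baseline: 384-bit GDDR6X at 19.5 Gbps.
baseline = peak_bandwidth_gbs(384, 19.5)
# Speculated top SKU from the post: 384-bit at 24 Gbps.
speculated = peak_bandwidth_gbs(384, 24.0)

print(f"3090 baseline: {baseline:.0f} GB/s")
print(f"384-bit @ 24 Gbps: {speculated:.0f} GB/s ({speculated / baseline:.2f}x)")

# Capacities for the bus widths listed in the post, assuming 2 GB chips.
for bus in (384, 256, 224, 192, 160):
    print(f"{bus}-bit: {capacity_gb(bus)} GB single-sided, "
          f"{capacity_gb(bus, clamshell=True)} GB double-sided")
```

By this math, a 384-bit bus at 24 Gbps gives 1152 GB/s, about 1.23x the 3090's 936 GB/s, so a large cache would have to cover the rest of a 2-2.3x jump. Under the same 2 GB chip assumption, 20 GB would line up with a 320-bit bus rather than a 384-bit one.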
 
Assuming TSMC 5nm I can see this being twice as fast as high-end Ampere. Clock boost and better perf / watt would go a long way even without the architectural improvements.
 
Assume SEC 4LPP, which is very close to TSMC 5P (much more competitive than the previous round of SEC 8nm vs TSMC 7nm).
 
Are Nvidia seriously going to name a chip after a porn star ?
Edit:
Since someone will ask
[Attached image: PxCsYgk.jpg]
 
The three options Nvidia has for bandwidth are: HBM (expensive); going back to GDDR6, which is now as fast as GDDR6X, and implementing a 512-bit bus; or going with a big LLC like AMD has done.

None of them are ideal, of course. A 512-bit bus has its own costs, as does a big LLC, as we can see from RDNA2 having a deficit in deferred titles at high resolutions (I wonder what effect a visibility buffer has on that). Regardless, performance increases are also heavily constrained by power. Even jumping from Samsung 8nm to TSMC 5nm, there aren't enough power savings to double performance, so unless there's a major architecture shift at the same time, that notion is out.

As a guess, the lower end of expectations is a switch to a big LLC and a 25% performance improvement (averaged), with a switch to GDDR6 to save money. But better is always possible.
 
Obviously this isn't coming to Lovelace, but with PCIe 6.0 bringing 128GB/s duplex (256GB/s raw) and DDR5 hitting ~100GB/s in a 2-module configuration, I wonder if we'll start seeing DRAM-less midrange GPUs that only bring large amounts of stacked LLC.
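
To put rough numbers on that, here is a quick sketch of the peak figures. It is only illustrative arithmetic: the helper names are made up, DDR5-6400 is an assumed module speed (the post only says ~100GB/s), each module is treated as a 64-bit channel, and PCIe 6.0 is taken at 64 GT/s per lane on an x16 link with FLIT/FEC overhead ignored.

```python
def ddr_bandwidth_gbs(mega_transfers_per_s: int, modules: int = 2, bits_per_module: int = 64) -> float:
    """Peak DDR bandwidth in GB/s for a given transfer rate and number of 64-bit modules."""
    return mega_transfers_per_s * modules * bits_per_module / 8 / 1000


def pcie_bandwidth_gbs(giga_transfers_per_s: float, lanes: int = 16) -> float:
    """Raw PCIe bandwidth in GB/s per direction, ignoring encoding and protocol overhead."""
    return giga_transfers_per_s * lanes / 8


ddr5 = ddr_bandwidth_gbs(6400)       # ASSUMED DDR5-6400, two modules
pcie6_one_way = pcie_bandwidth_gbs(64)  # PCIe 6.0 x16, one direction

print(f"DDR5-6400, 2 modules: {ddr5:.1f} GB/s")
print(f"PCIe 6.0 x16: {pcie6_one_way:.0f} GB/s per direction, "
      f"{2 * pcie6_one_way:.0f} GB/s both directions combined")
```

Under those assumptions system DRAM tops out at about 102 GB/s, so it would be the limit before the 128 GB/s (per direction) PCIe 6.0 link is, and the CPU would be competing for that same bandwidth.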
 
Wouldn't that make GPU performance too dependent on the rest of the system? Sounds like a PR nightmare.
 
Is it that much different from being able to pair an RTX 3090 with a 2-core/2-thread Celeron, using the motherboard's chipset-driven PCIe 3.0 x4 slot, which is usually full-sized?
Or pairing a 5700G APU with just one module of DDR4-1600?
 