Nvidia Post-Volta (Ampere?) Rumor and Speculation Thread

trinibwoy · Feb 25, 2020

Benetanegia said:
For the same reason they included the INT unit in the first place, I suppose (it's more efficient?). This move (if at all real and not fake) would simply fix the FP:INT ratio. Because, right now in Turing there's nothing to switch to, nothing to schedule to the INT pipe in 64% of cases, since there's supposedly only 36 INT per 100 FP instructions.

Per Nvidia docs one of the benefits of the INT pipe is to initiate data loads for the next iteration of a loop while the FP pipe is processing the current iteration. Initiating data loads as early as possible likely leads to more efficient use of bandwidth across the memory hierarchy especially for tight loops with a high INT:FP ratio.

As you pointed out that's less helpful in other cases where the FP:INT ratio is high. Half the available register bandwidth goes to waste as there are no execution units available to issue to every other clock.

The pipeline latency on Volta/Turing also dropped to 4 cycles from 6 on Pascal. Not sure if that was also due to splitting out the INT pipe.
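To put rough numbers on the FP:INT point above, here is a toy issue model using the 36 INT per 100 FP mix quoted earlier. It assumes one instruction per clock per pipe and no stalls or dependencies, which is a simplification for illustration and not how Turing's scheduler actually behaves; the function names are made up for this sketch.

```python
# Toy issue model: one instruction per clock per pipe, no stalls (assumption).
# The 36 INT per 100 FP mix is the figure quoted in the thread.
FP_INSTRS = 100
INT_INSTRS = 36


def combined_pipe_cycles(fp, integer):
    """Single shared INT+FP pipe: every instruction takes one issue slot."""
    return fp + integer


def split_pipe_cycles(fp, integer):
    """Separate FP and INT pipes running concurrently: the longer stream sets the time."""
    return max(fp, integer)


combined = combined_pipe_cycles(FP_INSTRS, INT_INSTRS)
split = split_pipe_cycles(FP_INSTRS, INT_INSTRS)
speedup = combined / split
int_pipe_idle = 1 - INT_INSTRS / split  # fraction of cycles the INT pipe has nothing to do

print(f"combined pipe: {combined} cycles")
print(f"split pipes:   {split} cycles")
print(f"speedup:       {speedup:.2f}x")
print(f"INT pipe idle: {int_pipe_idle:.0%}")
```

Under these assumptions the split design finishes in 100 cycles instead of 136, and the INT pipe sits idle for the 64% of cycles mentioned in the quote.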

CarstenS · Feb 25, 2020

I simply cannot believe that Nvidia would go back to a Kepler-style design while at the same time keeping the register file at 64 KiB per processing block (quarter-SM). That would only increase contention, forcing more data movement in and out of registers, increasing power and blocking execution of more complex workloads. I call this fake.

DegustatoR · Feb 25, 2020

trinibwoy said:
Ah, now I get it. Yeah that would be interesting and a relatively cheap way to increase FP throughput. It begs the question though of why go through all that trouble instead of just using Pascal style 32-wide combined INT+FP units.

What if it's actually 16 INT + 16 INT/FP32 + 16 FP32?

w0lfram · Feb 25, 2020

DavidGraham said:
Take that Navi die, put it at 16nm and come back and ask the same question.

The 2070 also has a whole lot more functional units (Tensor cores, Ray Tracing cores, INT32 units, etc.), which means it does more at the same transistor budget as Navi, and on an older process.

TU-106 gets beaten by Navi-10 in games, by 20% in some instances. Navi-10 in many instances beats the 2070 SUPER, too... (cut-down TU-104)

Turing can't compete with rdna, how will it compete with rdna2..?

Rootax · Feb 25, 2020

w0lfram said:
TU-106 gets beaten by Navi-10 in games, by 20% in some instances. Navi-10 in many instances beats the 2070 SUPER, too... (cut-down TU-104)

Turing can't compete with rdna, how will it compete with rdna2..?

Please stop.

The 2070 SUPER is better than the 5700 XT most of the time: https://www.techspot.com/review/1902-geforce-rtx-2070-super-vs-radeon-5700-xt/

DegustatoR · Feb 25, 2020

Navi 10 doesn't beat TU106/2070 either: https://www.techpowerup.com/review/amd-radeon-rx-5700-xt/28.html

What it does, however, is lack all the new features of Turing and draw 30% more power despite having a whole node advantage over TU106.

It is in fact RDNA1 which can't compete with Turing, and this is precisely the reason why AMD is selling these cards at a discount to the corresponding Turing parts.

But whatever. I feel that this is a pointless discussion.

DavidGraham · Feb 28, 2020

Two unknown new NVIDIA GPUs:

7552 CUDA cores (118 CUs) > 1.11GHz core clock > GB5 Compute score: 184096 (OpenCL)
6912 CUDA cores (108 CUs) > 1.01GHz core clock > GB5 Compute score: 141654 (OpenCL)

Results are from Oct 2019, so probably engineering samples, which could explain the low clocks.

For some context:
GV100: 142837 (OpenCL)
Tesla V100: 154606 (OpenCL)
Titan RTX: 132804 (OpenCL)

https://twitter.com/x/status/1233419594104262656
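As a quick sanity check on the two listings, the sketch below divides cores by CUs and normalizes the score by cores × clock. The numbers are the ones posted above; the dictionary layout and helper names are just for illustration, and score per core-GHz is only a crude proxy that ignores memory bandwidth and everything else Geekbench exercises.

```python
# Figures copied from the two Geekbench 5 OpenCL listings in the post above.
parts = {
    "118 CU part": {"cores": 7552, "cus": 118, "clock_ghz": 1.11, "score": 184096},
    "108 CU part": {"cores": 6912, "cus": 108, "clock_ghz": 1.01, "score": 141654},
}


def cores_per_cu(part):
    """CUDA cores per compute unit (SM) implied by the listing."""
    return part["cores"] // part["cus"]


def score_per_core_ghz(part):
    """Score divided by (cores x clock in GHz); a crude throughput proxy."""
    return part["score"] / (part["cores"] * part["clock_ghz"])


for name, part in parts.items():
    print(f"{name}: {cores_per_cu(part)} cores/CU, "
          f"{score_per_core_ghz(part):.2f} points per core-GHz")
```

Both listings imply 64 CUDA cores per CU, the same per-SM count as Volta and Turing.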

Kaotik · Feb 28, 2020

New datacenter part, as no consumer product would have that amount of memory.

DavidGraham · Feb 28, 2020

Yes, I thought that was obvious from the focus on compute loads. But anyhow, the datacenter GPUs from NVIDIA are usually closely related to the consumer ones.

CarstenS · Feb 28, 2020

Kaotik said:
New datacenter part, as no consumer product would have that amount of memory.

Like Big Navi's 24 GByte?

Kaotik · Feb 28, 2020

CarstenS said:
Like Big Navi's 24 GByte?

You mean the image which spread like wildfire despite having impossible specs and caused SK Hynix to actually make a press release about it being fake?

DavidGraham · Feb 28, 2020

Titan RTX scores 128509
118CU part scores 184096

https://browser.geekbench.com/opencl-benchmarks

The 118CU part is 43% faster than Titan RTX while running at roughly 60% of the clocks (1100MHz vs. 1800MHz). Assuming perfect clock scaling to 1800MHz, that means the 118CU part could be 2.34X as fast as Titan RTX in this workload!

If we take into account non-perfect clock scaling, we can safely expect at least 70% faster than Titan RTX in gaming workloads, or maybe more?
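The arithmetic behind these figures can be sketched as follows. The scores and clocks are the ones quoted in this post; `scaling_efficiency` is a hypothetical knob added here to show how the projection moves when performance does not scale perfectly with clock, not a measured quantity.

```python
# Scores and clocks as quoted in the post above.
TITAN_RTX_SCORE = 128509
PART_SCORE = 184096
PART_CLOCK_MHZ = 1100
TARGET_CLOCK_MHZ = 1800


def projected_ratio(scaling_efficiency):
    """Projected speed relative to Titan RTX at the target clock.

    scaling_efficiency (hypothetical): 1.0 means performance scales
    perfectly with clock, 0.0 means the extra clock buys nothing.
    """
    measured = PART_SCORE / TITAN_RTX_SCORE
    clock_gain = TARGET_CLOCK_MHZ / PART_CLOCK_MHZ - 1
    return measured * (1 + scaling_efficiency * clock_gain)


for eff in (0.0, 0.3, 0.5, 1.0):
    print(f"scaling efficiency {eff:.0%}: {projected_ratio(eff):.2f}x Titan RTX")
```

With perfect scaling this reproduces the 2.34X figure; the "at least 70% faster" estimate corresponds to only about 30% of the extra clock translating into performance.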

Digidi · Feb 28, 2020

This is a big chip only for HPC. My guess is it's not getting RT cores. Also, a gaming chip will not be 850mm^2. I also think this chip is not going to reach 1800 MHz at this size. It will stay at 1.4-1.6 GHz maximum.

DavidGraham · Feb 28, 2020

Digidi said:
This is a big chip only for HPC. My guess is it's not getting RT cores. Also, a gaming chip will not be 850mm^2. I also think this chip is not going to reach 1800 MHz at this size. It will stay at 1.4-1.6 GHz maximum.

Gaming chips shed FP64 units, which reduces die size. They can also get leaner by reducing the Tensor core count or internal caches/registers. They could also swap HBM for GDDR6, repeating the situation of GP100 (600mm^2) vs GP102 (471mm^2).

Bondrewd · Feb 29, 2020

DavidGraham said:
Titan RTX scores 128509
118CU part scores 184096

https://browser.geekbench.com/opencl-benchmarks

The 118CU part is 43% faster than Titan RTX while working at almost half the clocks (1100MHz), assuming perfect clock scaling @1800MHz, that means the 118CU part can beat Titan RTX by 2.34X in this workload!

If we take into account a non perfect clocks scaling we can safely expect at least 70% faster than Titan RTX in gaming workloads, or maybe more?

It's a kiss-the-reticle chip, so it better be 70% faster or else.

DavidGraham said:
Gaming chips shed FP64 units, which reduces die size. They can also get leaner by reducing the Tensor core count or internal caches/registers. They could also swap HBM for GDDR6, repeating the situation of GP100 (600mm^2) vs GP102 (471mm^2).

Don't forget NVLink and all that other bulky analog jazz that eats area.

Deleted member 2197 · Feb 29, 2020

These are likely Quadro cards. With 24 GB and 48 GB of memory, these unknown cards match the memory configurations of the Quadro RTX 6000 and Quadro RTX 8000.

Bondrewd · Feb 29, 2020

pharma said:
These are likely Quadro cards. With 24 GB and 48 GB of memory, these unknown cards match the memory configurations of the Quadro RTX 6000 and Quadro RTX 8000.

Or it's 6 HBM2 stacks of 4GB or 8GB each.

CarstenS · Feb 29, 2020

Kaotik said:
You mean the image which spread like wildfire despite having impossible specs and caused SK Hynix to actually make a press release about it being fake?

No, I mean the other rumor that was spread after that.

Apart from that: what amount do you think the next Xbox and PlayStation will have? Most likely, they will not stay at 8 GByte, and for high-end desktop you need something more than "just what consoles have" in order to cater to the target audience, aka the PC Gaming Master Race. Otherwise, they'd feel diminished.

DegustatoR · Feb 29, 2020

DavidGraham said:
assuming perfect clock scaling @1800MHz

That would be a pretty big assumption at this point.

DavidGraham · Feb 29, 2020

DegustatoR said:
That would be a pretty big assumption at this point.

Yeah, it's just for theoretical analysis. Nevertheless, what do you think a more realistic analysis of the situation (clock scaling, TDP, gaming chips, etc.) would be?
