Nvidia Ampere Discussion [2020-05-14]

JasonLD · Aug 15, 2020

Bondrewd said:
Oh no, no.
Not even close.
First consoles in what, decades? with a real CPU.

It doesn't matter. It isn't like the Xbox/PS5 is going to turn into a full-fledged Windows 10/Linux machine. It is still going to be playing games and streaming media content.
There is nothing about next-generation consoles that will disrupt anything beyond the console gaming market.

Benetanegia · Aug 15, 2020

trinibwoy said:
Could be they added back FP support to the INT pipe so you can issue either INT + FP or 2xFP per clock.

That makes the most sense to me. I had already thought about it several months back, when the 2x FP32 rumor started, and I think it is the most efficient way of doing it without either increasing register size/bandwidth or wrecking it (Kepler). Plus, 10k FP32 + 5k INT32 units sounds like a ridiculously high number. 10k units where half of them share FP and INT ops seems more reasonable.

Edit: Either way fun times ahead with TFLOP reporting. lol

DavidGraham · Aug 15, 2020

Benetanegia said:
3090 being dual GPU doesn't require any special explanation. Ergo, Occam's razor.

It's not a dual GPU. NVIDIA dumped those a long time ago, alongside their SLI initiative. I have a strong feeling it's something far more sinister, given NVIDIA's marketing push and the comparison to the industry's first GPU.

This whole launch screams secrets and surprises, first GDDR6X out of the blue, then a sudden imminent launch, now 2X FP32 units, and who knows what's there to uncover?

Benetanegia · Aug 15, 2020

I didn't say that it is definitely dual-GPU based on the points I argued*. And I certainly didn't say or think of SLI. Maybe the surprise is they finally figured out multi-GPU rendering. I mean, they must have at some point, because Hopper is supposed to be a chiplet design, but maybe they can actually do it now, and that is the surprise.

*I don't think you guys understand what Occam's Razor means:

https://en.wikipedia.org/wiki/Occam's_razor

Kaotik · Aug 15, 2020

Benetanegia said:
It's 3080 anyway, and supposedly the PCB ends in a V before the second fan, so I concede I was wrong there.

EDIT: And no, the ones that I was talking about didn't have any shroud, they were only fins with heatpipes and maybe a vapor chamber iirc. They were supposedly pics taken from the fab.

This one, it's the same cooler without shroud, fans or pcb

Malo · Aug 15, 2020

I think it's rather disappointing that the apparent halo 3090 has only 12GB of VRAM.

Benetanegia · Aug 15, 2020

Kaotik said:
This one, it's the same cooler without shroud, fans or pcb
View attachment 4464

That's the one. In my mind the PCB was sandwiched in between, somehow. But like I said, that cooler is for the 3080 anyway, as per your other pic, and goes alongside a V-shaped PCB which barely has space for 1 GPU, let alone 2. So I was wrong on that.

ShaidarHaran · Aug 15, 2020

Malo said:
I think it's rather disappointing that the apparent halo 3090 has only 12GB of VRAM.

You and me both. How many years is it now with 12GB at the top of the product stack? Yeah, I know the Titan RTX has 24GB but I don’t consider a $3000 GPU a consumer product.

Also, I can’t believe the GDDR6X rumor was true. Sounded like complete bs. I hope the double FP throughput rumor turns out to be true. 2080 Ti going up on ebay!

troyan · Aug 15, 2020

Kaotik said:
Well, those 18 Gbps don't even seem to be available, since everyone is using 14 (and one card 16) Gbps, but apparently these 19 and/or 21 Gbps chips are actually available.
But it is indeed curious how they could keep this under wraps, and if JEDEC wasn't involved, how will it react to one memory manufacturer going solo and using their naming convention?

Didn't they do the same with GDDR5X? It was only accessible and used by nVidia.

trinibwoy · Aug 15, 2020

troyan said:
Didn't they do the same with GDDR5X? It was only accessible and used by nVidia.

You're right, I never noticed that. Polaris was GDDR5 and all the high-end stuff after that was HBM.

Bondrewd · Aug 15, 2020

troyan said:
Didn't they do the same with GDDR5X? It was only accessible and used by nVidia.

No, G5X was an actual JEDEC spec.

DegustatoR · Aug 15, 2020

trinibwoy said:
Could be they added back FP support to the INT pipe so you can issue either INT + FP or 2xFP per clock.

Do we have any idea on how FP64, FP32 and INT32 are scheduled on GA100? Can they all run in parallel?

For reference:

troyan · Aug 15, 2020

A100 can schedule two FP16 vec2 operations concurrently on the FP32 and TensorCores.

DegustatoR · Aug 15, 2020

troyan said:
A100 can schedule two FP16 vec2 operations concurrently on the FP32 and TensorCores.

It can? Turing used TCs for all FP16 math AFAIK, there was no FP16 capability in the main FP32 SIMDs. It also can't run FP32+INT32 and TCs concurrently.

Samwell · Aug 15, 2020

DegustatoR said:
It can? Turing used TCs for all FP16 math AFAIK, there was no FP16 capability in the main FP32 SIMDs. It also can't run FP32+INT32 and TCs concurrently.

Yes, because of that A100 has a 4xFP16 rate compared to FP32. Could you maybe use the normal FP32 and somehow FP32 from the tensor cores to double the theoretical throughput?

Qesa · Aug 15, 2020

Samwell said:
Yes, because of that A100 has a 4xFP16 rate compared to FP32. Could you maybe use the normal FP32 and somehow FP32 from the tensor cores to double the throughput?

The tensor core has double the throughput of Volta all on its own; vector fp16 also doubling isn't surprising.

Bondrewd · Aug 15, 2020

Samwell said:
somehow FP32 from the tensor cores to double the theoretical throughput?

Isn't FP32 from A100 TCs a non-IEEE one?

Qesa · Aug 15, 2020

Bondrewd said:
Isn't FP32 from A100 TCs a non-IEEE one?

Yeah, missing 13 bits of mantissa
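
To put a number on the "13 bits": the tensor-core format here is TF32, which keeps FP32's 8-bit exponent but only 10 of its 23 mantissa bits. A rough Python sketch of what that loses is below. The helper name `truncate_to_tf32` is made up for illustration, and plain truncation is an assumption: how the hardware actually rounds is not modeled here.

```python
import struct

FP32_MANTISSA_BITS = 23
TF32_MANTISSA_BITS = 10
DROPPED_BITS = FP32_MANTISSA_BITS - TF32_MANTISSA_BITS  # the 13 missing bits


def truncate_to_tf32(x: float) -> float:
    """Zero the low 13 mantissa bits of x's FP32 encoding.

    Assumption: simple truncation; real hardware rounding may differ.
    """
    # Reinterpret the value as its 32-bit IEEE 754 single-precision pattern.
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    # Keep sign, exponent and the top 10 mantissa bits.
    mask = 0xFFFFFFFF & ~((1 << DROPPED_BITS) - 1)
    return struct.unpack("<f", struct.pack("<I", bits & mask))[0]


if __name__ == "__main__":
    for value in (1.0, 1.0 + 2**-10, 1.0 + 2**-11, 3.14159265):
        print(value, "->", truncate_to_tf32(value))
```

Anything finer than 2^-10 relative to the leading bit is dropped, so 1 + 2^-10 survives while 1 + 2^-11 collapses back to 1.0.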

DegustatoR · Aug 15, 2020

Samwell said:
Yes, because of that A100 has a 4xFP16 rate compared to FP32.

This rate is maintained on all TC precision modes though, which means that it's not coming from the FP32 SIMDs, no?

I see two possibilities for gaming Ampere here:

1. Double-width FP32 SIMDs, which will likely lead to double-width INT32 SIMDs as well. They've done this previously between GP100 and GP10x.

2. A second 16-wide FP32 SIMD in place of the FP64 one of GA100. But for that to work well they'll need to be able to schedule FP32+FP32+INT32 per clock; otherwise it will be either FP32+FP32 or FP32+INT32 per clock, which will result in utilization issues.
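
The utilization concern in option 2 can be sketched with a toy issue model. This is only a back-of-the-envelope illustration, not how any real SM schedules work: it counts one op per pipe per clock and ignores dependencies, register bandwidth and dispatch limits. The function names and the example instruction mix are hypothetical.

```python
import math


def cycles_split_pipes(fp_ops: int, int_ops: int) -> int:
    """One FP32-only pipe plus one INT32-only pipe (FP32+INT32 per clock)."""
    return max(fp_ops, int_ops)


def cycles_shared_pipe(fp_ops: int, int_ops: int) -> int:
    """One FP32-only pipe plus one pipe that takes FP32 or INT32 each clock.

    INT32 ops can only use the shared pipe, so they set one lower bound;
    the two pipes together retire at most two ops per clock, which sets
    the other.
    """
    total = fp_ops + int_ops
    return max(int_ops, math.ceil(total / 2))


def cycles_three_pipes(fp_ops: int, int_ops: int) -> int:
    """Two FP32 pipes plus a separate INT32 pipe (FP32+FP32+INT32 per clock)."""
    return max(math.ceil(fp_ops / 2), int_ops)


if __name__ == "__main__":
    fp_ops, int_ops = 100, 36  # hypothetical instruction mix
    print("FP32 + INT32:          ", cycles_split_pipes(fp_ops, int_ops))
    print("FP32 + (FP32 or INT32):", cycles_shared_pipe(fp_ops, int_ops))
    print("FP32 + FP32 + INT32:   ", cycles_three_pipes(fp_ops, int_ops))
```

With that made-up mix of 100 FP32 and 36 INT32 ops, the model gives 100, 68 and 50 clocks respectively, so the shared-pipe design only reaches the full 2x FP32 rate when there is no integer work competing for the second pipe.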

troyan · Aug 15, 2020

nVidia responded on their dev blog but the comments are not visible anymore. CarstenS has copied the response: https://forum.beyond3d.com/posts/2128606/
