Nvidia Ampere Discussion [2020-05-14]

Oh no, no.
Not even close.
First consoles in what, decades? with a real CPU.

It doesn't matter. It isn't like the Xbox/PS5 is going to turn into a full-fledged Windows 10/Linux machine. It is still going to be playing games and streaming media content.
There is nothing about the next-generation consoles that will disrupt anything beyond the console gaming market.
 
Could be they added back FP support to the INT pipe so you can issue either INT + FP or 2xFP per clock.

That makes the most sense to me. I had already thought about it several months back, when the 2x FP32 rumor started, and I think it is the most efficient way of doing it without increasing register size/bandwidth or wrecking it (as Kepler did). Plus, 10k FP32 + 5k INT32 units sounds like a ridiculously high number. 10k where half of them share FP and INT ops seems more reasonable.

Edit: Either way fun times ahead with TFLOP reporting. lol
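
To put rough numbers on the TFLOP reporting point, here is a minimal sketch of the usual peak-throughput arithmetic (lanes x 2 FLOPs per FMA x clock). The 10k lane count and the 5k/5k split come from the post above; the 1.7 GHz clock and the function name `peak_tflops` are assumptions made up for illustration, not leaked specs.

```python
def peak_tflops(fp32_lanes: int, clock_ghz: float, flops_per_lane: int = 2) -> float:
    """Theoretical peak FP32 TFLOPS, counting one FMA (2 FLOPs) per lane per clock."""
    return fp32_lanes * flops_per_lane * clock_ghz / 1000.0


# Hypothetical configuration: 10k FP32-capable lanes, half of which also run INT32.
CLOCK_GHZ = 1.7             # assumed clock, illustration only
DEDICATED_FP32 = 5_000      # lanes that only ever run FP32
SHARED_FP32_INT32 = 5_000   # lanes that run either FP32 or INT32 in a given clock

# Headline figure: every FP32-capable lane counted, as a spec sheet would.
headline = peak_tflops(DEDICATED_FP32 + SHARED_FP32_INT32, CLOCK_GHZ)

# Floor: the shared lanes are fully occupied by INT32 work.
fp_floor = peak_tflops(DEDICATED_FP32, CLOCK_GHZ)

print(f"headline: {headline:.1f} TFLOPS, floor with INT-saturated shared lanes: {fp_floor:.1f} TFLOPS")
```

Under these assumed numbers the advertised figure would be twice what the card sustains whenever the shared lanes are busy with integer work, which is why the reporting gets messy.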
 
3090 being dual GPU doesn't require any special explanation. Ergo, Occam's razor.
It's not a dual GPU. NVIDIA dumped those a long time ago, alongside their SLI initiative. I have a strong feeling it's something far more sinister, given NVIDIA's marketing push and the comparison to the industry's first GPU.

This whole launch screams secrets and surprises, first GDDR6X out of the blue, then a sudden imminent launch, now 2X FP32 units, and who knows what's there to uncover?
 
I didn't say that it is dual-GPU, yes or yes, based on the points I argued*. And I certainly didn't say or think of SLI. Maybe the surprise is that they finally figured out multi-GPU rendering. I mean, they must have, because Hopper is supposed to be a chiplet design, but maybe they can actually do it now, and that is the surprise.

*I don't think you guys understand what Occam's Razor means:

https://en.wikipedia.org/wiki/Occam's_razor
 
It's the 3080 anyway, and supposedly the PCB ends in a V before the second fan, so I concede I was wrong there.

EDIT: And no, the ones that I was talking about didn't have any shroud, they were only fins with heatpipes and maybe a vapor chamber iirc. They were supposedly pics taken from the fab.
This one, it's the same cooler without shroud, fans or pcb
upload_2020-8-15_4-7-13.png
 
I think it's rather disappointing that the apparent halo 3090 has only 12GB of VRAM.
 
I think it's rather disappointing that the apparent halo 3090 has only 12GB of VRAM.

You and me both. How many years is it now with 12GB at the top of the product stack? Yeah, I know the Titan RTX has 24GB but I don’t consider a $3000 GPU a consumer product.

Also, I can’t believe the GDDR6X rumor was true. It sounded like complete BS. I hope the double FP throughput rumor turns out to be true. 2080 Ti going up on eBay!
 
Well, those 18 Gbps don't even seem to be available, since everyone is using 14 (and one card 16) Gbps, but apparently these 19 and/or 21 Gbps chips are actually available.
But it is indeed curious how they could keep this under wraps, and if JEDEC wasn't involved, how will it react to one memory manufacturer going solo and using their naming convention?

Didn't they do the same with GDDR5X? It was only accessible to and used by Nvidia.
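
For a sense of scale on the per-pin rates mentioned above, a quick sketch of the standard conversion from data rate to total bandwidth (per-pin Gbps x bus width / 8). The 384-bit bus width is an assumption picked for illustration, not a confirmed spec, and `bandwidth_gb_s` is just a helper name invented here.

```python
def bandwidth_gb_s(data_rate_gbps: float, bus_width_bits: int) -> float:
    """Total memory bandwidth in GB/s from per-pin data rate and bus width."""
    return data_rate_gbps * bus_width_bits / 8


BUS_WIDTH_BITS = 384  # assumed bus width, illustration only

# Per-pin rates discussed in the thread: 14 and 16 (shipping GDDR6),
# 18 (announced GDDR6), 19 and 21 (rumored GDDR6X).
for rate in (14, 16, 18, 19, 21):
    print(f"{rate} Gbps x {BUS_WIDTH_BITS}-bit = {bandwidth_gb_s(rate, BUS_WIDTH_BITS):.0f} GB/s")
```

On that assumed bus, going from 14 to 21 Gbps is the difference between 672 GB/s and 1008 GB/s.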
 
Could be they added back FP support to the INT pipe so you can issue either INT + FP or 2xFP per clock.
Do we have any idea on how FP64, FP32 and INT32 are scheduled on GA100? Can they all run in parallel?

For reference:

nvidia-ga100-sm-streaming-multiprocessor.jpg
 
It can? Turing used TCs for all FP16 math AFAIK; there was no FP16 capability in the main FP32 SIMDs. It also can't run FP32+INT32 and TCs concurrently.

Yes, because of that A100 has a 4xFP16 rate compared to FP32. Could you maybe use the normal FP32 and somehow FP32 from the tensor cores to double the theoretical throughput?
 
Yes, because of that A100 has a 4xFP16 rate compared to FP32. Could you maybe use the normal FP32 and somehow FP32 from the tensor cores to double the throughput?

The tensor core has double the throughput of Volta all on its own; vector fp16 also doubling isn't surprising.
 
Yes, because of that A100 has a 4xFP16 rate compared to FP32.
This rate is maintained on all TC precision modes though which means that it's not coming from FP32 SIMDs, no?

I see two possibilities for gaming Ampere here:

1. Double width FP32 SIMDs which will likely lead to a double width of INT32 SIMD as well. They've done this previously between GP100 and GP10x.

2. A second 16-wide FP32 SIMD in place of the FP64 one of GA100. But for that to work well they'll need to be able to schedule FP32+FP32+INT32 per clock. Otherwise it will be either FP32+FP32 or FP32+INT32 per clock, which will result in utilization issues.
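
The utilization concern in option 2 can be sketched with a toy issue model. Everything here is an assumption for illustration: instructions are treated as fully independent (no dependencies, no latency), the 100 FP32 : 36 INT32 mix is an example workload, and the function names are made up. It is not a model of any real scheduler.

```python
def cycles_fp_plus_int(fp: int, integer: int) -> int:
    """One FP32 pipe and one INT32 pipe, one instruction each per clock."""
    return max(fp, integer)


def cycles_two_of_three(fp: int, integer: int) -> int:
    """Two FP32 pipes and one INT32 pipe, but only two issues per clock
    (either FP32+FP32 or FP32+INT32)."""
    total_issue_bound = (fp + integer + 1) // 2  # ceil((fp + integer) / 2)
    return max(total_issue_bound, integer)


def cycles_three_wide(fp: int, integer: int) -> int:
    """Two FP32 pipes and one INT32 pipe, all three issued per clock."""
    fp_bound = (fp + 1) // 2  # ceil(fp / 2)
    return max(fp_bound, integer)


FP_OPS, INT_OPS = 100, 36  # assumed instruction mix, illustration only

for name, model in (
    ("FP32+INT32", cycles_fp_plus_int),
    ("FP32+FP32 or FP32+INT32", cycles_two_of_three),
    ("FP32+FP32+INT32", cycles_three_wide),
):
    print(f"{name}: {model(FP_OPS, INT_OPS)} clocks")
```

With this mix the two-issue variant takes 68 clocks against 50 for full three-wide issue, so the second FP32 SIMD sits idle on every clock where an INT32 instruction takes an issue slot.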
 