Nvidia Ampere Discussion [2020-05-14]

The 3090 is an entirely different class of card. It might require some insane cooling if their thought process was, "What's the most performance we can fit onto a card if we max out the PCIe power delivery plus two power connectors?" But I don't think that would tell us much about the architecture. It would just be cooling based on their TDP.
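For a rough sense of the ceiling that reading implies, here's a minimal sketch, assuming "two power connectors" means two standard 8-pin plugs (which is only an assumption); the slot and connector limits are the usual PCIe spec values:

```python
# Rough board-power ceiling if you "max out" standard power delivery.
# Assumption: "two power connectors" = two 8-pin PCIe plugs; the actual
# connector layout of the card isn't confirmed anywhere in this thread.
PCIE_SLOT_W = 75    # PCIe x16 slot limit per spec
EIGHT_PIN_W = 150   # 8-pin PCIe connector limit per spec

ceiling = PCIE_SLOT_W + 2 * EIGHT_PIN_W
print(f"Theoretical ceiling: {ceiling} W")  # 375 W
```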
My first thought was "Why should EVGA and K|ngP|n have all the fun?"
 
And if you look at the progression from that through Turing you may see why it can be a 3-slot top end now, without any mystical not-so-good reasons.
0, [no show], 0, 1A, 1A, 1A, 1A, 1A, 1A, 2B, 2B, 2A, 2B, 2B, 2B, 2B, 2B, 2B, 2B, 2B, 2B, 2A, 3A? Yeah, sure, seeing a progression, but no pattern.

But Nvidia has changed the cooler from the radial design to the dual axial design. They made a "radical" decision, otherwise the Titan RTX would just be as loud as a jet.
Did you check Quadro RTX 8000?

I'm having a very hard time believing that Nvidia totally fucked their power draw advantage over AMD while doing a node shrink. The 5700 XT drew almost as much power as a 2080 FE, and Turing was 12nm while the 5700 XT was 7nm. I know people are saying Samsung 8nm isn't great, but it seems very weird to me that a 2080 FE would be 210W max and suddenly the 3080 is some kind of thermal monstrosity, even with a node shrink that should give some advantages.
Possible explanation: they had to up the clocks considerably at the last minute, going way outside of the optimal curve. Because of RDNA2.

It's Moore's Law Is Dead nonsense by origin. Just think how much bandwidth inside the chip it would take to pass everything through the tensors in addition to all the other traffic, and from what I've heard (I don't really have a clue, but I think the guy who said it does), tensors aren't even really suited for compression/decompression.
And this surely isn't just a fancy name for Ampere's sparsity feature?
 
Is the top one in the right shot supposed to be a 3090? Looks kinda like it; maybe they're different bins of the same board, as they look the same size underneath the cooler. One with 20GB of fast RAM, the other 10GB of slower RAM and disabled cores/lower clocks, hence the smaller cooler?
The bigger one is 3090 and it's supposedly 24 GB.
The smaller one is 3080 and it's supposedly 10 or 20 GB.
 
The 3090 is an entirely different class of card. It might require some insane cooling if their thought process was, "What's the most performance we can fit onto a card if we max out the PCIe power delivery plus two power connectors?" But I don't think that would tell us much about the architecture. It would just be cooling based on their TDP.

That's the optimistic view. The question is why would they do this and why now? The only sensible reason is that AMD is nipping at their heels.
 
I still don't understand what's so insane about a 2.5-3 slot air cooler with two fans? Most AIB cards use such designs at the top end, and considering that NV has moved their reference designs into AIB custom territory with Turing, this seems like a logical next step for the halo card.

I agree though that they wouldn't have done this without a reason but just assuming that this reason is RDNA2 seems fragile at best. The biggest issue of Turing was its lack of perf/price gains compared to Pascal. It's very possible that they want to avoid this with Ampere and are ready to push the cards higher for them to be a good upgrade option even for Turing owners.

And we still don't really know what gaming Ampere is. If it doubles the FP32 rate per SM then this alone can result in a huge increase in power consumption - but it will result in a doubling of flops as well.
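To put illustrative numbers on that, here's a quick sketch; the SM count and clock below are Turing-class placeholders, not Ampere leaks:

```python
# How doubling the FP32 rate per SM doubles peak flops.
# SM count and clock are Turing-class placeholders, not Ampere figures.
def peak_tflops(sms: int, fp32_per_sm: int, clock_ghz: float) -> float:
    # 2 ops per FMA, per lane, per clock
    return sms * fp32_per_sm * 2 * clock_ghz / 1000

print(peak_tflops(68, 64, 1.545))   # ~13.4 TFLOPS (2080 Ti-like config)
print(peak_tflops(68, 128, 1.545))  # ~26.9 TFLOPS with doubled FP32 per SM
```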
 
If they are meant to be good upgrades then you also have to offer them in a way that causes the fewest headaches for the end user. Having a huge card with non-trivial power requirements and adding exotic cabling on top is not exactly the best way to do so, as you may force end users to change their power supply, and even their case, along with the graphics card. Then of course users buying a $1000+ GPU may or may not be willing to spend on these side upgrades. Yes, they may have been doing this for reasons other than increasing competition from AMD. But this does not mean that increasing power consumption (on a smaller process...) is a good sign, and this is valid for lower end parts as well.
If this is not due to increased competition from AMD, then the only reason would be to give these cards enough of a performance delta compared to the previous generation to justify an upgrade. But, as said, if the rumors about power consumption are really confirmed (and the signs are there), then considering the smaller production process this is not exactly positive for the performance/watt metric.
 
Yeah, I don't really believe it's linked to RDNA2. It could be kind of the opposite. They've been dominating at the top end for so many years now, they can do what they want, really, and charge a lot for it. If this thing is kind of quiet, they bypass the need for an aftermarket cooler, and sell it around Titan price...

Or, yeah, the Samsung xx process (if it's that) is f***ed for big chips. But in that case I don't see Nvidia making a big tease like they did a few days ago.
 
Having a huge card
The 3090 only, and with a price of $1000+. Doubtful that those who buy these cards have issues with upgrading their cases and PSUs as often as they want.

with non-trivial power requirements
What does that even mean?

and adding exotic cabling on top
Like 90% of high end and top end cards have been using 2x8-pin cables since, I dunno, Fermi? Now you put these into an adapter which goes into the card. Very "exotic" indeed.

But this does not mean that increasing power consumption (on a smaller process...) is a good sign, and this is valid for lower end parts as well.
Which of the above are you expecting to be valid for low end parts? We already know from the leaks that the size isn't valid even for the 3080.

But, as said, if the rumors about power consumption are really confirmed (and the signs are there), then considering the smaller production process this is not exactly positive for the performance/watt metric.
There are two numbers in the performance/watt metric.
 
Even if users willing to pay $1000+ for a card can also afford a PSU and a case, it is not a good thing that they are forced to buy them in order to upgrade. And this adds to the "real" price of such a solution for anyone who wants to upgrade.
Btw, with "non-trivial power requirements" I simply mean high power consumption. No manufacturer would increase the BOM over the minimum required only for "keeping the card as silent as possible".
The cabling is "exotic": it is not a standard cable used on practically every PSU out there. You can say the adapter comes with the card - fact is, it is not a standard solution, with all the downsides that implies (i.e. availability of a replacement in case of failure).
The rumors about the power consumption are not out there only for the top end part. And the same architecture coming out at the same time means sharing the same pros and the same issues.
Yes, there are two numbers in the performance/watt metric. Except that we don't know the performance, while we have big hints about the power draw.
 
The GTX 980 was 165W, the GTX 1080 was 180W, the RTX 2080 was 215W. So the RTX 3080 is going to be well below 300W, otherwise we're seeing the most massive jump in a class of cards that Nvidia has ever done (or at least in a long time). I would guess 250-270W. 270W would be a lot for a 3080; 250-270W would put it in the range of a 2080 Ti for power.
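Spelling out the size of those steps with the same figures (the 3080 numbers being the guess above, not anything confirmed):

```python
# Generational TDP steps for the x80 class, using the figures from the post.
history = [("GTX 980", 165), ("GTX 1080", 180), ("RTX 2080", 215)]
for (prev_name, prev_w), (name, w) in zip(history, history[1:]):
    print(f"{prev_name} -> {name}: {w - prev_w:+d} W ({w / prev_w - 1:+.0%})")

# The guessed 3080 range, relative to the RTX 2080's 215 W:
for guess in (250, 270):
    print(f"RTX 2080 -> RTX 3080 @ {guess} W: {guess / 215 - 1:+.0%}")
```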
 
I agree though that they wouldn't have done this without a reason but just assuming that this reason is RDNA2 seems fragile at best. The biggest issue of Turing was its lack of perf/price gains compared to Pascal. It's very possible that they want to avoid this with Ampere and are ready to push the cards higher for them to be a good upgrade option even for Turing owners.

Exotic cooling and power delivery doesn’t give me great confidence that Nvidia is prioritizing perf/$. It’s not impossible but I think it’s super optimistic to think that Ampere is power efficient and Nvidia also decided to max out power at the same time.

If I was a betting man I would wager Ampere is struggling against either Turing or RDNA2 and Nvidia was forced to push the envelope to create some separation.
 
If I was a betting man I would wager Ampere is struggling against either Turing or RDNA2 and Nvidia was forced to push the envelope to create some separation.
I agree that them pushing the envelope could mean that RDNA2 will be very close in performance.
What I don't agree with however is the "exotic cooling and power requirements" lines as I don't see anything exotic in what was leaked so far.
Also, if NV is pushing the envelope we can be damn sure that AMD will push theirs too - if Navi 21 is able to beat GA102 under some similarly heavy cooling they sure as hell will use it.
As I've said this will be an interesting autumn. There are many things outside of pure flops and 3DMark numbers to look into.
 
I really doubt Ampere is less power efficient than Turing, especially with a node shrink. I can believe that RDNA2 has narrowed the gap.

At like-for-like it should obviously be more efficient. Nvidia isn't going to regress there. When it comes to actual products, it's all about binning. You can even make Bulldozer more efficient than Zen depending on the voltage and frequency you set for each. If the end-products somehow regress in efficiency vs Turing, it won't be because Nvidia wanted to bin their products that way, but out of the necessity of a drastically more competitive environment.
 
The GTX 980 was 165W, the GTX 1080 was 180W, the RTX 2080 was 215W. So the RTX 3080 is going to be well below 300W, otherwise we're seeing the most massive jump in a class of cards that Nvidia has ever done (or at least in a long time). I would guess 250-270W. 270W would be a lot for a 3080; 250-270W would put it in the range of a 2080 Ti for power.
The problem with this comparison is that the GTX 980 & 1080 were based on mid-range/performance dies. Nvidia started to shove those dies into a higher tier with the GTX 680, which was GK104 rather than the higher end GK110 that came out later on.
I really doubt Ampere is less power efficient than Turing, especially with a node shrink. I can believe that RDNA2 has narrowed the gap.
I think there's a distinct possibility that RDNA 2 on TSMC 7e will be more efficient than Ampere on Samsung 8nm (if there's a TSMC 7nm Ampere, maybe Ampere is more efficient apples to apples) when it's pure rasterization. Ampere could be more "efficient" when ray tracing (assuming Ampere has far stronger RT capabilities than RDNA 2) and DLSS (assuming AMD isn't able to piggyback onto that) are utilized.

Now my biggest question is whether Nvidia will do a 12GB GA102 as a 3080 Ti, given that rumored listing of $800 for the 3080 and $1400 (which could probably be cheaper). If an 80 CU Navi @ 2GHz or more comes in at $1000 or even less and outperforms the 3080 by quite a bit, and another at 72 CUs with similar clocks is on par at $700-800 but way more power efficient, would Nvidia bring out a 3080 Ti that is for all intents and purposes a 3090 with half of its VRAM removed? I mean, rogame on Twitter did list a 12GB SKU before (and IIRC a few like Adored TV and MILD have mentioned a 12GB SKU in the past). It would be sensible to have a 12GB SKU in reserve to counter Navi 21, just in case.
 
I really doubt Ampere is less power efficient than Turing, especially with a node shrink. I can believe that RDNA2 has narrowed the gap.

I mean, Nvidia would have to sink pretty low to suddenly produce a less power efficient card. The improvement from TSMC 12nm to Samsung 8nm doesn't seem that huge in terms of power savings, maybe 25% more power efficient(?) but I'd expect at least that much.

Still, it does bring up the prospect that the upcoming cards will be TDP limited. Hypothetically the switch to GDDR6X offers up the possibility of a 50% increase in bandwidth over a Titan RTX. But so far all we've seen of Ampere officially is improvements in Deep Learning and a reduction in INT silicon. Not even the rumors mention anything in particular about improved energy efficiency.

Edit: You know, maybe that's what the new power cables are for. A 24GB 30XX at 400 watts, requiring a new PSU?
 
What fraction of board power is consumed by memory? If bandwidth is increasing by 50%, and GDDR6X uses only 15% less energy per bit transferred than GDDR6, memory power consumption will increase by almost 30%.
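The arithmetic behind that "almost 30%", spelled out (the 50% and 15% figures are the hypotheticals from the question, not measured GDDR6X numbers):

```python
# Memory power scales as (bits per second) x (energy per bit).
# The +50% bandwidth and -15% energy/bit are the hypotheticals above.
bandwidth_scale = 1.50       # +50% bandwidth
energy_per_bit_scale = 0.85  # -15% energy per bit transferred

power_scale = bandwidth_scale * energy_per_bit_scale
print(f"Memory power changes by {power_scale - 1:+.1%}")  # +27.5%
```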
 
I mean, Nvidia would have to sink pretty low to suddenly produce a less power efficient card. The improvement from TSMC 12nm to Samsung 8nm doesn't seem that huge in terms of power savings, maybe 25% more power efficient(?) but I'd expect at least that much.

Still, it does bring up the prospect that the upcoming cards will be TDP limited. Hypothetically the switch to GDDR6X offers up the possibility of a 50% increase in bandwidth over a Titan RTX. But so far all we've seen of Ampere officially is improvements in Deep Learning and a reduction in INT silicon. Not even the rumors mention anything in particular about improved energy efficiency.

I'm just speculating, but I've been wondering whether or not the power consumption figures being mentioned are in part due to a difference in measuring and design criteria. For Turing, have there been any deep looks at how much utilization differences for the RT cores, Tensor cores, or even FPU/ALU concurrency affect power consumption? Just in theory, you'd think using the RT cores vs. not would have an effect on utilization and therefore consumption.

What does this have to do with Ampere? I wonder if Nvidia is looking to push concurrency further and, combined with expected workloads (e.g. greater RT adoption), needs to adjust its specification parameters based on that. If we're looking at a design with essentially 4 separate types of cores that can have extremely variable utilization rates depending on the workload, the question of power consumption seems much more complicated than with a single unified FPU/ALU pipeline, with much higher variation between scenarios.

250-300W might have been the more "ideal" spec based on the more conventional (traditional?) combined FPU/ALU pipeline, but what if we have separate FPUs, ALUs, Tensor cores and RT cores all executing concurrently? If you keep the same power limits in place, then in a workload that does utilize all 4, wouldn't you effectively be leaving FPU/ALU performance on the table? So you design a higher limit into the spec for the Tensor/RT cores, but what happens when those aren't being used? Your design spec already allows for higher power usage, so do you then let the FPU/ALUs eat into it even if it's diminishing returns?
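A toy way to picture that budgeting argument (every per-unit number below is made up purely for illustration, not based on any real spec):

```python
# Toy power-budget model for the concurrency argument above.
# All numbers are invented for illustration only.
BOARD_LIMIT_W = 320

# Hypothetical peak draw per unit type when fully loaded (FP32 at nominal clocks).
UNIT_PEAK_W = {"fp32": 180, "int": 40, "tensor": 60, "rt": 40}

def fp32_headroom(active_units: set) -> int:
    """Watts left for FP32 (and clock boosting) once the active fixed units are fed."""
    others = sum(w for name, w in UNIT_PEAK_W.items()
                 if name != "fp32" and name in active_units)
    return BOARD_LIMIT_W - others

# RT + tensor + INT all busy: FP32 gets roughly its nominal share.
print(fp32_headroom({"int", "tensor", "rt"}))  # 180 W
# Pure raster load with tensor/RT idle: FP32 could soak up the headroom,
# but only if the voltage/frequency curve makes those extra watts worthwhile.
print(fp32_headroom({"int"}))                  # 280 W
```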

Anyways, those are my ramblings. We'll have to see what gets announced and what the end hardware looks like, but I'm wondering in general if how we look at power consumption and efficiency will be more complicated going forward.

What fraction of board power is consumed by memory? If bandwidth is increasing by 50%, and GDDR6X uses only 15% less energy per bit transferred than GDDR6, memory power consumption will increase by almost 30%.

I believe 16Gbps GDDR6 is roughly 2.5W a chip.
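Scaling that up as a back-of-the-envelope (the twelve-chip, 384-bit configuration is an assumption about the board, not a confirmed spec):

```python
# Rough board memory power from the ~2.5 W/chip figure above.
# A 384-bit bus with 32-bit chips means 12 chips; that's an assumption here.
watts_per_chip = 2.5
chips = 12

gddr6_total = watts_per_chip * chips
print(f"GDDR6 total: ~{gddr6_total:.0f} W")                                       # ~30 W
print(f"With ~27.5% more for the bandwidth bump: ~{gddr6_total * 1.275:.0f} W")   # ~38 W
```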
 