Nvidia's 3000 Series RTX GPU [3090s with different memory capacity]

Who does the binning for those, NVidia or the vendor?
The vendor; that SKU was never officially unveiled.

Does the 500W version need new firmware? Or is the built-in boost capability of the processor enough to let it freewheel up to 500W?
The latter. There is a 250W version as well, but its boost is limited by power.

Was there a liquid cooled V100 DC SKU?
Never heard of any.
 
Liquid cooling is pretty normal in data centre applications.

PowerPoint Presentation (hpcuserforum.com)


Quoted simply because you joined the one-liner gang. Shame.

Not pretty normal...and even rarer in the hosts.
More and more are using liquid cooling for the FACILITY (datacenters)...but they are still a minority, nowhere near the norm.

There is a MAJOR difference between liquid cooling a datacenter...and water-cooling your home PC.

They are worlds apart, FYI.
 
300 to 500w
We would prefer it if you actually knew your info before spewing it like that.

SXM is not a power consumption figure, it's a form factor whose maximum design power is set to accommodate different things.

For example, the A100 40GB is the same 400W as the A100 80GB.

And the A100 SXM4 80GB is 400W, by the way, not 500W, which is still lower than the final form of the V100 SXM3, which was 350W initially and got to 450W in the end.
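For quick reference, here are those figures in one place as a small Python lookup; the wattages are the ones quoted in this post (module power limits), not verified spec-sheet values.

```python
# SXM module power limits as quoted in the post above; treat these as the
# poster's figures, not verified spec-sheet values.
SXM_TDP_W = {
    "A100 SXM4 40GB": 400,
    "A100 SXM4 80GB": 400,
    "V100 SXM3 (initial)": 350,
    "V100 SXM3 (final)": 450,
}

for sku, tdp_w in SXM_TDP_W.items():
    print(f"{sku}: {tdp_w} W")
```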
 
I don't know, to be honest. I don't understand why you're asking.
Because the comparison has to be apples to apples if we're looking at wattage between Volta and Ampere.
If there were no V100 LC SKU, then what's the point of comparing such an Ampere SKU with Volta?
As I've said, I'm sure you can hit 1000W with some LN on any of these GPUs. That doesn't tell us much beyond the fact that the market may be willing to buy such products these days.
 
It doesn't clock low, it reaches the same peak clocks, but the sustainable frequency drops depending on the load.
That's my point: in your example it uses more power, so at sustained workloads it clocks lower.

This is why chips are binned: those that use more power at a given clock are categorised for less demanding SKUs. No matter how good TSMC's process is, not all chips will bin identically.
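A minimal sketch of the idea, not NVIDIA's or TSMC's actual binning flow; the reference clock and power cut-offs below are made-up illustration values.

```python
# Illustrative binning: parts that need more power to hold a reference clock
# get routed to less demanding SKUs. All thresholds here are invented.
REFERENCE_CLOCK_MHZ = 1400
BIN_CUTOFFS_W = [(350, "top SKU"), (400, "mid SKU")]  # hypothetical power cut-offs

def assign_bin(power_at_ref_clock_w: float) -> str:
    """Return the SKU bin for a die, based on its power draw at the reference clock."""
    for limit_w, sku in BIN_CUTOFFS_W:
        if power_at_ref_clock_w <= limit_w:
            return sku
    return "lower-clocked / salvage SKU"

for measured_w in (340.0, 380.0, 430.0):
    print(f"{measured_w:.0f} W at {REFERENCE_CLOCK_MHZ} MHz -> {assign_bin(measured_w)}")
```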
 
I don't know, to be honest. I don't understand why you're asking.
Is LC maybe more common for servers not equipped with accelerators? I think LC is not standard yet. It's not totally alien tech in the datacenter, but not commonplace either.
 
Cray-1 - Wikipedia

Key part of the design was the integrated liquid cooling :)

[Photo: Cray-1]


The history of liquid-cooled computers is long, going further back than this, but the Cray-1 is an icon.

You could say it's the last resort to use liquid cooling in a data centre, which is why it seems "abnormal". But there have been many periods over the past 60 years where it was crucial in getting something working. These were commercial systems, too, not just built for government laboratories and universities.

I think it's fair to say "accelerator cards" make it much more difficult to implement liquid cooling. Immersion cooling has been a thing for decades, too, which is a nice work-around for the physical difficulties associated with plumbing.

Google Brings Liquid Cooling to Data Centers to Cool Latest AI Chips (datacenterknowledge.com)

When competitors have been doing liquid cooling for a while, you gotta catch up, I suppose.
 
Is LC maybe more common for servers not equipped with accelerators? I think LC is not standard yet. It's not totally alien tech in the datacenter, but not commonplace either.

It is more common for cooling the facility itself.
The facilities I was responsible for had 22°C as "inlet air".
That air was air-cooled; we had just begun looking into using LC for that, but not in the hosts themselves...I have seen water-damaged racks...expensive.

LC in the hosts is still a novelty.

I have no clue what the facilities at my new job use...but I doubt it is LC.

And for HPC we are at +1,000 W per U...so cooling really matters...it's just that LC comes with its own problems.
 
Cray-1 - Wikipedia

Key part of the design was the integrated liquid cooling :)

[Photo: Cray-1]


The history of liquid-cooled computers is long, going further back than this, but the Cray-1 is an icon.

You could say it's the last resort to use liquid cooling in a data centre, which is why it seems "abnormal". But there have been many periods over the past 60 years where it was crucial in getting something working. These were commercial systems, too, not just built for government laboratories and universities.

I think it's fair to say "accelerator cards" make it much more difficult to implement liquid cooling. Immersion cooling has been a thing for decades, too, which is a nice work-around for the physical difficulties associated with plumbing.

Google Brings Liquid Cooling to Data Centers to Cool Latest AI Chips (datacenterknowledge.com)

When competitors have been doing liquid cooling for a while, you gotta catch up, I suppose.

The LC was because of their "design".
Hot ECL CPUs...kinda no other option...if you did not want to sacrifice performance.
 
Enter 500W SXM4 :)

Well...not really...on ECL circuits...there is a reason we do not design like that anymore ;)
The biggest problem (going forward) is not cooling.
It is getting enough power delivery to your sites.
CPUs are using more and more power, RAM is using more and more power, SSDs are using more and more power...and now GPUs are invading the hosts.

Before:
1 rack, 16 hosts, 2 Leaf LAN switches and 1 OOB switch ~= 8,000 W.

Future:
1 rack, 16 hosts (with GPUs), 2 Leaf LAN switches and 1 OOB switch ~= 40,000 W.

Most locations were not designed with that in mind.

(Increased wattage = increased cooling + increased airflow.)

So those ~40,000 W grow to ~85,000 W once cooling etc. is incorporated.
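Rough arithmetic behind those numbers, as a sketch: the per-host and switch wattages are back-calculated guesses that happen to add up to the ~8 kW and ~40 kW examples, and the ~2.1x overhead factor simply comes from the 40 kW -> 85 kW figures above.

```python
# Back-of-the-envelope rack power, using assumed per-host figures that add up
# to the ~8 kW (today) and ~40 kW (GPU hosts) examples above.
FACILITY_OVERHEAD = 85_000 / 40_000  # ~2.1x for cooling + airflow, per the example

def rack_it_load_w(hosts: int, watts_per_host: float, switches_w: float) -> float:
    """Total IT load of one rack: hosts plus leaf/OOB switches."""
    return hosts * watts_per_host + switches_w

before = rack_it_load_w(hosts=16, watts_per_host=450, switches_w=800)     # ~8 kW
future = rack_it_load_w(hosts=16, watts_per_host=2_450, switches_w=800)   # ~40 kW

for label, load_w in (("before", before), ("future", future)):
    total_w = load_w * FACILITY_OVERHEAD
    print(f"{label}: IT load ~{load_w / 1000:.0f} kW, ~{total_w / 1000:.0f} kW with cooling/airflow")
```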

HPC will separate the field hard.
 
We have a thread talking about NVidia COPA:

https://forum.beyond3d.com/threads/nvidia-copa-composable-on-package-architecture.62452/

"A large COPA-GPU enabled L3 reduces the total DRAM-related per-GPU energy consumption by up to 3.4×, as shown in Section III-C. However, the improved DL-optimized COPA-GPU utilization may lead to increased total design power that may not be entirely mitigated by the power reduction within the memory system. To mitigate growing thermal density, we expect future high-end GPU systems will rely on liquid cooling technologies to enable increased thermal envelopes compared to those possible today."
 
the improved DL-optimized COPA-GPU utilization may lead to increased total design power that may not be entirely mitigated by the power reduction within the memory system
Ah, the ever-present but always-forgotten silicon horror! If every chip were used to a full and complete 100% of its potential, we could say goodbye to all of our advancements in clock speeds; the chips would be so out of whack power-wise that any hope of maintaining even base clocks would be lost.
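A minimal sketch of that effect under a fixed power cap: as more of the chip is kept busy, the sustainable clock falls below base. The simple cubic power model and every number in it are illustrative assumptions, not measured GPU behaviour.

```python
# Power-capped sustained clock vs. utilization; all figures and the simple
# P ~ utilization * f^3 model are illustrative assumptions only.
POWER_CAP_W = 400.0
BASE_CLOCK_GHZ = 1.4
POWER_AT_BASE_FULL_LOAD_W = 500.0  # hypothetical draw at base clock, 100% busy

def sustainable_clock_ghz(utilization: float) -> float:
    """Solve utilization * P_full * (f / f_base)^3 <= cap for the clock f."""
    scale = (POWER_CAP_W / (utilization * POWER_AT_BASE_FULL_LOAD_W)) ** (1.0 / 3.0)
    return BASE_CLOCK_GHZ * scale

for u in (0.5, 0.8, 1.0):
    print(f"utilization {u:.0%}: sustainable clock ~{sustainable_clock_ghz(u):.2f} GHz")
```

At 100% utilization the cap in this toy model pushes the sustained clock under the base clock, which is exactly the "goodbye to base clocks" scenario.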
 
Blender 2.93 Rendering & Viewport Performance: Best CPUs & GPUs – Techgage
July 7, 2021
As we usually do, we’re diving into Blender 2.93’s performance in this article, taking a look at both rendering (to the CPU and GPU), and viewport performance (on the next page). We recently did a round of testing in Blender and other renderers for a recent article, but since that was published, we decided to redo all of our 2.93 testing on an AMD Ryzen 9 5950X platform to ensure that our CPU isn’t going to be a bottleneck for the viewport tests.
[Chart: Blender 2.93 Eevee GPU render performance (Mr. Elephant scene)]

The EEVEE render engine doesn’t yet take advantage of OptiX accelerated ray tracing, so it gives us another apples-to-apples look at AMD vs. NVIDIA.
 
The most important part of the article is below:
On the topic of AMD, the de facto annoyance we’ve had when testing Blender the past year is seeing Radeon cards fall notably behind NVIDIA, even when ray tracing acceleration isn’t involved. The viewport results above can even highlight greater deltas between the two vendors than some of the renders do. Then there’s the issue of driver stability.

It was last fall, with our look at Blender 2.90, that some notable issues became more common in Blender with Radeon GPUs, and we’re not really sure that much has improved since then. In advance of any performance testing for a new article, we always ensure we have up-to-date drivers, but for Radeon, we couldn’t go that route this time. The recent 21.6.1 driver gives us errors in select renders, while the 21.6.2 driver has converted that error into a blue-screen-of-death.

There’s not really much more we can say here. We thoroughly test Blender, having done deep-dives for 2.80, 2.81, 2.83, 2.90, 2.91, 2.92, and of course, 2.93, and from our experience, you’ll have an easier and better time with NVIDIA if Blender is your primary tool of choice.
 