Nvidia shows signs in [2023]

^M^ · Oct 13, 2023

Some people just want to see the world burn

trinibwoy · Oct 15, 2023

Scott_Arm said:
Funny enough there are a lot of users that buy 4090s to play Call of Duty at 1080p on low settings, or Counter-Strike, Fortnite etc. There's a pay to win market for the highest end cards, and ROPs might help just push more frames at low settings. The 4090 weirdly fits two groups: people who just want max ray-tracing at 4k and super low framerates, but also esports players.

I take it back. Looking at the Jusant and Talos demos, UE5 still does a fair amount of pixel shader work. I thought pixel shaders would be used just for writing the visibility buffer, with all shading done via compute, but that doesn't seem to be the case. So if you want good UE5 performance you need to make sure ROPs aren't a major bottleneck, given Nanite geometry density and potentially high overdraw. Just a guess, but that may be sufficient motivation to beef up ROPs in the next generation of hardware.

pharma · Oct 15, 2023

NVIDIA Blackwell B100 GPUs To Feature SK Hynix HBM3e Memory, Launches In Q2 2024 Due To Rise In AI Demand

NVIDIA Blackwell B100 GPU is reportedly launching early in Q2 2024 and will feature SK Hynix's brand new HBM3e memory solution for AI chips.

wccftech.com

DegustatoR · Oct 15, 2023

pharma said:
NVIDIA Blackwell B100 GPUs To Feature SK Hynix HBM3e Memory, Launches In Q2 2024 Due To Rise In AI Demand

NVIDIA Blackwell B100 GPU is reportedly launching early in Q2 2024 and will feature SK Hynix's brand new HBM3e memory solution for AI chips.

wccftech.com

It's the same slide that's been making the news lately. I don't think it shows what the news outlets are saying it shows. There is no apparent change in Nvidia's roadmap in this slide: they've been on a 2-year launch/refresh cadence in the AI/HPC space for some time now, and the GB100 "launch" was supposed to happen in early 2024 anyway, likely with early availability in Q2 and general availability by the end of the year, exactly as it has been with all of Nvidia's server offerings for the last decade or so.

pharma · Oct 16, 2023

October 16, 2023

Interview with Jensen Huang

pharma · Oct 17, 2023

US Government Restricts Exports of NVIDIA's China-Exclusive H800 & A800 AI GPUs To China

The US Government has blocked all imports of AI hardware including NVIDIA's China-Exclusive H800 & A800 GPUs to China.

wccftech.com

Nvidia states no meaningful change to financials ... too much demand outside China.

“We comply with all applicable regulations while working to provide products that support thousands of applications across many different industries,” an Nvidia spokesperson told CNBC. “Given the demand worldwide for our products, we don’t expect a near-term meaningful impact on our financial results.”

-NVIDIA Rep to CNBC

dorf · Oct 17, 2023

Streamline 2.2.1 update includes Dynamic Frame Generation:

Dynamic Frame Generation leverages stochastic control to automatically trigger DLSS-G. This adaptive monitoring mechanism activates frame generation only when it boosts performance beyond the native framerate production of the game. Otherwise, DLSS-G remains disabled to ensure optimal framerate performance.

Source:

GitHub - NVIDIAGameWorks/Streamline: Streamline Integration Framework

Streamline Integration Framework. Contribute to NVIDIAGameWorks/Streamline development by creating an account on GitHub.

github.com
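
For anyone wondering what "activate frame generation only when it helps" could look like, here is a minimal Python sketch. It is not Streamline's implementation: the release note mentions stochastic control, while this is a plain deterministic threshold with hysteresis. `FrameGenController`, `margin`, and both FPS inputs are hypothetical names, and how the two framerates would actually be measured is left out entirely.

```python
from dataclasses import dataclass


@dataclass
class FrameGenController:
    """Hypothetical controller: keep frame generation on only while it helps."""

    margin: float = 1.05   # assumed: require a 5% gain before switching on
    enabled: bool = False

    def update(self, native_fps: float, framegen_fps: float) -> bool:
        """Decide whether frame generation should be on for the next interval.

        native_fps:   estimated presented FPS with frame generation off
        framegen_fps: estimated presented FPS with frame generation on
        """
        if self.enabled:
            # Stay on as long as frame generation is not slower than native.
            self.enabled = framegen_fps >= native_fps
        else:
            # Switch on only when the gain clears the hysteresis margin.
            self.enabled = framegen_fps > native_fps * self.margin
        return self.enabled


# Made-up (native, with-frame-generation) FPS samples.
ctrl = FrameGenController()
samples = [(60, 100), (140, 138), (140, 143), (140, 150)]
decisions = [ctrl.update(native, fg) for native, fg in samples]
print(decisions)
```

The margin is only there so the toggle doesn't flip-flop when the two framerates are nearly equal; a real integration would presumably also smooth the measurements over time.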

trinibwoy · Oct 18, 2023

pharma said:
US Government Restricts Exports of NVIDIA's China-Exclusive H800 & A800 AI GPUs To China

The US Government has blocked all imports of AI hardware including NVIDIA's China-Exclusive H800 & A800 GPUs to China.

wccftech.com

Nvidia states no meaningful change to financials ... too much demand outside China.

Best case is no short term impact but China is too big a market to ignore. Nvidia’s long term prospects will take a major hit unless they figure this out. The whole thing is silly anyway. Computers and networks are inherently scalable. It’ll be more expensive but there’s nothing stopping China from hooking up tons of smaller chips or smuggling top of the line stuff from other countries.

JoeJ · Oct 18, 2023

trinibwoy said:
The whole thing is silly anyway. Computers and networks are inherently scalable. It’ll be more expensive but there’s nothing stopping China from hooking up tons of smaller chips or smuggling top of the line stuff from other countries.

Besides, it motivates China to build their own tech faster. So for a small, short-lived win, we only accelerate the downfall of the West in the long run.

Be prepared for the cheap APU future I've been proposing for so long, because soon it will be the only option we poor guys can still afford. :/

Granath · Oct 18, 2023

JoeJ said:
Besides, it motivates China to build their own tech faster. So for a small, short-lived win, we only accelerate the downfall of the West in the long run.

Be prepared for the cheap APU future I've been proposing for so long, because soon it will be the only option we poor guys can still afford. :/

China was going to do it anyway. Remember Made in China?

Arun · Oct 18, 2023

Biren seems far ahead of everyone else in China for AI chips (excluding perhaps some of the in-house Alibaba stuff etc.) and they are now on the Entity List, so presumably they can no longer use TSMC at all; i.e. they're in a similar situation to Huawei aka HiSilicon. It looked like they already couldn't with the sanctions a year ago, but then it got resolved with the same silly loopholes NVIDIA used for A800/H800. The new sanctions are pretty much just the old sanctions without all the loopholes; because of these loopholes, I don't think the previous ones achieved anything except slightly increasing NVIDIA's profits (as Chinese customers had to buy slightly more chips for the same level of performance).

China looks forced to become more and more dependent on SMIC, but thankfully for them, the sanctions against SMIC via ASML are nearly as much of a joke as the AI sanctions were a year ago. It's unbelievably incompetent that they'd sanction EUV but not the Twinscan 2000 DUV immersion tools from ASML until *January 2024* (?! either do it or don't!) since it should be very possible to go down to 5nm on the latter without EUV (probably at higher cost than EUV but not as bad as the old DUV tools were). I think it's nearly a given at this point that SMIC will be mass producing 5nm within ~2 years.

Lurkmass · Oct 18, 2023

trinibwoy said:
Best case is no short term impact but China is too big a market to ignore. Nvidia’s long term prospects will take a major hit unless they figure this out. The whole thing is silly anyway. Computers and networks are inherently scalable. It’ll be more expensive but there’s nothing stopping China from hooking up tons of smaller chips or smuggling top of the line stuff from other countries.

The bigger implications are on the software stack side. Developers located in countries who are not on the best terms with those that willingly perpetuate the rules-based international order will be forced to abandon the likes of CUDA. China's not likely to use other countries for smuggling especially for products that require authentication to use, they're more likely to kickstart the tech industry of less developed economies with their own hardware ...

pcchen · Oct 19, 2023

Lurkmass said:
The bigger implications are on the software stack side. Developers located in countries who are not on the best terms with those that willingly perpetuate the rules-based international order will be forced to abandon the likes of CUDA. China's not likely to use other countries for smuggling especially for products that require authentication to use, they're more likely to kickstart the tech industry of less developed economies with their own hardware ...

It's not that easy though. I'm sure they wanted to do that, but apart from some small niche markets they still seem to be copying software from the West.
From this perspective, hardware is easy; software (especially an ecosystem) is very hard.

Lurkmass · Oct 19, 2023

pcchen said:
It's not that easy though. I'm sure they wanted to do that, but apart from some small niche markets they still seem to be copying software from the West.
From this perspective, hardware is easy; software (especially an ecosystem) is very hard.

@Bold I'd argue it's the opposite. Producing hardware is only relatively easy in a vacuum if you ignore the complexities of photolithography and integrated digital logic manufacturing. Software can become easier to replace if the circumstances surrounding it are politically untenable ...

If we take Apple, which is constantly lauded for its top hardware designs, as an example, you'll notice just about everyone else takes turns dunking on its subpar software design, since its customers are fine with software obsolescence on a wide scale. In quite a few cases it's the opposite with its competitors. A captive market due to political conditions will overlook secondary characteristics such as software support if the alternative is their only option. If I'm an unsavoury regime looking to shop around, I'm not going to look only at quality software, especially from a hostile merchant which has a vested interest in making its product "obsolete" for customers it doesn't like. With hardware sales, at least the customer can decide when their purchase is obsolete. Hardware obsolescence is a much bigger threat when we factor in the much higher overall barrier to entry over there ...

pcchen · Oct 19, 2023

Lurkmass said:
@Bold I'd argue it's the opposite. Producing hardware is only relatively easy in a vacuum if you ignore the complexities of photolithography and integrated digital logic manufacturing. Software can become easier to replace if the circumstances surrounding it are politically untenable ...

If we take Apple, which is constantly lauded for its top hardware designs, as an example, you'll notice just about everyone else takes turns dunking on its subpar software design, since its customers are fine with software obsolescence on a wide scale. In quite a few cases it's the opposite with its competitors. A captive market due to political conditions will overlook secondary characteristics such as software support if the alternative is their only option. If I'm an unsavoury regime looking to shop around, I'm not going to look only at quality software, especially from a hostile merchant which has a vested interest in making its product "obsolete" for customers it doesn't like. With hardware sales, at least the customer can decide when their purchase is obsolete. Hardware obsolescence is a much bigger threat when we factor in the much higher overall barrier to entry over there ...

I understand your point, but I think this is a common misconception. I'm a software person myself and I also used to believe it's easier to do software and much harder to do hardware. However, history is full of counterexamples.
Even "Red Star OS" from North Korea, which is solely for domestic use with no intention of exporting any of it, is based on Linux and KDE, and its browser is based on Mozilla Firefox.
Obviously, under severe pressure they might have to develop their own software, but even so it'll probably look very similar to CUDA, so they can integrate all existing software into it without too many modifications.

Lurkmass · Oct 19, 2023

pcchen said:
I understand your point, but I think this is a common misconception. I'm a software person myself and I also used to believe it's easier to do software and much harder to do hardware. However, history is full of counterexamples.
Even "Red Star OS" from North Korea, which is solely for domestic use with no intention of exporting any of it, is based on Linux and KDE, and its browser is based on Mozilla Firefox.
Obviously, under severe pressure they might have to develop their own software, but even so it'll probably look very similar to CUDA, so they can integrate all existing software into it without too many modifications.

If the goal for both is "domestic consumption" then China has had a far better track record of replicating software functionality than of replicating hardware functionality. You should take a look at their bizarre world of WeChat, Weibo, Bilibili, AliExpress and Tencent Games; even ByteDance's TikTok is a major hit across the world, to the point where Google copied it! The only holdouts left that China hasn't been able to conquer are operating systems, productivity software, frameworks and other types of low-level software infrastructure. On the hardware side, they've seen many more flops, to the point where most of their domestic consumers still use foreign hardware components to this day ...

Rootax · Oct 19, 2023

I'm pretty sure the chips can find a way to China anyway...

del42sa · Oct 19, 2023

https://nitter.cz/kopite7kimi/status/1712637308342820874

After Blackwell, the next NV architecture name is Rubin.

pharma · Oct 19, 2023

October 19, 2023

TensorRT-LLM for Windows speeds up generative AI performance on GeForce RTX GPUs

www.techspot.com

Nvidia will soon release TensorRT-LLM, a new open-source library designed to accelerate generative AI algorithms on GeForce RTX and professional RTX GPUs. The latest graphics chips from the Santa Clara corporation include dedicated AI processors called Tensor Cores, which are now providing native AI hardware acceleration to more than 100 million Windows PCs and workstations.

On an RTX-equipped system, TensorRT-LLM can seemingly deliver up to 4x faster inference performance for the latest and most advanced AI large language models (LLM) like Llama 2 and Code Llama. While TensorRT was initially released for data center applications, it is now available for Windows PCs equipped with powerful RTX graphics chips.
...
While TensorRT is primarily designed for generative AI professionals and developers, Nvidia is also working on additional AI-based improvements for traditional GeForce RTX customers. TensorRT can now accelerate high-quality image generation using Stable Diffusion, thanks to features like layer fusion, precision calibration, and kernel auto-tuning.

https://www.eetimes.com/nvidia-boosts-llm-inference-with-open-source-library/

“[In TensorRT-LLM, we] made sure we have the best possible tensor core optimizations for large language models,” Buck said. “This allows people to take any large language model and pass it through TensorRT-LLM to get the benefit of Hopper’s transformer engine, which enables the FP8 compute capabilities of Hopper…but without any loss of accuracy in the production workflow.”
...
“You can easily just take a 32- or 16-bit calculation and cram it into an FPGA, but chances are you’re going to get the wrong answer, because it won’t have the production level accuracy you want,” Buck said. “Doing that thoughtfully and carefully, maintaining scale and bias to keep the calculations in the range of only 8 bits in some cases—keeping FP16 for some parts of the model—this is something Nvidia has been working on for some time.”
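
To illustrate why the scale factor matters when squeezing values into 8 bits, here is a small Python sketch. It is an assumption-laden stand-in, not what TensorRT-LLM or Hopper's transformer engine does: it uses symmetric per-tensor int8 quantization instead of FP8, the weights are made up, and `quantize_int8`, `dequantize` and `max_abs_scale` are hypothetical helper names.

```python
def quantize_int8(values, scale):
    """Map floats to integers in [-127, 127] using one shared scale factor."""
    return [max(-127, min(127, round(v / scale))) for v in values]


def dequantize(qvalues, scale):
    """Map the 8-bit integers back to approximate float values."""
    return [q * scale for q in qvalues]


def max_abs_scale(values):
    """Pick the scale so the largest magnitude lands exactly on 127."""
    return max(abs(v) for v in values) / 127


# Made-up small weights, typical of values that sit far below 1.0.
weights = [0.002, -0.015, 0.031, -0.008]

# "Cramming" without a fitted scale: every value rounds to zero.
naive = dequantize(quantize_int8(weights, 1.0), 1.0)

# With a scale fitted to the tensor, the 8-bit range is actually used.
scale = max_abs_scale(weights)
scaled = dequantize(quantize_int8(weights, scale), scale)

print("naive :", naive)
print("scaled:", scaled)
```

With a fitted scale the round-trip error per value is bounded by half a quantization step, whereas the naive cast throws the whole tensor away. Real FP8 pipelines are more involved, since the format has its own exponent bits and parts of the model may be kept in FP16, as the quote says.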

TensorRT-LLM also includes a new feature called in-flight batching.

“Our standard batching approaches would always wait for the longest query to complete,” he said. “Image queries all roughly took the same time—that wasn’t a problem from an efficiency standpoint, and queries could be padded out, so it wasn’t a big deal.”

With the new in-flight batching feature, once queries complete, they can retire and the software can insert another query—all while a longer query is still in flight. This helps improve GPU utilization for LLMs with diverse query lengths.

“Frankly, the result surprised even me,” Buck said. “It doubled the performance of Hopper. Hopper is such a powerful GPU, it can handle lots of queries in the same GPU in parallel, but without the in-flight batching, if you gave it diverse queries, it would wait for the longest one and not be fully utilized.”
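
The scheduling difference Buck describes can be shown with a toy Python simulation. This is not TensorRT-LLM code: each query is modeled simply as a number of decode steps, and the function names, slot count and query lengths are all made up for illustration.

```python
from collections import deque


def static_batching_steps(lengths, slots):
    """Steps needed when every batch waits for its longest query."""
    total = 0
    for i in range(0, len(lengths), slots):
        total += max(lengths[i:i + slots])
    return total


def inflight_batching_steps(lengths, slots):
    """Steps needed when a finished query's slot is refilled immediately."""
    queue = deque(lengths)
    active = []   # remaining steps for each query currently on the GPU
    steps = 0
    while queue or active:
        # Fill any free slots from the waiting queue.
        while queue and len(active) < slots:
            active.append(queue.popleft())
        # Run one step for every active query and retire the finished ones.
        active = [remaining - 1 for remaining in active if remaining > 1]
        steps += 1
    return steps


# One long query followed by several short ones, two slots on the "GPU".
lengths = [4, 1, 1, 1, 1]
print("static   :", static_batching_steps(lengths, slots=2))
print("in-flight:", inflight_batching_steps(lengths, slots=2))
```

With these made-up lengths the static scheduler needs 6 steps and the in-flight one needs 4, because the short queries cycle through the second slot while the long one is still running. When all queries have similar lengths (the image case mentioned above) the two schedulers come out about the same.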

TensorRT-LLM is open source, along with all of Nvidia’s LLM work, including many LLM models, such as GPT, Bloom and Falcon that’ve been optimized with techniques like kernel fusion, faster attention, multi-headed attention, etc. Kernels for all these operations have been open sourced as part of TensorRT-LLM.
...
The performance boost from TensorRT-LLM should be obvious in the next round of MLPerf inference scores, Buck added, which are due next spring.
