Funnily enough, there are a lot of users who buy 4090s to play Call of Duty at 1080p on low settings, or Counter-Strike, Fortnite, etc. There's a pay-to-win market for the highest-end cards, and the extra ROPs might help push more frames at low settings. The 4090 weirdly fits two groups: people who just want max ray tracing at 4K even at super low framerates, but also esports players.
It's the same slide that's been making the news lately. I don't think it shows what news makers are saying it shows. There is no apparent change in Nvidia's roadmap in this slide; they've been on a launch/refresh two-year cadence in the AI/HPC space for some time now, and the GB100 "launch" was supposed to happen in early 2024 anyway, likely with early availability in Q2 and general availability by the end of the year, exactly how it has been with all of Nvidia's server offerings for the last decade or so.

NVIDIA Blackwell B100 GPUs To Feature SK Hynix HBM3e Memory, Launches In Q2 2024 Due To Rise In AI Demand
NVIDIA Blackwell B100 GPU is reportedly launching early in Q2 2024 and will feature SK Hynix's brand new HBM3e memory solution for AI chips. (wccftech.com)
“We comply with all applicable regulations while working to provide products that support thousands of applications across many different industries,” an Nvidia spokesperson told CNBC. “Given the demand worldwide for our products, we don’t expect a near-term meaningful impact on our financial results.”
-NVIDIA Rep to CNBC
Dynamic Frame Generation leverages stochastic control to automatically trigger DLSS-G. This adaptive monitoring mechanism activates frame generation only when it boosts performance beyond the game's native framerate; otherwise, DLSS-G remains disabled to preserve the best available framerate.
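Nvidia hasn't published how that decision loop works, but the description implies a simple hysteresis controller. Below is a minimal, hypothetical sketch of such a toggle; every name and threshold here is an assumption for illustration, not part of Nvidia's driver or SDK.

```python
# Hypothetical sketch of the adaptive DLSS-G toggle described above; none of
# these names or thresholds come from Nvidia. Idea: periodically compare the
# framerate achieved with frame generation against the native framerate, and
# only keep DLSS-G on while it is actually ahead.

HYSTERESIS = 1.05  # require a ~5% win before toggling on (assumed value)

def update_frame_gen(native_fps: float, frame_gen_fps: float, enabled: bool) -> bool:
    """Return the new DLSS-G state given recent framerate measurements.

    native_fps    -- average fps measured with frame generation off
    frame_gen_fps -- average fps measured with frame generation on
    enabled       -- current DLSS-G state
    """
    if frame_gen_fps > native_fps * HYSTERESIS:
        return True       # frame generation is a clear win: keep/turn it on
    if frame_gen_fps < native_fps:
        return False      # it costs more than it delivers: fall back to native
    return enabled        # inside the hysteresis band: keep the current state
```

The hysteresis band is there so the state doesn't flap on and off when the two measurements are nearly equal.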
Nvidia states no meaningful change to financials ... too much demand outside China.

US Government Restricts Exports of NVIDIA's China-Exclusive H800 & A800 AI GPUs To China
The US Government has blocked all exports of AI hardware, including NVIDIA's China-exclusive H800 & A800 GPUs, to China. (wccftech.com)
The whole thing is silly anyway. Computers and networks are inherently scalable. It'll be more expensive but there's nothing stopping China from hooking up tons of smaller chips or smuggling top of the line stuff from other countries.
China was going to do it anyway. Remember Made in China? Besides, it motivates China to build their own tech faster. So for a small, short-lived win, we only accelerate the downfall of the West in the long run.
Be prepared for the cheap-APU future I've been proposing for so long, because soon it will be the only option we poor guys can still afford. :/
Best case is no short term impact but China is too big a market to ignore. Nvidia's long term prospects will take a major hit unless they figure this out.
The bigger implications are on the software stack side. Developers located in countries that are not on the best terms with those willingly perpetuating the rules-based international order will be forced to abandon the likes of CUDA. China's not likely to use other countries for smuggling, especially for products that require authentication to use; they're more likely to kickstart the tech industry of less developed economies with their own hardware ...
It's not that easy though. They wanted to do that, I'm sure, but apart from some small niche markets they seem to still be copying software from the West.
From this perspective, hardware is easy; software (especially an ecosystem) is very hard.
@Bold I'd argue it's the opposite. Producing hardware is only relatively easy in a vacuum if you ignore the complexities of photolithography and integrated digital logic manufacturing. Software can become easier to replace if the circumstances surrounding it are politically untenable ...
If we take Apple as an example, a company constantly lauded for their top hardware designs, you'll notice just about everyone else takes turns dunking on their subpar software design, since their customers are fine with software obsolescence on a wide scale. In quite a few cases it's the opposite with their competitors. A captive market created by political conditions will overlook secondary characteristics such as software support if that alternative is their only option. If I'm an unsavoury regime looking to shop around, I'm not going to look only at software quality, especially from a hostile merchant with a vested interest in making their product "obsolete" to customers they don't like. With hardware sales, at least the customer can decide when their purchase is obsolete. Hardware obsolescence is a much bigger threat when we factor in the much higher overall barrier of entry over there ...
If the goal for both is "domestic consumption", then China has a far better track record of replicating software functionality than hardware functionality. You should take a look at their bizarre world of WeChat, Weibo, Bilibili, AliExpress, and Tencent Games; even ByteDance's TikTok is a major hit across the world, to the point where Google copied them! The only holdouts China hasn't been able to conquer are operating systems, productivity, frameworks, and other types of low-level software infrastructure. On the hardware side, they've seen many more flops, to the point where most of their domestic consumers still use foreign hardware components to this day ...

I understand your point, and I think this is a common misconception. I'm a software person myself, and I also used to believe it's easier to do software and much harder to do hardware. However, history is full of counterexamples.
Even the "Red Star OS" from North Korea, which is solely for domestic use and they have no intention to export any of it, is based on Linux and KDE, and its browser is based on Mozilla Firefox.
Obviously, under severe pressure they might have to develop their own software, but even so it'll probably look very similar to CUDA, so they can integrate all existing software into it without too many modifications.
Nvidia will soon release TensorRT-LLM, a new open-source library designed to accelerate generative AI algorithms on GeForce RTX and professional RTX GPUs. The latest graphics chips from the Santa Clara corporation include dedicated AI processors called Tensor Cores, which are now providing native AI hardware acceleration to more than 100 million Windows PCs and workstations.
On an RTX-equipped system, TensorRT-LLM can seemingly deliver up to 4x faster inference performance for the latest and most advanced AI large language models (LLM) like Llama 2 and Code Llama. While TensorRT was initially released for data center applications, it is now available for Windows PCs equipped with powerful RTX graphics chips.
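For context, recent TensorRT-LLM releases ship a high-level Python API along these lines; the model identifier and sampling settings below are placeholders, and the exact interface varies by version, so treat this as a sketch rather than canonical usage.

```python
# Sketch of TensorRT-LLM's high-level Python API (the LLM class). The model
# identifier and sampling settings are examples only; check the version of
# tensorrt_llm you have installed, as the interface has changed over releases.
from tensorrt_llm import LLM, SamplingParams

llm = LLM(model="meta-llama/Llama-2-7b-chat-hf")  # builds/loads a TensorRT engine

prompts = ["Summarize what TensorRT-LLM does."]
params = SamplingParams(temperature=0.8, max_tokens=64)

for output in llm.generate(prompts, params):
    print(output.outputs[0].text)  # generated completion for each prompt
```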
...
While TensorRT is primarily designed for generative AI professionals and developers, Nvidia is also working on additional AI-based improvements for traditional GeForce RTX customers. TensorRT can now accelerate high-quality image generation using Stable Diffusion, thanks to features like layer fusion, precision calibration, and kernel auto-tuning.
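As a rough illustration of what those build-time features look like in practice, here is a sketch of compiling an ONNX model with TensorRT's Python API. Layer fusion and kernel auto-tuning happen automatically inside the build step, while the flag below opts into reduced precision; the file paths are placeholders, and API details may differ across TensorRT versions.

```python
# Sketch: build a TensorRT engine from an ONNX model. Layer fusion and kernel
# auto-tuning run automatically during build_serialized_network(); the FP16
# flag allows reduced-precision kernels where they pass accuracy checks.
# "model.onnx" / "model.engine" are placeholder paths.
import tensorrt as trt

logger = trt.Logger(trt.Logger.WARNING)
builder = trt.Builder(logger)
network = builder.create_network(
    1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))

parser = trt.OnnxParser(network, logger)
with open("model.onnx", "rb") as f:
    if not parser.parse(f.read()):
        raise RuntimeError(parser.get_error(0))

config = builder.create_builder_config()
config.set_flag(trt.BuilderFlag.FP16)  # allow FP16 kernels where accuracy permits

engine = builder.build_serialized_network(network, config)
with open("model.engine", "wb") as f:
    f.write(engine)
```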
“[In TensorRT-LLM, we] made sure we have the best possible tensor core optimizations for large language models,” Buck said. “This allows people to take any large language model and pass it through TensorRT-LLM to get the benefit of Hopper’s transformer engine, which enables the FP8 compute capabilities of Hopper…but without any loss of accuracy in the production workflow.”
...
“You can easily just take a 32- or 16-bit calculation and cram it into an FPGA, but chances are you’re going to get the wrong answer, because it won’t have the production level accuracy you want,” Buck said. “Doing that thoughtfully and carefully, maintaining scale and bias to keep the calculations in the range of only 8 bits in some cases—keeping FP16 for some parts of the model—this is something Nvidia has been working on for some time.”
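To make the scale-management point concrete, here is a small worked example. It uses a symmetric integer 8-bit grid for simplicity; real FP8 (E4M3/E5M2) is an 8-bit floating format, but the core idea Buck describes, choosing a per-tensor scale so values survive the trip through 8 bits, is the same.

```python
# Worked example of per-tensor scaling for 8-bit quantization. A scale factor
# derived from the observed maximum maps values onto the 8-bit grid; inverting
# it on the way out recovers an approximation of the original tensor.
import numpy as np

def quantize_8bit(x: np.ndarray):
    """Symmetric 8-bit quantization with a per-tensor scale."""
    amax = float(np.abs(x).max())
    scale = amax / 127.0  # map the observed range onto [-127, 127]
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    return q.astype(np.float32) * scale

x = np.random.randn(4, 4).astype(np.float32)
q, scale = quantize_8bit(x)
x_hat = dequantize(q, scale)
print("max abs error:", np.abs(x - x_hat).max())
```

Pick the scale badly and values either clip at the top of the range or collapse to zero at the bottom, which is exactly the "wrong answer" failure mode Buck warns about.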
TensorRT-LLM also includes a new feature called in-flight batching.
“Our standard batching approaches would always wait for the longest query to complete,” he said. “Image queries all roughly took the same time—that wasn’t a problem from an efficiency standpoint, and queries could be padded out, so it wasn’t a big deal.”
With the new in-flight batching feature, once queries complete, they can retire and the software can insert another query—all while a longer query is still in flight. This helps improve GPU utilization for LLMs with diverse query lengths.
“Frankly, the result surprised even me,” Buck said. “It doubled the performance of Hopper. Hopper is such a powerful GPU, it can handle lots of queries in the same GPU in parallel, but without the in-flight batching, if you gave it diverse queries, it would wait for the longest one and not be fully utilized.”
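The utilization argument is easy to demonstrate with a toy simulation. Nothing below is TensorRT-LLM code; it just counts decode steps for a set of queries with very different lengths under the two scheduling policies.

```python
# Toy model of static vs. in-flight batching. Each query needs some number of
# decode steps; the GPU runs up to `slots` queries per step. Static batching
# holds a batch until its longest member finishes; in-flight batching refills
# a slot the moment its query retires. Fewer total steps = better utilization.
from collections import deque

def static_batching(lengths, slots):
    steps, queue = 0, deque(lengths)
    while queue:
        batch = [queue.popleft() for _ in range(min(slots, len(queue)))]
        steps += max(batch)  # everyone waits for the longest query in the batch
    return steps

def in_flight_batching(lengths, slots):
    steps, queue, active = 0, deque(lengths), []
    while queue or active:
        while queue and len(active) < slots:
            active.append(queue.popleft())         # refill freed slots immediately
        steps += 1
        active = [n - 1 for n in active if n > 1]  # retire finished queries
    return steps

lengths = [5, 200, 12, 40, 7, 150, 3, 90] * 4  # wildly mixed query lengths
print("static:   ", static_batching(lengths, slots=4))
print("in-flight:", in_flight_batching(lengths, slots=4))
```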
TensorRT-LLM is open source, along with all of Nvidia's LLM work, including many LLM models, such as GPT, Bloom, and Falcon, that have been optimized with techniques like kernel fusion, faster attention, and multi-headed attention. Kernels for all these operations have been open-sourced as part of TensorRT-LLM.
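For readers unfamiliar with the term, kernel fusion just means executing a chain of operations in one pass instead of materializing intermediates between separate kernels. The NumPy sketch below only illustrates the concept; NumPy doesn't actually fuse, and TensorRT's real fused GPU kernels look nothing like this.

```python
# Conceptual illustration of kernel fusion. Unfused, each op reads and writes
# a full intermediate tensor; a fused kernel computes the whole chain per tile
# without ever writing t1/t2 to memory, which is where the GPU win comes from.
import numpy as np

x = np.random.randn(512, 512).astype(np.float32)
w = np.random.randn(512, 512).astype(np.float32)
b = np.random.randn(512).astype(np.float32)

# Unfused: three "kernels", two intermediates materialized in memory.
t1 = x @ w
t2 = t1 + b
y_unfused = np.maximum(t2, 0.0)  # ReLU

# Fused (conceptually): matmul + bias + ReLU as one composite operation.
y_fused = np.maximum(x @ w + b, 0.0)

assert np.allclose(y_unfused, y_fused)  # same math, less memory traffic when fused
```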
...
The performance boost from TensorRT-LLM should be obvious in the next round of MLPerf inference scores, which are due next spring, Buck added.