AMD Execution Thread [2024]

Sony Santa Monica shipped GoW Ragnarok with neural net compressed, or rather uprezzed, normal maps. So I'm not sure it's that far away, as NTBC is a different branch but somewhat similar in concept.

The practicality of using neural nets to upscale textures seems the most promising angle here. It costs little at runtime, and only costs runtime for those who can most afford it (anyone wanting higher settings anyway); it helps solve the fundamental problem of game sizes getting too big (virtualized textures solve texture use in RAM, so who cares about compressing the in-RAM size); and it doesn't require any dedicated silicon like Nvidia's solution would.

Considering Ubisoft has been toying with similar concepts, and there's other research into texture magnification with upscalers anyway, that seems like the way to go.
I wonder if a future version of DLSS can bypass the neural texture upscaling or integrate it into the main upscaling step - the scene is rendered at low resolution with low resolution textures, and DLSS outputs an upscaled version that has the appearance of high resolution textures.
 
It works on all texture types
Does it?

This paper focuses on the BC1 and BC4 formats, which are the simplest and most widely used for RGB and single-channel textures, respectively.
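For concreteness, here is a minimal pure-Python sketch of BC1's fixed-rate scheme (opaque 4-colour mode only; real encoders search endpoints far more carefully than this min/max heuristic):

```python
def rgb565(c):
    """Pack an (r, g, b) 8-bit triple into a 16-bit 5:6:5 value."""
    r, g, b = c
    return ((r >> 3) << 11) | ((g >> 2) << 5) | (b >> 3)

def from565(v):
    """Expand a 16-bit 5:6:5 value back to an (r, g, b) 8-bit triple."""
    r, g, b = (v >> 11) & 31, (v >> 5) & 63, v & 31
    return (r << 3 | r >> 2, g << 2 | g >> 4, b << 3 | b >> 2)

def bc1_palette(c0, c1):
    """The 4-colour palette: two endpoints plus two 1/3-2/3 interpolants."""
    a, b = from565(c0), from565(c1)
    mix = lambda t: tuple((a[i] * (3 - t) + b[i] * t) // 3 for i in range(3))
    return [a, b, mix(1), mix(2)]

def bc1_encode(pixels):
    """Encode 16 RGB pixels as one 8-byte BC1 block (opaque 4-colour mode only)."""
    lum = lambda c: 3 * c[0] + 6 * c[1] + c[2]
    c0 = rgb565(max(pixels, key=lum))
    c1 = rgb565(min(pixels, key=lum))
    if c0 < c1:
        c0, c1 = c1, c0  # c0 > c1 selects the opaque 4-colour mode
    pal = bc1_palette(c0, c1)
    dist = lambda p, q: sum((p[i] - q[i]) ** 2 for i in range(3))
    bits = 0
    for i, p in enumerate(pixels):
        bits |= min(range(4), key=lambda k: dist(p, pal[k])) << (2 * i)
    return (c0.to_bytes(2, "little") + c1.to_bytes(2, "little")
            + bits.to_bytes(4, "little"))

def bc1_decode(block):
    """Decode an 8-byte BC1 block back to 16 RGB pixels."""
    pal = bc1_palette(int.from_bytes(block[:2], "little"),
                      int.from_bytes(block[2:4], "little"))
    bits = int.from_bytes(block[4:], "little")
    return [pal[(bits >> (2 * i)) & 3] for i in range(16)]
```

A 4x4 block of 24-bit pixels goes from 48 bytes to 8, the fixed 6:1 ratio that makes BC1 so GPU-friendly; BC4 does the same thing with 8-bit endpoints and 3-bit indices for a single channel.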

It relies on all the textures used by a material (diffuse, normal, roughness, etc.) being compressed together to get the results in the paper, as it exploits similarities between those textures for the compression.
It's no different from the NTBC in this regard.

Neural Texture Block Compression (NTBC) encodes multiple textures in a single material in BC formats with reduced storage size while maintaining reasonable quality.

So I'm not sure it's that far away, as NTBC is a different branch but somewhat similar in concept.
They are all similar in the sense that they all use neural networks for compression. However, there are many different strategies for compressing data with NNs, each with different outcomes in terms of compression ratios, signal-to-noise ratios, and speed trade-offs.

AMD adopted a real-time approach where their NTBC inference should happen in tenths of milliseconds, which is fine for transcoding but not fast enough to decode textures on the fly and save VRAM. This approach requires small NNs that are overfitted to a specific dataset, such as a small pack of textures.
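A back-of-envelope budget makes the transcode-vs-on-the-fly distinction concrete. The per-material inference time comes from the post above; the visible-material count is a made-up illustrative number:

```python
# Back-of-envelope budget; the material count is assumed, purely illustrative.
frame_budget_ms = 1000 / 60   # ~16.7 ms per frame at 60 fps
inference_ms = 0.3            # "tenths of milliseconds" per material
visible_materials = 200       # hypothetical count of on-screen materials

per_frame_cost_ms = inference_ms * visible_materials  # 60 ms total
# Fine as a one-off background transcode per material, but several whole
# frames' worth of budget if you had to decode everything every frame:
assert per_frame_cost_ms > frame_budget_ms
```

A one-time 0.3 ms transcode at load/stream time is invisible; paying it per material per frame is not.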

These small networks attempt to encode the difference between the reference and the compressed image. I can easily imagine how a trained autoencoder CNN could reconstruct the reference image in real time through convolutional kernels learned for a particular set of textures.
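To illustrate the residual idea only (this toy linear model and its synthetic data are mine for illustration, not AMD's architecture): a tiny model overfitted by gradient descent to memorise the reference-minus-compressed difference of one "texture".

```python
def fit_residual(coords, residuals, steps=2000, lr=0.01):
    """Overfit residual ~= a*x + b by plain gradient descent, standing in
    for a small network that encodes the reference-minus-BC difference."""
    a = b = 0.0
    n = len(coords)
    for _ in range(steps):
        ga = gb = 0.0
        for x, r in zip(coords, residuals):
            err = (a * x + b) - r          # prediction error at this texel
            ga += 2 * err * x / n          # gradient of MSE w.r.t. a
            gb += 2 * err / n              # gradient of MSE w.r.t. b
        a -= lr * ga
        b -= lr * gb
    return a, b

# Synthetic "residual" with simple structure the tiny model can memorise.
xs = list(range(8))
rs = [0.5 * x + 1.0 for x in xs]
a, b = fit_residual(xs, rs)
```

Overfitting is the point here: the model only ever has to reproduce this one dataset, which is exactly why such networks can stay small.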

However, if you don't have the high-res reference textures, upscaling by hallucinating new details would require a completely different approach, typically a GAN or a diffusion network with many more parameters. Such networks cannot infer the details in real time, or even within tenths of milliseconds; that speed might still be acceptable for transcoding in the background, but it's insufficient for reducing video memory footprints.
 
AMD adopted a real-time approach where their NTBC inference should happen in tenths of milliseconds, which is fine for transcoding but not fast enough to decode textures on the fly and save VRAM. This approach requires small NNs that are overfitted to a specific dataset, such as a small pack of textures.

These small networks attempt to encode the difference between the reference and the compressed image. I can easily imagine how a trained autoencoder CNN could reconstruct the reference image in real time through convolutional kernels learned for a particular set of textures.

Again, I don't care about saving VRAM; on the high end it literally doesn't matter. Virtualized texturing and similarly extreme streaming is the silver bullet there for consoles, which are on UE5 or the like anyway, so they already have fixed VRAM pools that can stream in arbitrarily sized textures seamlessly.

The hilarious thing about Nvidia trying to come up with some sort of proprietary tech they'll end up putting into Blackwell and hailing as a "revolution" is that the silicon cost will be as much as just adding more GDDR, which is now a couple of bucks per 8GB of GDDR6. They could just raise the minimum size to 16GB for Blackwell and they'd probably save themselves money overall while delivering actual value to customers, instead of a PR mirage of an SDK that only a couple of Nvidia-funded indie games will ever use. But that's not the Nvidia way.

And while upscaling textures with a generic neural net would end up needing more parameters, that's already part of neural-net upscalers anyway. One could train DLSS 4 on textures as well, upscaling the initial, zero-history disoccluded portions of frames and using the same neural net for textures.

The PSNR certainly wouldn't be nearly as good as with individually trained textures, but it's still an interesting idea, especially as textures often have similarity to each other, and not just across the albedo/roughness/etc. maps of one material. A single neural net, concatenated from individually trained ones and used to upscale all the textures in a game, should ideally save even more disk space than individual nets.
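PSNR, the quality metric being traded away here, is just mean squared error on a log scale. A minimal sketch:

```python
import math

def psnr(ref, test, peak=255.0):
    """Peak signal-to-noise ratio in dB between two equal-length pixel arrays."""
    mse = sum((a - b) ** 2 for a, b in zip(ref, test)) / len(ref)
    return float("inf") if mse == 0 else 10 * math.log10(peak * peak / mse)
```

For scale: a uniform error of 2 grey levels on 8-bit data gives roughly 42 dB, around the range where texture compression artifacts stop being obvious.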

Regardless, the primary goal would be to cut down on download/install size.
 
And while upscaling textures with a generic neural net would end up needing more parameters, that's already part of neural-net upscalers anyway. One could train DLSS 4 on textures as well, upscaling the initial, zero-history disoccluded portions of frames and using the same neural net for textures.
Video memory speculations aside, you're mixing many concepts here, which are not necessarily compatible and serve different purposes. Generic neural networks for image generation and real-time spatial and temporal upscalers are totally different beasts. Even assuming you have enough performance to run several of them and still come out ahead, mixing them is not as easy a task as you might imagine, as temporal accumulation would contradict the hallucination of missing details in the zero-history portions of a frame. Also, you don't want details popping in the disoccluded areas only to vanish a frame later, as that would be perceived as a graphics artifact (ghosting etc.), highlighting the disoccluded areas rather than improving them.
 
Zen 5 die sizes: the CCD is 70.6mm2 for Zen 5 vs 71mm2 for Zen 4, with a modest transistor increase (8.3B vs 6.5B; N4P vs 5nm). Strix Point is considerably bigger than Phoenix/Hawk Point: 232.5mm2 vs 178mm2 (1.3x) on a similar node (both 4nm), an expected result with everything added and improved.
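A crude implied-density check from those numbers:

```python
# Effective transistor density implied by the CCD figures above.
zen4_density = 6.5e9 / 71.0    # transistors per mm^2
zen5_density = 8.3e9 / 70.6
gain = zen5_density / zen4_density
print(f"Zen 4: {zen4_density / 1e6:.1f} MTr/mm2, "
      f"Zen 5: {zen5_density / 1e6:.1f} MTr/mm2, gain: {gain:.2f}x")
```

That works out to roughly 91.5 vs 117.6 MTr/mm2, a ~1.28x effective jump, well beyond N4P's marginal logic-density improvement over 5nm.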


Impressive that they managed to keep the CCD the same size despite making the core significantly wider, though the cores were always much smaller than the L3 cache in particular. The node density improvement is marginal (~6%, I think), so they might have used higher-density libraries for some transistors.

Strix Point's die size increase is not surprising given that they added 4 cores + L3, 4 CUs (and presumably a larger L2), and a bigger NPU, and there was no node improvement. Kraken may end up being similar in size to Hawk Point and be a more economical option until Sonoma Valley, which I expect will be closer to 100mm2.
 
I think they've actually managed a pretty substantial SRAM density improvement.

Compare Zen 4's core vs L3:

zen 4 dieshot.jpg

with Zen 5:

zen 5 dieshot.jpg

And while I know neither of these is literally a real die shot, the Zen 4 image absolutely correlates with the real thing in terms of the dimensions of its features:

zen 4 dieshot real.jpg

Given that we know how dense the Zen 4 64MB Vcache chiplet is (it slightly overruns the 32MB base L3 in size), it probably isn't surprising there was good room for improvement here.
 
I think they've actually managed a pretty substantial SRAM density improvement.

Compare Zen 4's core vs L3:

View attachment 11635

with Zen 5:

View attachment 11636

And while I know neither of these is literally a real die shot, the Zen 4 image absolutely correlates with the real thing in terms of the dimensions of its features:

View attachment 11637

Given that we know how dense the Zen 4 64MB Vcache chiplet is (it slightly overruns the 32MB base L3 in size), it probably isn't surprising there was good room for improvement here.

They probably learned a bunch from their Zen4c optimizations, as a large chunk of the density increase in the 'compact' cores came from cache changes:
  • "It used denser 6T dual-port SRAM cells for Zen 4c as opposed to 8T dual-port SRAM circuits for Zen 4 to reduce SRAM area. As a result, while Zen 4 and Zen 4c cores have similar L1 and L2 cache sizes, the area used by caches in case of Zen 4c is lower, but these caches also are not as fast as those inside Zen 4."
 
They probably learned a bunch from their Zen4c optimizations, as a large chunk of the density increase in the 'compact' cores came from cache changes:
  • "It used denser 6T dual-port SRAM cells for Zen 4c as opposed to 8T dual-port SRAM circuits for Zen 4 to reduce SRAM area. As a result, while Zen 4 and Zen 4c cores have similar L1 and L2 cache sizes, the area used by caches in case of Zen 4c is lower, but these caches also are not as fast as those inside Zen 4."
Would make sense.

Really makes me wonder about the Zen 5 Vcache parts, though. I can't imagine there's much more room for improvement in Vcache density, and because the current L3 is relatively smaller (and the cores bigger), I imagine any Vcache chip is going to cover the actual CPU cores quite a bit more this time around. Which isn't gonna be great for thermals...
 
Would make sense.

Really makes me wonder about the Zen 5 Vcache parts, though. I can't imagine there's much more room for improvement in Vcache density, and because the current L3 is relatively smaller (and the cores bigger), I imagine any Vcache chip is going to cover the actual CPU cores quite a bit more this time around. Which isn't gonna be great for thermals...
Or they could do two layers of cache.
 
Didn't they say at Computex that Ryzen 9000 X3D would have some actually new stuff going on, not just another gen of the same cache?
 
Didn't they say at Computex that Ryzen 9000 X3D would have some actually new stuff going on, not just another gen of the same cache?

Not sure if AMD mentioned it, but there have been rumours that there will be some new features, including overclocking. Given that it's their 3rd-gen Vcache implementation, I'm sure they've made some improvements. Though supposedly the larger changes to the IOD design, IF design, and Vcache/SoC stacking will come with the Zen 6 family.
 
Video memory speculations aside, you're mixing many concepts here, which are not necessarily compatible and serve different purposes. Generic neural networks for image generation and real-time spatial and temporal upscalers are totally different beasts. Even assuming you have enough performance to run several of them and still come out ahead, mixing them is not as easy a task as you might imagine, as temporal accumulation would contradict the hallucination of missing details in the zero-history portions of a frame. Also, you don't want details popping in the disoccluded areas only to vanish a frame later, as that would be perceived as a graphics artifact (ghosting etc.), highlighting the disoccluded areas rather than improving them.

Oh, I wasn't talking about temporal accumulation, just that the initial frame with newly disoccluded information has a rendering resolution lower than native, which even with screen-space AA is going to be noticeable, so you want to blow it up to output resolution.

Obviously AMD currently does this with their fancy Lanczos, but single-image AI upscaling is a more promising long-term route to image quality in this specific instance, and it involves no temporal accumulation as there's no history yet.

If you turn your neural net from a per-material decompressor into a per-game decompressor, the parameters and technique should start to see a lot of overlap between decompressing all these materials with one neural net and a neural net that uprezzes a single input image, as both would be trained on increasingly similar data sets for increasingly similar purposes.

It's a stretch of the original concept, but I can see all this going in that direction eventually. And though I'm not sure whether turning both the no-history uprez net and the decompression net into a single one would have much if any benefit, it's still worthy of consideration.
 

Zen 5 delayed by 2 weeks to 15th Aug

Some additional info disclosed:
 
I know there's a lot of things that could cause a delay, but they were already sending out tons of review samples. I wonder if AMD is looking at the generally lackluster performance improvements and second-guessing their decision to go with conservative TDP/clocks for the sake of making Zen 5 look extra efficient. Zen 5 seems like it's gonna be especially underwhelming for gaming, and a clockspeed boost could maybe help mitigate this a bit in reviews.

Would be quite a drastic last-second change, but should be possible. And given they haven't released prices or opened pre-orders or anything, it wouldn't be completely too late for it.
 
Steve suggests that they've decided to wait until Intel pushes the microcode update, as that may affect Intel's performance (although I think this is somewhat unlikely and basically a conspiracy theory).
 
Sounds like quality concerns:

We appreciate the excitement around Ryzen 9000 series processors. During final checks, we found the initial production units that were shipped to our channel partners did not meet our full quality expectations. Out of an abundance of caution and to maintain the highest quality experiences for every Ryzen user, we are working with our channel partners to replace the initial production units with fresh units.

As a result, there will be a short delay in retail availability. The Ryzen 7 9700X and Ryzen 5 9600X processors will now go on sale on August 8th and the Ryzen 9 9950X and Ryzen 9 9900X processors will go on sale on August 15th. We pride ourselves in providing a high-quality experience for every Ryzen user, and we look forward to our fans having a great experience with the new Ryzen 9000 series.

 
It's recall time!

Btw, curious how they will check the recalled batches for the issues they're concerned with.

And whether we as end users will also be able to check for issues after the recall, in case some were missed.
 
Steve suggests that they've decided to wait until Intel pushes the microcode update, as that may affect Intel's performance (although I think this is somewhat unlikely and basically a conspiracy theory).
Timelines don't match. Intel has said around mid-August, and all Ryzens will be on sale by the 15th. If they'd been waiting for Intel, they'd launch the Ryzens towards the end of August at the earliest.
 
I know there's a lot of things that could cause a delay, but they were already sending out tons of review samples. I wonder if AMD is looking at the generally lackluster performance improvements and second-guessing their decision to go with conservative TDP/clocks for the sake of making Zen 5 look extra efficient. Zen 5 seems like it's gonna be especially underwhelming for gaming, and a clockspeed boost could maybe help mitigate this a bit in reviews.

Would be quite a drastic last-second change, but should be possible. And given they haven't released prices or opened pre-orders or anything, it wouldn't be completely too late for it.

Actually, they hadn't sent out any review samples yet from what I've read. Many reviewers also mentioned this recently, as they'd have been hard pressed to complete testing before the earlier launch date of 31st July. I don't think it's anything to do with spec changes, though there were earlier rumours that the 9700X may get an official TDP boost. Not that it matters, as it's unlocked anyway.
Sounds like quality concerns:



Additional info from The Verge: "This is not because AMD’s found any issues with the actual chips, spokesperson Stacy MacDiarmid tells The Verge. Rather, AMD discovered some of its chips didn’t go through all of the proper testing procedures, and the company wants to make sure they do."
It's recall time!

Btw, curious how they will check the recalled batches for the issues they're concerned with.

And whether we as end users will also be able to check for issues after the recall, in case some were missed.
The CPUs will go through QA again and only the ones that pass will be sold, so it shouldn't be an issue for consumers in any way.
Timelines don't match. Intel has said around mid-August, and all Ryzens will be on sale by the 15th. If they'd been waiting for Intel, they'd launch the Ryzens towards the end of August at the earliest.
Yeah, nothing to do with Intel. Overall this is a good move by AMD to ensure users don't face any issues.
 