AMD Execution Thread [2024]

Sony Santa Monica shipped GoW Ragnarok with neural net compressed, or rather uprezzed, normal maps. So I'm not sure it's that far away, as NTBC is a different branch but somewhat similar in concept.

The practicality of using neural nets to upscale textures seems the most promising angle here. It costs little at runtime, and only costs runtime for those who can most afford it (anyone wanting higher settings anyway); it helps solve the fundamental problem of game sizes getting too big (virtualized textures solve texture use in RAM, so who cares about compressing the in-RAM size); and it doesn't require any dedicated silicon like Nvidia's solution would.

Considering Ubisoft has been toying with similar concepts, and there's other research into texture magnification with upscalers anyway, that seems like the way to go.
I wonder if a future version of DLSS can bypass the neural texture upscaling or integrate it into the main upscaling step - the scene is rendered at low resolution with low resolution textures, and DLSS outputs an upscaled version that has the appearance of high resolution textures.
 
It works on all texture types
Does it?

This paper focuses on the BC1 and BC4 formats, which are the simplest and most widely used for RGB and single-channel textures, respectively.
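For concreteness, here is a minimal pure-Python sketch of BC1's fixed-rate scheme (opaque 4-colour mode only; real encoders search endpoints far more carefully than this min/max heuristic):

```python
def rgb565(c):
    """Pack an (r, g, b) 8-bit triple into a 16-bit 5:6:5 value."""
    r, g, b = c
    return ((r >> 3) << 11) | ((g >> 2) << 5) | (b >> 3)

def from565(v):
    """Expand a 16-bit 5:6:5 value back to an (r, g, b) 8-bit triple."""
    r, g, b = (v >> 11) & 31, (v >> 5) & 63, v & 31
    return (r << 3 | r >> 2, g << 2 | g >> 4, b << 3 | b >> 2)

def bc1_palette(c0, c1):
    """The 4-colour palette: two endpoints plus two 1/3-2/3 interpolants."""
    a, b = from565(c0), from565(c1)
    mix = lambda t: tuple((a[i] * (3 - t) + b[i] * t) // 3 for i in range(3))
    return [a, b, mix(1), mix(2)]

def bc1_encode(pixels):
    """Encode 16 RGB pixels as one 8-byte BC1 block (opaque 4-colour mode only)."""
    lum = lambda c: 3 * c[0] + 6 * c[1] + c[2]
    c0 = rgb565(max(pixels, key=lum))
    c1 = rgb565(min(pixels, key=lum))
    if c0 < c1:
        c0, c1 = c1, c0  # c0 > c1 selects the opaque 4-colour mode
    pal = bc1_palette(c0, c1)
    dist = lambda p, q: sum((p[i] - q[i]) ** 2 for i in range(3))
    bits = 0
    for i, p in enumerate(pixels):
        bits |= min(range(4), key=lambda k: dist(p, pal[k])) << (2 * i)
    return (c0.to_bytes(2, "little") + c1.to_bytes(2, "little")
            + bits.to_bytes(4, "little"))

def bc1_decode(block):
    """Decode an 8-byte BC1 block back to 16 RGB pixels."""
    pal = bc1_palette(int.from_bytes(block[:2], "little"),
                      int.from_bytes(block[2:4], "little"))
    bits = int.from_bytes(block[4:], "little")
    return [pal[(bits >> (2 * i)) & 3] for i in range(16)]
```

A 4x4 block of 24-bit pixels goes from 48 bytes to 8, the fixed 6:1 ratio that makes BC1 so GPU-friendly; BC4 does the same thing with 8-bit endpoints and 3-bit indices for a single channel.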

It relies on all the textures used by a material (diffuse, normal, roughness, etc.) being compressed together to get the results in the paper, as it exploits similarities between those textures for the compression.
It's no different from the NTBC in this regard.

Neural Texture Block Compression (NTBC) encodes multiple textures in a single material in BC formats with reduced storage size while maintaining reasonable quality.

So I'm not sure it's that far away, as NTBC is a different branch but somewhat similar in concept.
They are all similar in the sense that they all use neural networks for compression. However, there are many different strategies for compressing data with NNs, each with different outcomes in terms of compression ratios, signal-to-noise ratios, and speed trade-offs.

AMD adopted a real-time approach where their NTBC inference should happen in tenths of milliseconds, which is fine for transcoding but not fast enough to decode textures on the fly and save VRAM. This approach requires small NNs that are overfitted to a specific dataset, such as a small pack of textures.
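A back-of-envelope budget makes the transcode-vs-on-the-fly distinction concrete. The per-material inference time comes from the post above; the visible-material count is a made-up illustrative number:

```python
# Back-of-envelope budget; the material count is assumed, purely illustrative.
frame_budget_ms = 1000 / 60   # ~16.7 ms per frame at 60 fps
inference_ms = 0.3            # "tenths of milliseconds" per material
visible_materials = 200       # hypothetical count of on-screen materials

per_frame_cost_ms = inference_ms * visible_materials  # 60 ms total
# Fine as a one-off background transcode per material, but several whole
# frames' worth of budget if you had to decode everything every frame:
assert per_frame_cost_ms > frame_budget_ms
```

A one-time 0.3 ms transcode at load/stream time is invisible; paying it per material per frame is not.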

These small networks attempt to encode the difference between the reference and the compressed image. I can easily imagine how a trained autoencoder CNN could reconstruct the reference image in real time through convolutional kernels learned for a particular set of textures.
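To illustrate the residual idea only (this toy linear model and its synthetic data are mine for illustration, not AMD's architecture): a tiny model overfitted by gradient descent to memorise the reference-minus-compressed difference of one "texture".

```python
def fit_residual(coords, residuals, steps=2000, lr=0.01):
    """Overfit residual ~= a*x + b by plain gradient descent, standing in
    for a small network that encodes the reference-minus-BC difference."""
    a = b = 0.0
    n = len(coords)
    for _ in range(steps):
        ga = gb = 0.0
        for x, r in zip(coords, residuals):
            err = (a * x + b) - r          # prediction error at this texel
            ga += 2 * err * x / n          # gradient of MSE w.r.t. a
            gb += 2 * err / n              # gradient of MSE w.r.t. b
        a -= lr * ga
        b -= lr * gb
    return a, b

# Synthetic "residual" with simple structure the tiny model can memorise.
xs = list(range(8))
rs = [0.5 * x + 1.0 for x in xs]
a, b = fit_residual(xs, rs)
```

Overfitting is the point here: the model only ever has to reproduce this one dataset, which is exactly why such networks can stay small.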

However, if you don't have the high-res reference textures, upscaling by hallucinating new details would require a completely different approach, typically a GAN or a diffusion network with many more parameters. Such networks cannot infer the details in real time, or even within tenths of milliseconds; that speed might still be acceptable for transcoding in the background, but it's insufficient for reducing video memory footprints.
 
AMD adopted a real-time approach where their NTBC inference should happen in tenths of milliseconds, which is fine for transcoding but not fast enough to decode textures on the fly and save VRAM. This approach requires small NNs that are overfitted to a specific dataset, such as a small pack of textures.

These small networks attempt to encode the difference between the reference and the compressed image. I can easily imagine how a trained autoencoder CNN could reconstruct the reference image in real time through convolutional kernels learned for a particular set of textures.

Again, I don't care about saving VRAM; on the high end it literally doesn't matter. Virtualized texturing and similarly extreme streaming is the silver bullet there for consoles, which are on UE5 or the like anyway, so they already have fixed VRAM pools that can stream in arbitrarily sized textures seamlessly.

The hilarious thing about Nvidia trying to come up with some sort of proprietary tech they'll end up putting into Blackwell and hailing as a "revolution" is that the silicon cost will be as much as just adding more GDDR, which is now a couple of bucks per 8GB of GDDR6. They could just raise the minimum size to 16GB for Blackwell and they'd probably save themselves money overall while delivering actual value to customers, instead of a PR mirage of an SDK that only a couple of Nvidia-funded indie games will ever use. But that's not the Nvidia way.

And while upscaling textures with a generic neural net would end up needing more parameters, that's already part of neural-net upscalers anyway. One could train DLSS 4 on textures as well, upscaling the initial, zero-history disoccluded portions of frames and using the same neural net for textures.

The PSNR certainly wouldn't be nearly as good as with individually trained textures, but it's still an interesting idea, especially as textures often have similarity to each other, and not just across the albedo/roughness/etc. maps of one material. A single neural net, concatenated from individually trained ones and used to upscale all the textures in a game, should ideally save even more disk space than individual nets.
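PSNR, the quality metric being traded away here, is just mean squared error on a log scale. A minimal sketch:

```python
import math

def psnr(ref, test, peak=255.0):
    """Peak signal-to-noise ratio in dB between two equal-length pixel arrays."""
    mse = sum((a - b) ** 2 for a, b in zip(ref, test)) / len(ref)
    return float("inf") if mse == 0 else 10 * math.log10(peak * peak / mse)
```

For scale: a uniform error of 2 grey levels on 8-bit data gives roughly 42 dB, around the range where texture compression artifacts stop being obvious.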

Regardless, the primary goal would be to cut down on download/install size.
 
And while upscaling textures with a generic neural net would end up needing more parameters, that's already part of neural-net upscalers anyway. One could train DLSS 4 on textures as well, upscaling the initial, zero-history disoccluded portions of frames and using the same neural net for textures.
Video memory speculations aside, you're mixing many concepts here, which are not necessarily compatible and serve different purposes. Generic neural networks for image generation and real-time spatial and temporal upscalers are totally different beasts. Even assuming you have enough performance to run several of them and still come out ahead, mixing them is not as easy a task as you might imagine, as temporal accumulation would contradict the hallucination of missing details in the zero-history portions of a frame. Also, you don't want details popping in the disoccluded areas only to vanish a frame later, as that would be perceived as a graphics artifact (ghosting etc.), highlighting the disoccluded areas rather than improving them.
 
Zen 5 die sizes: the CCD is 70.6mm2 for Zen 5 vs 71mm2 for Zen 4, with a modest transistor increase (8.3B vs 6.5B; N4P vs 5nm). Strix Point is considerably bigger than Phoenix/Hawk Point: 232.5mm2 vs 178mm2 (1.3x) on a similar node (both 4nm), an expected result with everything added and improved.
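A crude implied-density check from those numbers:

```python
# Effective transistor density implied by the CCD figures above.
zen4_density = 6.5e9 / 71.0    # transistors per mm^2
zen5_density = 8.3e9 / 70.6
gain = zen5_density / zen4_density
print(f"Zen 4: {zen4_density / 1e6:.1f} MTr/mm2, "
      f"Zen 5: {zen5_density / 1e6:.1f} MTr/mm2, gain: {gain:.2f}x")
```

That works out to roughly 91.5 vs 117.6 MTr/mm2, a ~1.28x effective jump, well beyond N4P's marginal logic-density improvement over 5nm.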


Impressive that they managed to keep the CCD the same size despite making the core significantly wider, though the cores were always much smaller than the L3 cache in particular. The node density improvement is marginal (~6%, I think), so they might have used higher-density libraries for some transistors.

Strix Point's die size increase is not surprising given that they added 4 cores + L3, 4 CUs (and presumably a larger L2), and a bigger NPU, and there was no node improvement. Kraken may end up being similar in size to Hawk Point and be a more economical option until Sonoma Valley, which I expect will be closer to 100mm2.
 
I think they've actually managed a pretty substantial SRAM density improvement.

Compare Zen 4's core vs L3:

zen 4 dieshot.jpg

with Zen 5:

zen 5 dieshot.jpg

And while I know neither of these is literally a real die shot, the Zen 4 image absolutely correlates with the real thing in terms of the dimensions of its features:

zen 4 dieshot real.jpg

Given that we know how dense the Zen 4 64MB Vcache chiplet is (it slightly overruns the 32MB base L3 in size), it probably isn't surprising there was good room for improvement here.
 
I think they've actually managed a pretty substantial SRAM density improvement.

Compare Zen 4's core vs L3:

View attachment 11635

with Zen 5:

View attachment 11636

And while I know neither of these is literally a real die shot, the Zen 4 image absolutely correlates with the real thing in terms of the dimensions of its features:

View attachment 11637

Given that we know how dense the Zen 4 64MB Vcache chiplet is (it slightly overruns the 32MB base L3 in size), it probably isn't surprising there was good room for improvement here.

They probably learned a bunch from their Zen4c optimizations, as a large chunk of the density increase in the 'compact' cores came from cache changes:
  • "It used denser 6T dual-port SRAM cells for Zen 4c as opposed to 8T dual-port SRAM circuits for Zen 4 to reduce SRAM area. As a result, while Zen 4 and Zen 4c cores have similar L1 and L2 cache sizes, the area used by caches in case of Zen 4c is lower, but these caches also are not as fast as those inside Zen 4."
 
They probably learned a bunch from their Zen4c optimizations, as a large chunk of the density increase in the 'compact' cores came from cache changes:
  • "It used denser 6T dual-port SRAM cells for Zen 4c as opposed to 8T dual-port SRAM circuits for Zen 4 to reduce SRAM area. As a result, while Zen 4 and Zen 4c cores have similar L1 and L2 cache sizes, the area used by caches in case of Zen 4c is lower, but these caches also are not as fast as those inside Zen 4."
Would make sense.

Really makes me wonder about the Zen 5 Vcache parts, though. I can't imagine there's much more room for improvement in Vcache density, and because the current L3 is relatively smaller (and the cores bigger), I imagine any Vcache chip is going to cover the actual CPU cores quite a bit more this time around. Which isn't gonna be great for thermals...
 
Would make sense.

Really makes me wonder about the Zen 5 Vcache parts, though. I can't imagine there's much more room for improvement in Vcache density, and because the current L3 is relatively smaller (and the cores bigger), I imagine any Vcache chip is going to cover the actual CPU cores quite a bit more this time around. Which isn't gonna be great for thermals...
Or they could do two layers of cache.
 
Didn't they say at Computex that Ryzen 9000 X3D would have some actually new stuff going on, not just another gen of the same cache?
 
Didn't they say at Computex that Ryzen 9000 X3D would have some actually new stuff going on, not just another gen of the same cache?

Not sure if AMD mentioned it, but there have been rumours that there will be some new features, including overclocking. Given that it's their 3rd-gen Vcache implementation, I'm sure they've made some improvements. Though supposedly the larger changes to the IOD design, IF design, and Vcache/SoC stacking will come with the Zen 6 family.
 
Video memory speculations aside, you're mixing many concepts here, which are not necessarily compatible and serve different purposes. Generic neural networks for image generation and real-time spatial and temporal upscalers are totally different beasts. Even assuming you have enough performance to run several of them and still come out ahead, mixing them is not as easy a task as you might imagine, as temporal accumulation would contradict the hallucination of missing details in the zero-history portions of a frame. Also, you don't want details popping in the disoccluded areas only to vanish a frame later, as that would be perceived as a graphics artifact (ghosting etc.), highlighting the disoccluded areas rather than improving them.

Oh, I wasn't talking about temporal accumulation, just that the initial frame with newly disoccluded information has a rendering resolution lower than native, which even with screen-space AA is going to be noticeable, so you want to blow it up to output resolution.

Obviously AMD currently does this with their fancy Lanczos, but single-image AI upscaling is a more promising long-term route to image quality in this specific instance, and it involves no temporal accumulation as there's no history yet.

If you turn your neural net from a per-material decompressor into a per-game decompressor, the parameters and technique should start to see a lot of overlap between decompressing all these materials with one neural net and a neural net that uprezzes a single input image, as both would be trained on increasingly similar data sets for increasingly similar purposes.

It's a stretch of the original concept, but I can see all this going in that direction eventually. And though I'm not sure whether turning both the no-history uprez net and the decompression net into a single one would have much if any benefit, it's still worthy of consideration.
 

Zen 5 delayed by 2 weeks to 15th Aug

Some additional info disclosed:
 
I know there's a lot of things that could cause a delay, but they were already sending out tons of review samples. I wonder if AMD is looking at the generally lackluster performance improvements and second-guessing their decision to go with conservative TDP/clocks for the sake of making Zen 5 look extra efficient. Zen 5 seems like it's gonna be especially underwhelming for gaming, and a clockspeed boost could maybe help mitigate this a bit in reviews.

Would be quite a drastic last-second change, but should be possible. And given they haven't released prices or opened pre-orders or anything, it wouldn't be completely too late for it.
 
Steve suggests that they've decided to wait until Intel pushes the microcode update, as that may affect Intel's performance (although I think this is somewhat unlikely and basically a conspiracy theory).
 
Sounds like quality concerns:

We appreciate the excitement around Ryzen 9000 series processors. During final checks, we found the initial production units that were shipped to our channel partners did not meet our full quality expectations. Out of an abundance of caution and to maintain the highest quality experiences for every Ryzen user, we are working with our channel partners to replace the initial production units with fresh units.

As a result, there will be a short delay in retail availability. The Ryzen 7 9700X and Ryzen 5 9600X processors will now go on sale on August 8th and the Ryzen 9 9950X and Ryzen 9 9900X processors will go on sale on August 15th. We pride ourselves in providing a high-quality experience for every Ryzen user, and we look forward to our fans having a great experience with the new Ryzen 9000 series.

 
It's recall time!

Btw, curious how they will check the recalled batches for the issues they're concerned with.

And whether we as end users will also be able to check for issues after the recall, in case some were missed.
 
Steve suggests that they've decided to wait until Intel pushes the microcode update, as that may affect Intel's performance (although I think this is somewhat unlikely and basically a conspiracy theory).
Timelines don't match. Intel has said around mid-August, and all Ryzens will be on sale by the 15th. If they'd been waiting for Intel, they'd launch the Ryzens towards the end of August at the earliest.
 
I know there's a lot of things that could cause a delay, but they were already sending out tons of review samples. I wonder if AMD is looking at the generally lackluster performance improvements and second-guessing their decision to go with conservative TDP/clocks for the sake of making Zen 5 look extra efficient. Zen 5 seems like it's gonna be especially underwhelming for gaming, and a clockspeed boost could maybe help mitigate this a bit in reviews.

Would be quite a drastic last-second change, but should be possible. And given they haven't released prices or opened pre-orders or anything, it wouldn't be completely too late for it.

Actually, they hadn't sent out any review samples yet from what I've read. Many reviewers also mentioned this recently, as they'd have been hard pressed to complete testing before the earlier launch date of 31st July. I don't think it's anything to do with spec changes, though there were earlier rumours that the 9700X may get an official TDP boost. Not that it matters, as it's unlocked anyway.
Sounds like quality concerns:



Additional info from The Verge: "This is not because AMD’s found any issues with the actual chips, spokesperson Stacy MacDiarmid tells The Verge. Rather, AMD discovered some of its chips didn’t go through all of the proper testing procedures, and the company wants to make sure they do."
It's recall time!

Btw, curious how they will check the recalled batches for the issues they're concerned with.

And whether we as end users will also be able to check for issues after the recall, in case some were missed.
The CPUs will go through QA again and only the ones that pass will be sold, so it shouldn't be an issue for consumers in any way.
Timelines don't match. Intel has said around mid-August, and all Ryzens will be on sale by the 15th. If they'd been waiting for Intel, they'd launch the Ryzens towards the end of August at the earliest.
Yeah, nothing to do with Intel. Overall this is a good move by AMD to ensure users don't face any issues.
 