AMD Execution Thread [2023]

If you want efficiency, you should see this: a desktop 4060 GPU beating a mobile RDNA3 GPU in efficiency by a factor of two.
Interesting, but not sure if that's really right.
And it's likely unrelated to future APUs anyway, which was the topic.
And you think APUs will give you any more than 8GB of VRAM?
Yep. Strix Halo is a 16-core part, so 32GB makes more sense than 16GB, and it's still a year away.
(I also don't expect a big success here yet. Strix Point will likely get more attention for now. Depends on how long it takes for people to change their minds, if they do.)
Personally I lack the experience and an opinion on how much VRAM we need as a baseline. I have a hard time believing 8GB of VRAM is no longer enough, and I guess devs could adapt as needed if they had to.
Why are you even considering PT? APUs won't get you anywhere near decent RT performance, especially AMD APUs. You are locked out of the entire ray tracing market; Lumen is not for you either, whether in software or hardware.
Hmm... I think you got me. The real reason I want an APU is probably that my GI stuff will work on an APU, while NV and Epic can't scale down. :D

But no. No, personally I've really had enough of GPUs. They just suck. Every offering out there has a flaw, but for the prices they charge I expect perfection, not flaws.
And the same applies to PT and Lumen. Both suck. Lumen is an outright failure waiting for urgent replacement, and PT is the swan song of a shrinking high-end niche, diverging from a reasonable but currently missing mainstream.
I've had enough, and I'm absolutely willing to forgo laggy and pulsating realtime GI attempts completely for a cheaper and smaller computer and a stable image at 'lower' IQ.
The glory days of the dGPU era ended with the GTX brand, my friend. What's left doesn't seem here to stay.

That's what I honestly think as a gamer, but as always: the future will tell. I wouldn't be surprised if I turn out to be wrong about everything here.

I don't believe it is. It's probably more expensive than all of those combined.
If so, then it would be just a matter of time or production volume.
But Apple would be stupid to spend more for less just to pretend to have made the most efficient chip on earth. It seems more likely that they really did it.
 
Is Strix Halo expected to use HBM or something? I'm not understanding why people are talking about VRAM for an APU.

No, on-package LPDDR5, just with a wide (256-bit) bus, just like Apple does with the M1. People are talking about "VRAM" because the APU removes that number as a limitation - all available memory is technically 'VRAM' with an APU. It doesn't mean much in current APUs because of the narrow 128-bit DDR5 bus; they're so bandwidth constrained that using high texture detail costs performance even if they technically have enough memory.
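
To put rough numbers on "bandwidth constrained", here's a back-of-the-envelope sketch (assuming dual-channel DDR5-5600 for a current APU and the rumored 256-bit LPDDR5X-8533 for Strix Halo; these are assumptions, not confirmed specs):

```python
def peak_bandwidth_gbps(bus_width_bits, transfer_rate_mts):
    """Peak theoretical memory bandwidth in GB/s:
    bytes per transfer * mega-transfers per second / 1000."""
    return bus_width_bits / 8 * transfer_rate_mts / 1000

print(peak_bandwidth_gbps(64, 5600))    # ~44.8 GB/s, single-channel DDR5-5600
print(peak_bandwidth_gbps(128, 5600))   # ~89.6 GB/s, dual-channel DDR5-5600
print(peak_bandwidth_gbps(256, 8533))   # ~273 GB/s, 256-bit LPDDR5X-8533 (rumored)
```

Roughly a 3x jump in peak bandwidth over a typical dual-channel setup, which is what would make the "all memory is VRAM" framing actually mean something.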

Is there a market for such a super-expensive APU? I mean, the economy sux.

Depends entirely on whether "super-expensive" means "more expensive than an equivalent dedicated GPU solution", not to mention the potential power savings. Gaming notebooks are still being sold.
 
No, on-package LPDDR5, just with a wide (256-bit) bus, just like Apple does with the M1. People are talking about "VRAM" because the APU removes that number as a limitation - all available memory is technically 'VRAM' with an APU. It doesn't mean much in current APUs because of the narrow 128-bit DDR5 bus; they're so bandwidth constrained that using high texture detail costs performance even if they technically have enough memory.
Oh, I was thinking this would be a desktop part too, since people keep talking about cutting out the low-end discrete GPU segment in general. If it's a purely mobile part, I guess that's moot.

Is there a market for such a super-expensive APU? I mean, the economy sux.
The 'economy' is mostly doing quite fine. If you're talking about greedflation, well, it's not stopping rampant consumerism. Not even the pandemic could stop that. People are still spending money. Demand for gaming hardware is definitely still very high. People only stopped buying Nvidia GPUs because they've gone beyond basic greed and into total insanity.
 
Oh, I was thinking this would be a desktop part too, since people keep talking about cutting out the low-end discrete GPU segment in general. If it's a purely mobile part, I guess that's moot.

I'm sure it could eventually be a desktop part, aside from being used in NUC-like systems at the start (which would be an awesome config, though again it depends on the price).
 
I'm sure it could eventually be a desktop part, aside from being used in NUC-like systems at the start (which would be an awesome config, though again it depends on the price).
Theoretically yes, but on-package LPDDR5 for a desktop part would make for a very unusual product. How do you sell that? Only in special prebuilt models or something?
 
Theoretically yes, but on-package LPDDR5 for a desktop part would make for a very unusual product. How do you sell that? Only in special prebuilt models or something?

Yeah, OEM SFF systems - they would be a niche, of course. Frankly, I think powerful APUs would/will be a positive force for PC gaming, if only for the fact that it's one less thing for OEMs to fuck up. Pre-built gaming PCs are usually a disaster; having almost everything supplied on one chip just gives OEMs fewer places to cut corners.

Expect to see more CPU designs with on-package RAM coming in the future, even without powerful GPUs - it's certainly what Intel and AMD are planning.
 
Is Strix Halo expected to use HBM or something? I'm not understanding why people are talking about VRAM for an APU.
LPDDR5X@8533.
Is there a market for such a super-expensive APU?
AMD's not kicking off a new swimlane without some decent-ish OEM interest.
So, obviously, yes.
Expect to see more CPU designs with on-package RAM coming in the future, even without powerful GPUs - it's certainly what Intel and AMD are planning
CAMM allows for modular LPDDR, so hold your horses about on-package anything outside of 10W M1-competitor parts à la Lunar Lake-M.
 
CAMM allows for modular LPDDR, so hold your horses about on-package anything outside of 10W M1-competitor parts à la Lunar Lake-M.

My understanding of CAMM was that it was primarily a space-saving measure that Dell was trying to get standardized; how does it scale to wide buses?
 
PT is not efficient because it lacks a radiance cache, so you need to recalculate all bounces every frame.
If you have a cache, you get infinite bounces for free. Examples: some pre-RTX Minecraft mod, which was great but called itself 'PT'; the final Metro Exodus DXR, which also calls itself PT but isn't; and Lumen.
For a realtime application it is just stupid to miss out on this essential optimization, and that's why classical PT is no good realtime solution.
NV also aims to add caching and is working on the Neural Radiance Cache.
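
To make the difference concrete, here's a toy sketch (Python, with made-up stand-in functions - not any real renderer's API, just the control flow): pure PT re-traces the whole bounce chain every frame, while the cached version traces only the first hit and reads the accumulated rest from a cache.

```python
import random

# Hypothetical stand-ins for scene access; real tracers return full surface data.
def trace_ray(point, direction):
    hit = tuple(p + d for p, d in zip(point, direction))
    return hit, 0.1  # pretend every surface emits a little light

def random_direction():
    return tuple(random.uniform(-1.0, 1.0) for _ in range(3))

def pure_path_trace(origin, direction, bounces):
    """Classical PT: the whole bounce chain is re-traced from scratch, every frame."""
    radiance, point, d = 0.0, origin, direction
    for _ in range(bounces):
        point, emission = trace_ray(point, d)
        radiance += emission
        d = random_direction()
    return radiance

class RadianceCache:
    """Stores outgoing radiance per quantized position. A warm entry already holds
    light accumulated over previous frames, so one lookup stands in for the whole
    remaining bounce chain ('infinite bounces for free')."""
    def __init__(self, cell=1.0):
        self.cell, self.entries = cell, {}
    def _key(self, p):
        return tuple(round(x / self.cell) for x in p)
    def lookup(self, p):
        return self.entries.get(self._key(p))
    def update(self, p, value, blend=0.1):
        old = self.entries.get(self._key(p), value)
        self.entries[self._key(p)] = (1.0 - blend) * old + blend * value

def cached_trace(origin, direction, cache):
    """Cached variant: trace only the first hit, then read the rest from the cache."""
    point, emission = trace_ray(origin, direction)
    tail = cache.lookup(point)
    if tail is None:  # cold cache: fall back to brute force once
        tail = pure_path_trace(point, random_direction(), bounces=4)
    cache.update(point, tail)
    return emission + tail

cache = RadianceCache()
for frame in range(3):
    print(cached_trace((0.0, 0.0, 0.0), (0.0, 0.0, 1.0), cache))
```

The cache entry keeps accumulating light over frames, which is where the "infinite bounces for free" effect comes from - at the price of some lag when lighting changes.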

That said, we will continue to use the PT term for games, but we will extend it and diverge from the classical definition.
Classical PT as-is is not the solution. If you think so, you're just wrong, or you're willing to spend more than needed to achieve photorealism.

And path tracing has been optimized for realtime applications. You can change the traditional rasterized direct lighting model in UE5 to something like Nvidia's RTXDI and have less than a 50% performance loss.

Today's problem is rasterization and low-end systems like the PS5. With only 10 TFLOPs there is nothing you can do. You cannot switch to modern rendering, and you must hand-code everything for 10 TFLOPs, which restricts portability. Like I said: my 4090 has 9x more compute performance, but in PS5 ports it is maybe 3x faster. Yet you can run Cyberpunk with path tracing at 1080p with 60+ FPS, Portal RTX at 1440p with 60 FPS, and UE5 games at 1440p with 60 FPS (Lumen + RTXDI).

So for me it's strange when people complain about ray tracing and path tracing when image quality is being held back by low-end systems and an unoptimized software stack. UE5, for example, is so bad that frame generation can render a native frame nearly as fast as DLSS Performance but with much better image quality. And I still wonder why UE5 doesn't scale better with a 90 TFLOPs GPU when Nanite rendering happens in compute, without the traditional geometry pipeline...
 
Yeah, OEM SFF systems - they would be a niche, of course. Frankly, I think powerful APUs would/will be a positive force for PC gaming, if only for the fact that it's one less thing for OEMs to fuck up. Pre-built gaming PCs are usually a disaster; having almost everything supplied on one chip just gives OEMs fewer places to cut corners.

Expect to see more CPU designs with on-package RAM coming in the future, even without powerful GPUs - it's certainly what Intel and AMD are planning.
If it's only a niche, and we're talking about sort-of-custom SFF PCs, I'd worry about the cost issue. I mean, if you can build a faster traditional PC for the same money, then the advantage dwindles down to almost just form factor alone, while also losing all ability to upgrade the CPU, memory, or GPU in the future, which definitely has value.

Maybe Valve could try and bring back the 'Steam Machine', except do what they did with Steam Deck and subsidize the cost as much as practically possible in order to push adoption? That could maybe help get around lower volume cost issues.
 
And path tracing has been optimized for realtime applications. You can change the traditional rasterized direct lighting model in UE5 to something like Nvidia's RTXDI and have less than a 50% performance loss.
Yeah, but being pedantic about terminology, I would say that NV's extensions to PT turn it into another algorithm. I'm not totally sure about ReSTIR, but IIRC it's caching and reusing samples temporally, while in traditional PT there is no caching. That's what separates it from other correct methods and is also what makes it so easy to use. It's a big reason why PT has won over the movie industry.
(Basically I refuse to modify the traditional definition of PT just because recent realtime tech bends that definition.)
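
For reference, the primitive ReSTIR builds on is weighted reservoir sampling, and the "reuse" is merging reservoirs across frames and pixels. A toy single-sample sketch (my own simplification, not NVIDIA's actual ReSTIR/RTXDI code):

```python
import random

class Reservoir:
    """Single-sample weighted reservoir: the building block ReSTIR-style
    techniques reuse across pixels and frames."""
    def __init__(self):
        self.sample = None
        self.w_sum = 0.0
        self.count = 0

    def update(self, sample, weight):
        # Standard weighted reservoir sampling: keep the new candidate with
        # probability weight / total_weight_so_far.
        self.w_sum += weight
        self.count += 1
        if self.w_sum > 0 and random.random() < weight / self.w_sum:
            self.sample = sample

def temporal_reuse(current, previous):
    """Merge last frame's reservoir into this frame's: samples survive across
    frames instead of being regenerated, which is the temporal reuse above."""
    merged = Reservoir()
    merged.update(current.sample, current.w_sum)
    merged.update(previous.sample, previous.w_sum)
    merged.count = current.count + previous.count
    return merged

# Toy usage: each frame proposes a few candidate light samples, then merges
# with whatever survived from the previous frame.
prev = Reservoir()
for frame in range(3):
    cur = Reservoir()
    for _ in range(4):
        candidate = random.random()              # stand-in for a light sample
        cur.update(candidate, weight=candidate)  # stand-in for its importance
    prev = temporal_reuse(cur, prev)
    print(frame, prev.sample, prev.w_sum)
```

Each frame only proposes a handful of new candidates, but the merged reservoir carries information from all previous frames - exactly the kind of temporal caching classical PT doesn't do.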

Today's problem is rasterization and low-end systems like the PS5. With only 10 TFLOPs there is nothing you can do.
No. 10 TF is plenty. If I want just realtime GI and have no other crazy ambitions, the 2 TF of a PS4 is already enough for 60 fps.
That's using my own method, which is quite accurate globally but lacks high-frequency details like hard shadows or sharp reflections.
That's really why I think an APU is totally practical, since HF detail is optional. And a 10 TF machine can already add those details quite well.
But I do agree that current handhelds like the Steam Deck are too weak on the GPU side, and cause more of a problem than an opportunity. I praise them only in the hope that they pave the way towards something better.
You cannot switch to modern rendering, and you must hand-code everything for 10 TFLOPs, which restricts portability.
I can. I can use my method as a cache on high-end machines, and increase accuracy by running PT on top to handle the first bounce(s). (At least if DXR could support the LOD of my geometry.)
This should work very well, and perceptual error should approach zero pretty quickly. The extra work is then spent on the PT branch, not on the base GI system, which we need anyway. Of course, such a base system needs to be accurate enough; crap like voxel cone tracing would not do.
This sounds pretty good, but my problems are: preprocessing costs during production, and a complex base GI system. Both are surely much harder to maintain than the current state of the art. (Excluding Lumen, which seems much worse.)
And I'm also not too optimistic that my stuff would scale down to mobile HW.

So for me it's strange when people complain about ray tracing and path tracing when image quality is being held back by low-end systems
Yes, but if my stuff works out, this problem would be gone. Scaling up is easy, and there is no more holding back by the low end.

And I still wonder why UE5 doesn't scale better with a 90 TFLOPs GPU when Nanite rendering happens in compute, without the traditional geometry pipeline...
I also wonder why it does so badly on my old 10 TF Vega 56. No matter whether it's the huge Matrix city demo or a small indie game set in a single low-detail apartment, I get the same 15-20 fps (at standard 1080p).
So I conclude Nanite is not the problem, but Lumen is. Considering what a patchwork of hacks it is, I'm not surprised it can't saturate a 90 TF GPU. But that's just a guess.

However, remembering the Portal RTX profiler output shown somewhere here, PT is no good way to saturate a GPU either. Too many registers for long pipelines, too much divergence everywhere.
You need a big GPU to run it smoothly, but the GPU isn't well saturated. That's really a problem, and kinda sad even from the enthusiast perspective.

Of course we can judge that it's just worth it. But if you ask me what to do with 90 TF and huge VRAM, I would say large and detailed fluid simulations. That would be awesome, and enough brute-force work to make the GPU glow.
But that's just a dream. It could not scale down, so it's not really interesting in practice.
We might get it if 90 TF is indeed the future standard. I would not complain. I just can't believe it currently. ;)
 
Maybe I'm late, but the 7800 XT specs suggest it is going to perform roughly the same as the 6800 XT. Ummmm, what?
AMD messed up badly with the naming here, trying to do a similar thing to Nvidia by pushing lower-end parts up to higher-tier naming in order to charge more money, though not quite as extreme. Still, AMD can't get away with this kind of thing like Nvidia can try to. They just don't have the same clout or extra features or anything to make an argument for it.

But yeah, fully enabled Navi 32 should ideally be called 7700 XT, or 7800 non-XT at worst. With 7800 XT naming, I expect around a $600 price tag, which will only represent a very minor performance-per-dollar improvement over the 6800 XT. $550 could make it slightly more appealing, but I don't think AMD will do that.

TBH I'm more interested in how the 7700 XT will perform.
And that's another problem. In the past, AMD used to be a bit generous and only really cut down the cores for cut-down models, leaving the full RAM and bandwidth capabilities intact. I guess they realized this was too good a deal, so now products like the 7900 XT and 7700 XT also come with a reduced memory bus, with less memory and less bandwidth.

AMD thinks they're just making the higher end part more appealing, while I feel they're just doing more damage by making the more affordable options less desirable.

I expect a ~52 CU 7700 XT to perform similarly to a 6800 for about $500, again representing a minimal improvement in performance per dollar over the previous gen.
 
I expect a ~52 CU 7700 XT to perform similarly to a 6800 for about $500, again representing a minimal improvement in performance per dollar over the previous gen.
That would mean having three different models within $150, which seems a bit tight? While the 7800 GRE isn't available at retail outside China, it does have an official MSRP of $649.
 