AMD: RDNA 3 Speculation, Rumours and Discussion

The same talk was going around 10 years ago: APUs are going to eat up the market, dGPUs will disappear, etc., etc. None of it has materialized in any shape or form. I see the same hogwash here. If cost is going to make $300 GPUs disappear, then APUs are hopelessly screwed.
 
I see the same hogwash here
You can't see anything because you can't even read some semiwiki/semieng shit or idk.
You don't need 6D brains to understand the basic gist of "shit's fucked, yo" and "only the enfattened segments which can tolerate the costs will survive".
Or at least I hope you don't.
then APUs are hopelessly screwed.
They have the volumes and some very, utterly loyal OEM slaves to survive.
Your laptop is still gonna be more expensive.
 
If we are looking at a normally priced GPU market, the best APUs are still only comparable to a $50 GPU. I don't see this changing significantly in the future. APUs are as expensive as GPUs/CPUs to manufacture and have to be sold at lower margins due to performance constraints.


If someone is trying to make a cheap build, then the used market would most likely fill the gap between APU performance and $300ish. The jump from an APU to a dedicated sub-$300 GPU isn't that large.

Also, most issues on the APU side are with memory bandwidth. So what happens when you start getting APUs with Infinity Cache?
 
If someone is trying to make a cheap build, then the used market would most likely fill the gap between APU performance and $300ish. The jump from an APU to a dedicated sub-$300 GPU isn't that large.

Also, most issues on the APU side are with memory bandwidth. So what happens when you start getting APUs with Infinity Cache?
More importantly, with AM5 every CPU has a GPU, which gives even less incentive to fill that low-end niche
 
The same talk was going around 10 years ago: APUs are going to eat up the market, dGPUs will disappear, etc., etc. None of it has materialized in any shape or form. I see the same hogwash here. If cost is going to make $300 GPUs disappear, then APUs are hopelessly screwed.
I can't comment on APUs eating up the market. They are more economical to produce (amortized packaging costs, no exotic DRAM) but vendor-OEM calculus is outside of my pay grade.

What I do know is that Moore's law's ability to reduce $/transistor is over (at least at the pace we're used to). There's still some power scaling but that's been tapering off too. Chips are all wires now and wires don't scale. So we're left with transistor density scaling, but you have to pay for the transistors in $, W and heat dissipation (which means more $). And bandwidth to feed those transistors, which is its own story.

In theory it's not all doom and gloom -- algorithmic and architectural innovations may spark off a new innovation cadence. But the industry is still adapting, with development cadences still grasping on to the last vestiges of Moore's law. It's all still a "moar cores" mindset. It's going to take a while to change, and until then we'll feel the costs.

I hope I'm proven wrong.
 
It's all still a "moar cores" mindset
GPUs are inherently that.
We've been at like a 550-ish mm² average high-end GPU die size since 2006 or so.
algorithmic and architectural innovations may spark off a new innovation cadence
Well that's just bullshit; all that stuff needs them sweet sweet xtors, and man, are xtors not coming cheap these days.
I hope I'm proven wrong.
"We're lowkey fucked" is an industry-wide observation with like a bazillion articles written about it to day.
 
CPUs show us that the stuff that keeps the core working, instead of waiting, is what brings better efficiency. I believe RDNA 3 is about yet more of this, rather than more cores.

Sure, there's the rumoured triple-chiplet monstrosity that is Navi 31 (and 32) which seemingly goes for more cache as well as more cores, but I think AMD is aiming to implement fine-grained kernel-spawning for hand-off and function-calling using task pooling and queues within each WGP:

HARDWARE ACCELERATED DYNAMIC WORK CREATION ON A GRAPHICS PROCESSING UNIT - ADVANCED MICRO DEVICES, INC. (freepatentsonline.com)

AGGREGATED DOORBELLS FOR UNMAPPED QUEUES IN A GRAPHICS PROCESSING UNIT - ADVANCED MICRO DEVICES, INC. (freepatentsonline.com)

Register saving for function calling - Advanced Micro Devices, Inc. (freepatentsonline.com)

TECHNIQUES FOR IMPROVING OPERAND CACHING - Advanced Micro Devices, Inc. (freepatentsonline.com)
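
For a feel of what GPU-side work creation looks like in code today, CUDA's dynamic parallelism is the closest public analogue I can think of. To be clear, this is not AMD's mechanism, just a hypothetical illustration with made-up kernel names:

[CODE]
// Hypothetical illustration only: GPU-side work creation via CUDA dynamic
// parallelism. Needs compute capability >= 3.5 and nvcc flags -rdc=true -lcudadevrt.
__global__ void childKernel(float *workItems, int count)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < count)
        workItems[i] *= 2.0f;    // stand-in for real per-item work
}

__global__ void parentKernel(float *workItems, const int *workCounts)
{
    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    int n = workCounts[tid];     // how much follow-up work this thread discovered
    if (n > 0) {
        // The GPU spawns a child grid sized to the work it just found,
        // without a round trip to the host.
        childKernel<<<(n + 255) / 256, 256>>>(workItems, n);
    }
}
[/CODE]

The patents read to me like a much finer-grained, in-hardware version of that, with the queues living inside the WGP rather than full grid launches.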

I think a major motivation here is ray tracing, because the function hierarchy, even in real time gaming graphics, is really tricky. This ties in with the continuation-passing style of sub-function control that I briefly referenced here:

Intel Xe Ray Tracing | Beyond3D Forum

where we can see that Intel is implementing the same concepts.

So that appears specific to ray tracing. I think it can be used more widely, for conditional routing techniques that reduce the impact of control flow divergence. It could be said that this last part is a nice-to-have, because nesting/looping rapidly makes a mess.

So I think coarse-grained control flow resulting from ray traversal (miss? hit? material? spawn-ray?) looks amenable to intra-WGP task pooling and scheduling and it might have wider usage.

I do wonder if the next iteration of D3D12 includes some fine-grained shader-calling-shader functionality.
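
To make the "conditional routing" bit concrete, here's a toy sketch of the kind of intra-workgroup task pooling I have in mind: sort rays by traversal outcome into per-state queues, then shade one state at a time so the lanes in a wave stay coherent. It's written in CUDA only because that's easy to put in a post; all names are mine and it says nothing about how AMD would actually implement it.

[CODE]
// Toy sketch: bin rays by traversal outcome into per-state queues in shared
// memory, then drain one queue at a time so a wave's lanes follow the same
// control-flow path. Launch with 256 threads per block.
enum RayState { RAY_MISS = 0, RAY_HIT_OPAQUE = 1, RAY_HIT_ALPHA = 2, RAY_STATE_COUNT = 3 };

#define RAYS_PER_BLOCK 256

__device__ RayState classifyRay(int rayIndex)
{
    // Stand-in for the real traversal result (miss? hit? which material?).
    return static_cast<RayState>(rayIndex % RAY_STATE_COUNT);
}
__device__ void shadeMiss(int /*rayIndex*/)        { }  // stand-in miss shader
__device__ void shadeOpaque(int /*rayIndex*/)      { }  // stand-in opaque-hit shader
__device__ void shadeAlphaTested(int /*rayIndex*/) { }  // stand-in alpha-tested shader

__global__ void routeAndShade(int rayBase)
{
    __shared__ int queues[RAY_STATE_COUNT][RAYS_PER_BLOCK];
    __shared__ int queueSize[RAY_STATE_COUNT];

    if (threadIdx.x < RAY_STATE_COUNT)
        queueSize[threadIdx.x] = 0;
    __syncthreads();

    // 1) Every thread classifies its ray and pushes it onto the matching queue.
    int ray = rayBase + blockIdx.x * blockDim.x + threadIdx.x;
    RayState s = classifyRay(ray);
    int slot = atomicAdd(&queueSize[s], 1);
    queues[s][slot] = ray;
    __syncthreads();

    // 2) Drain each queue with the whole block: coherent control flow per state.
    for (int state = 0; state < RAY_STATE_COUNT; ++state) {
        for (int i = threadIdx.x; i < queueSize[state]; i += blockDim.x) {
            int r = queues[state][i];
            if      (state == RAY_MISS)       shadeMiss(r);
            else if (state == RAY_HIT_OPAQUE) shadeOpaque(r);
            else                              shadeAlphaTested(r);
        }
        __syncthreads();
    }
}
[/CODE]

If the patents pan out, that pooling and hand-off would presumably be done by hardware within each WGP rather than by hand-written shader code like this.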
 
CPUs show us that the stuff that keeps the core working, instead of waiting, is what brings better efficiency
CPUs aren't GPUs.
is what brings better efficiency
Well yes, but also no, see DC chip power creep and correlate it to the ever-increasing CC.
Everyone wants more compute and whatever remains of Moore's only has so much to give.
I believe RDNA 3 is about yet more of this, rather than more cores.
Yes but it's also way way more stuff on the higher end!
Sure, there's the rumoured triple-chiplet
Triple?
It's a fuckton of tiles.
Two GCDs, yes.
 
Well that's just bullshit; all that stuff needs them sweet sweet xtors, and man, are xtors not coming cheap these days.
Yeah well it's not quite the golden goose that keeps on giving like Moore's law did. But for many workloads there's up to 10x upside without more xtors. That's for the kernels, and then there's a ton of inefficiency in the software stacks stitching those kernels together. This is all pure research, I don't know whether or in what shape any of this will make its way into the market. I also don't know where we go once we've mined out all this inefficiency.
 
I feel it's worth reminding a few folks: Moore's Observation (aka "Law") was not about $ per transistor, nor overall transistor density, nor any power or performance or compute capability metric either. Rather, his observation was about the total number of transistors in an integrated circuit roughly doubling every two years. It's not your fault if you weren't aware of this; a LOT of news outlets and bloggers and vloggers and forum participants echo the same misinformation about it being performance or price or compute capability or some combination of the three. Despite those three things being resultant from decades of silicon lithography evolution, precisely none of them were part of Moore's original observation.

We can continue cramming more transistors into a singular IC even to this day, yet most of the doubling is coming from physical layout: we're capable of building physically larger chips now, and we have some work being done on silicon stacking; both achieve the same result of more transistors in a single IC. Unfortunately the cost of doing so is now increasing significantly, and the power / performance / sizing gain is paltry at best, bordering on non-existent now.

Moore's Law is Transistors per IC, not transistors per inch, not transistors per dollar, not instructions per cycle, not cycles per second, not cycles per unit of power, not instructions per unit of power.
 
Not sure that this can be seen as something Moore was talking about.
This shouldn't be some obscure conjecture on whether Moore could've reasonably foreseen ICs being constructed in more than a two-dimensional plane. Rather, we get into some pedantry around drawing a line to declare where a single integrated circuit ends and a new one begins: can a singular IC only exist in a flat 2D plane? What happens in a possible future where we can legitimately build an integrated circuit in all three dimensions, say with some future 3D printing technology?

I get that irregularly stacked chips, like multiple discrete chiplets stacked (mounted?) on a singular underlying substrate, are different. How about when TSVs are involved, where the multiple layers of silicon interoperate and are not able to be made functional when standing alone as a singular layer? A singular chiplet can be made functional outside of the MCM substrate.
 
Moore's Law is Transistors per IC, not transistors per inch, not transistors per dollar, not instructions per cycle, not cycles per second, not cycles per unit of power, not instructions per unit of power.
[Attached image: upload_2021-9-28_10-30-9.png (the cost graph referenced below)]
Moore was originally looking at optimum cost of transistors for a process and the cost benefits it brings as a smaller process matures. Notice the clear "Cost" part of the graph?
 

A couple of new patent applications related to RT optimizations

PARTIALLY RESIDENT BOUNDING VOLUME HIERARCHY
Techniques for performing ray tracing for a ray are provided. The techniques include, based on first traversal of a bounding volume hierarchy, identifying a first memory page that is classified as resident, obtaining a first portion of the bounding volume hierarchy associated with the first memory page, traversing the first portion of the bounding volume hierarchy according to a ray intersection test, based on second traversal of the bounding volume hierarchy, identifying a second memory page that is classified as valid and non-resident, and in response to the second memory page being classified as valid and non-resident, determining that a miss occurs for each node of the bounding volume hierarchy within the second memory page.
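
Read literally, the claimed traversal seems to boil down to something like this toy sketch (every structure and function name here is my own, just to illustrate the abstract):

[CODE]
// Toy illustration: BVH nodes live on virtual-memory pages; if the page backing
// a node is valid but not resident, traversal reports a miss for that subtree
// instead of stalling on the fetch.
struct BvhNode {
    float bboxMin[3], bboxMax[3];
    int   leftChild, rightChild;   // -1 means leaf
};

enum PageStatus { PAGE_RESIDENT, PAGE_VALID_NOT_RESIDENT, PAGE_INVALID };

__device__ PageStatus pageStatusOfNode(int nodeIndex)
{
    // Stand-in: the real thing would derive the page from the node's address
    // and consult the page tables, not look at the node itself.
    return (nodeIndex % 7 == 0) ? PAGE_VALID_NOT_RESIDENT : PAGE_RESIDENT;
}

__device__ bool rayIntersectsBox(const BvhNode & /*node*/, int /*rayIndex*/)
{
    return true;                   // stand-in for the real slab test
}

__device__ bool traversePartiallyResident(const BvhNode *nodes, int rayIndex)
{
    int  stack[64];
    int  top = 0;
    bool anyHit = false;
    stack[top++] = 0;              // start at the root

    while (top > 0) {
        int nodeIndex = stack[--top];

        if (pageStatusOfNode(nodeIndex) != PAGE_RESIDENT) {
            // Valid but non-resident page: count the whole subtree as a miss
            // (and presumably kick off a fetch) rather than stalling on a fault.
            continue;
        }
        const BvhNode &node = nodes[nodeIndex];
        if (!rayIntersectsBox(node, rayIndex))
            continue;
        if (node.leftChild < 0) {  // leaf: ray-triangle tests would go here
            anyHit = true;
            continue;
        }
        stack[top++] = node.leftChild;
        stack[top++] = node.rightChild;
    }
    return anyHit;
}
[/CODE]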

RAY-TRACING MULTI-SAMPLE ANTI-ALIASING
A technique for performing a ray tracing operation for a ray is provided. The method includes performing one or more ray-box intersection tests for the ray against one or more bounding boxes of a bounding volume hierarchy to eliminate one or more nodes of the bounding volume hierarchy from consideration, for one or more triangles of the bounding volume hierarchy that are not eliminated by the one or more ray-box intersection tests, performing one or more ray-triangle intersection tests utilizing samples displaced from a centroid position of the ray, and invoking one or more shaders of a ray tracing pipeline for the samples based on results of the ray-triangle intersection tests.
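
Mechanically, the second one reads to me like the following, though I'm guessing at the intent (names and offsets are made up):

[CODE]
// Toy illustration: once a triangle survives the ray-box tests, run the
// ray-triangle test for several samples displaced from the ray's centroid and
// invoke shading only for the samples that hit.
#define MSAA_SAMPLES 4

__device__ bool rayTriangleHit(int rayIndex, int triIndex, float dx, float dy)
{
    // Stand-in for a real ray-triangle test with the ray displaced by (dx, dy)
    // in sub-pixel space.
    return ((rayIndex + triIndex) & 1) != 0;
}

__device__ void invokeHitShader(int /*rayIndex*/, int /*triIndex*/, int /*sample*/)
{
    // Stand-in for kicking off the hit shader for one covered sample.
}

__device__ void shadeTriangleSamples(int rayIndex, int triIndex)
{
    // Sub-pixel offsets around the ray centroid (a typical rotated 4x pattern).
    const float offsets[MSAA_SAMPLES][2] = {
        { -0.125f, -0.375f }, { 0.375f, -0.125f },
        { -0.375f,  0.125f }, { 0.125f,  0.375f },
    };
    for (int s = 0; s < MSAA_SAMPLES; ++s) {
        if (rayTriangleHit(rayIndex, triIndex, offsets[s][0], offsets[s][1]))
            invokeHitShader(rayIndex, triIndex, s);  // one invocation per covered sample
    }
}
[/CODE]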

The first patent seems to aim at traversing the BVH tree even when part of the BVH data is missing from cache, in the hope of getting a hit within the resident data, at least while the missing data is still being fetched.
Any idea what the second patent is doing?
 
Moore was originally looking at optimum cost of transistors for a process and the cost benefits it brings as a smaller process matures. Notice the clear "Cost" part of the graph?
That's all fine and nice.

His observation was very specifically transistors per integrated circuit. You can use his observation to infer a lot of other things which were true at that time and yet are not true now, even though his observation of transistors per singular IC is still tracking reasonably well today.
 