AMD: RDNA 3 Speculation, Rumours and Discussion

xpea · Apr 27, 2022

techuse said:
6 months typically from tape out to release?

Usually tapeout is when the design is completed and you upload the files to the foundry FTP/portal/cloud.
Then the foundry generates the photomasks and tools to build the silicon. For a 7-die chiplet design, with extra time needed for interconnect and packaging, I expect 10 to 12 weeks before the customer gets the silicon back from the packaging house (Amkor, ASE, etc.).
Then qualification will take at least another 6 months.
I think one year from tape out to HVM is realistic if taking into account one minor revision (metal mask)
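As a rough check on that estimate, here is a minimal sketch that adds up the stages quoted above. The function name and the average weeks-per-month conversion are illustrative assumptions, and since qualification is quoted as "at least" 6 months, the result is a lower bound rather than a schedule.

```python
# Back-of-envelope tape out -> HVM timeline using the figures in the post.
# WEEKS_PER_MONTH and months_to_hvm are illustrative, not from any vendor.
WEEKS_PER_MONTH = 52 / 12  # average month length in weeks


def months_to_hvm(silicon_back_weeks, qualification_months=6.0):
    """Months from tape out to HVM with no respin (lower bound)."""
    return silicon_back_weeks / WEEKS_PER_MONTH + qualification_months


fast = months_to_hvm(10)  # silicon back in 10 weeks
slow = months_to_hvm(12)  # silicon back in 12 weeks

# Whatever is left of a 12-month budget is the room for a metal-mask revision.
print(f"no-respin path: {fast:.1f} to {slow:.1f} months")
print(f"slack in a one-year budget: {12 - slow:.1f} to {12 - fast:.1f} months")
```

So the one-year figure leaves a bit over three months for the minor revision, which seems consistent with the estimate.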

Edit: Lately, Nvidia has been extremely efficient and fast from tape out to HVM. Most of their first silicon revisions (A0) went into HVM.

trinibwoy · Apr 27, 2022

xpea said:
I think one year from tape out to HVM is realistic if taking into account one minor revision (metal mask)

That would mean N31 is right on time.

TopSpoiler · Apr 28, 2022

xpea said:
Lately, Nvidia has been extremely efficient and fast from tape out to HVM. Most of their first silicon revisions (A0) went into HVM.

Is it related to this?
https://www.hpcwire.com/2022/04/18/nvidia-rd-chief-on-how-ai-is-improving-chip-design/

xpea · Apr 28, 2022

TopSpoiler said:
Is it related to this?
https://www.hpcwire.com/2022/04/18/nvidia-rd-chief-on-how-ai-is-improving-chip-design/

Yes, along with other things they don't disclose publicly. The obvious one is the use of their Selene supercomputer for intensive algorithm/RTL/floor plan verification. Nvidia spends much more time than before in simulation, and their silicon verification lab is now a huge department, all in order to shorten tape out to HVM time. For small dies like GA106-107, tape out to HVM was less than 5 months, and it should be even less for Ada.

Granath · Apr 30, 2022

https://twitter.com/x/status/1520276092879196160

JasonLD · Apr 30, 2022

Expecting both N31 and AD102 to be less than 2x performance of current top end cards. Hype is getting beyond ridiculous at this point.

PSman1700 · Apr 30, 2022

Amazing performance ahead.

Granath · Apr 30, 2022

https://twitter.com/x/status/1520321392062828544

xpea · Apr 30, 2022

Granath said:
https://twitter.com/x/status/1520321392062828544

They talk like AMD (or Nvidia for that matter) can change the hardware after tape out. These leakers should learn a bit about how an ASIC is designed and when a uarch is frozen. It would help them improve their lies.

Nebuchadnezzar · Apr 30, 2022

xpea said:
They talk like AMD (or Nvidia for that matter) can change the hardware after tape out. These leakers should learn a bit about how an ASIC is designed and when a uarch is frozen. It would help them improve their lies.

That's just a daft assessment. Most of these leaks come from firmware-level, software or vendor sources, and the information is only as good as what AMD gave out at some point in time; what was leaked a month ago might be year-old information and out of date. As release approaches, more concrete details come out.

trinibwoy · Apr 30, 2022

https://twitter.com/x/status/1520256804298629124

6 SEs and double wide CUs claimed for Navi31. This would put a single GCD at 15360 FP32 per clock.

I like how both AD102 and Navi 31 “got stronger” in a matter of days. They probably ate their spinach.

Actually that would be 30720 FP32. Maybe it’s 3 SEs per GCD and 6 SEs per package. Big numbers.
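The correction is just scaling by shader engine count. A minimal sketch using only the rumoured totals quoted in this post; the per-SE figure is derived from them and is not a confirmed spec, and the function name is made up for illustration.

```python
# Rumoured numbers from the post above, not confirmed specs.
RUMOURED_TOTAL_FP32 = 30720  # FP32 lanes per clock with 6 SEs
RUMOURED_SES = 6

# Derived: lanes per shader engine implied by the rumour.
fp32_per_se = RUMOURED_TOTAL_FP32 // RUMOURED_SES


def fp32_lanes(shader_engines):
    """FP32 lanes per clock for a given shader engine count."""
    return shader_engines * fp32_per_se


print(fp32_lanes(3))  # one GCD, if it is 3 SEs per GCD
print(fp32_lanes(6))  # whole package, if it is 6 SEs per package
```

That is where both the 15360 and the 30720 figures come from: the same per-SE count, applied to 3 or 6 SEs.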

Jawed · Apr 30, 2022

Four SIMDs sharing a single TMU/RA implies AMD thinks ray acceleration is fast enough and ray tracing performance is now down solely to traversal and hit/miss shader throughput.

Well, that assumes RA hasn't been changed and that traversal is still in software...

TopSpoiler · Apr 30, 2022

Jawed said:
Well, that assumes RA hasn't been changed and that traversal is still in software...

+1. There weren't any notable patents published about traversal hardware for RDNA in the past 2 years.

trinibwoy · Apr 30, 2022

Jawed said:
Four SIMDs sharing a single TMU/RA implies AMD thinks ray acceleration is fast enough and ray tracing performance is now down solely to traversal and hit/miss shader throughput.

Well, that assumes RA hasn't been changed and that traversal is still in software...

More SIMDs isn’t a solution for divergence though. If AMD continues to run traversal on the SIMDs they must not be serious about RT performance or they’re also doing some funky sorting in software.

TopSpoiler · Apr 30, 2022

xpea said:
They talk like AMD (or Nvidia for that matter) can change the hardware after tape out. These leakers should learn a bit about how an ASIC is designed and when a uarch is frozen. It would help them improve their lies.

https://twitter.com/x/status/1496335011091722240

I barely trust him.

PSman1700 · Apr 30, 2022

TopSpoiler said:
I barely trust him.

Sums up leakers in general.

trinibwoy · Apr 30, 2022

TopSpoiler said:
+1. There weren't any notable patents published about traversal hardware for RDNA in the past 2 years.

Nothing new from Nvidia either aside from the AI stuff.

From AMD's patent on the design of the ray accelerator: "One purpose of using a merged data path unit is to reduce the amount of silicon area that is used by only a single type of instruction, because doing so reduces the total amount of silicon for a chip. This particular merged data path unit is capable of outputting results for four box tests per cycle or one triangle test per cycle."

So the RA isn't just re-using the TMU L1 memory pipeline. It's also sharing silicon between the box and triangle intersection logic. Very elegant and area efficient but one triangle per clock probably isn't going to cut it for RDNA 3.

Jawed · Apr 30, 2022

trinibwoy said:
More SIMDs isn’t a solution for divergence though.

Divergence literally lengthens the wall-clock time of a shader. If a WGP (or CU, as the rumour suggests CUs are still a thing in RDNA 3) provides for more shader cycles per RA cycle, then that reduces the wall-clock time of divergent shaders.

This is similar to ALU:TEX ratio. Over time that ratio has increased.

If AMD continues to run traversal on the SIMDs they must not be serious about RT performance or they’re also doing some funky sorting in software.

So if AMD is doubling-down on software traversal it's reasonable to expect that more compute is matched by better local resources to increase the "traversal work-items per clock", such as a bigger LDS, bigger register file, higher-capacity crossbars (or ring-bus?).

The side-effect of "increased traversal WIPC" is that it helps with hit/miss shader WIPC too, since those shaders are also the victims of divergence. All GPUs have this problem and sorting is part of the solution.

Of course AMD might do nothing in terms of local resources to help with RT WIPC - such changes might be years off. The fact that patent documents for hardware traversal haven't appeared as yet seems to indicate it's off the cards for a very long time.

trinibwoy · Apr 30, 2022

Jawed said:
Divergence literally lengthens the wall-clock time of a shader. If a WGP (or CU, as the rumour suggests CUs are still a thing in RDNA 3) provides for more shader cycles per RA cycle, then that reduces the wall-clock time of divergent shaders.

Sure but best case it reduces that time 2x. Worst case for divergence is 32x.
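To put the 2x and 32x figures side by side, here is a toy model of a single wave. It assumes a 32-lane wave that runs each distinct code path serially with the other lanes masked off; the names and the model itself are illustrative and say nothing about how RDNA actually schedules work.

```python
# Toy SIMD divergence model: a wave pays one full pass per distinct path
# its lanes take. WAVE_SIZE = 32 is assumed from the 32x worst case above.
WAVE_SIZE = 32


def wave_passes(lane_paths):
    """Serialized passes needed: one per distinct path among the lanes."""
    return len(set(lane_paths))


def relative_time(lane_paths, shader_throughput=1.0):
    """Wall-clock time relative to a coherent wave at 1x shader throughput."""
    return wave_passes(lane_paths) / shader_throughput


coherent = [0] * WAVE_SIZE               # every lane takes the same path
fully_divergent = list(range(WAVE_SIZE))  # every lane takes its own path

print(relative_time(coherent))              # baseline
print(relative_time(fully_divergent))       # worst case
print(relative_time(fully_divergent, 2.0))  # worst case with 2x shader cycles
```

In this model doubling the shader throughput halves the worst case, but a fully divergent wave is still 16 times slower than a coherent one, which is the gap I'm pointing at.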

So if AMD is doubling-down on software traversal it's reasonable to expect that more compute is matched by better local resources to increase the "traversal work-items per clock", such as a bigger LDS, bigger register file, higher-capacity crossbars (or ring-bus?).

The side-effect of "increased traversal WIPC" is that it helps with hit/miss shader WIPC too, since those shaders are also the victims of divergence. All GPUs have this problem and sorting is part of the solution.

Of course AMD might do nothing in terms of local resources to help with RT WIPC - such changes might be years off. The fact that patent documents for hardware traversal haven't appeared as yet seems to indicate it's off the cards for a very long time.

Right, AMD needs to solve for both traversal divergence and shading divergence. Intel and Nvidia are tackling the former by avoiding SIMD altogether. It will be interesting to see how AMD's gamble works out on next-generation RT workloads. They don't have enough control over the market to slow down RT adoption, so hopefully that's not their game plan.

TopSpoiler · Apr 30, 2022

trinibwoy said:
Nothing new from Nvidia either aside from the AI stuff.

Really? You may have seen these in another thread. Or are you thinking these (first two) are already in Ampere?
https://www.freepatentsonline.com/11295508.html
https://www.freepatentsonline.com/11282261.html
https://www.freepatentsonline.com/y2022/0027194.html
