AMD Execution Thread [2023]

Do they not simulate these chips in software first? They would be crazy to design a multi-GCD architecture without validating the concept in software many years earlier.
Disclaimer: I don't know anything about the RDNA4 story, but I can say that software support plus supply chain / production limitations are some of the reasons for delaying Nvidia's MCM client GPUs (along with other things, like monolithic still being better for the vast majority of the product range; don't forget we're talking about the high end with MCM). Generally speaking, hardware design is not an issue; green team has been using CoWoS and HBM in their DC lineup for 3 generations now...
Knowing that, and back to AMD, it seems very reasonable that software was the reason to cancel this top-SKU multi-GCD design. Making a development plan and a proof of concept is easy, but supporting the corner cases of API subtleties and software oddities takes a huge amount of validation effort. And it's well known that AMD doesn't have the resources to innovate faster than Nvidia in software. Just look at the recent FSR3 debacle for a good example of promises and delays. I'm not surprised that AMD's managers don't feel confident about getting software ready at the same pace as the hardware...
 
green team has been using CoWoS and HBM in their DC lineup for 3 generations now...
CoWoS is as simple as dung, and they're not using the more interesting varieties (-R or -L).
They haven't even touched SoIC (and won't for years; it's way too far out of their hweng scope).
it seems very reasonable that software was the reason to cancel this top-SKU multi-GCD design
no.
but supporting the corner cases of API subtleties and software oddities takes a huge amount of validation effort
No, making sure hardware does everything correctly is the hard part.
 
CoWoS is as simple as dung, and they're not using the more interesting varieties (-R or -L).
They haven't even touched SoIC (and won't for years; it's way too far out of their hweng scope).

no.

No, making sure hardware does everything correctly is the hard part.
Today CoWoS is easy and mature, but that was not the case when NV first introduced it. And this narrative that NV is behind AMD in complex packaging design is hilarious and false. Yes, AMD has products in the field using more advanced packaging, but that's simply because NV doesn't need it (yet) to win. NV works closely with TSMC and Samsung on development and validation across all the packaging techs, and they have proof-of-concept samples of all of them in their silicon lab. In other words, AMD doesn't have any proprietary packaging tech, nor a production facility; it all comes from TSMC. Anything AMD can do, NV can do in the same timeframe if they decide to.
Regarding software vs. hardware difficulty, I totally disagree. Multi-GCD is basically "only" a matter of bandwidth between the dies and, of course, cost increase (because of data locality). The solutions are well known at this point. Making it work without a huge cost increase is the difficult part...
 
This level of juvenile engagement will not be tolerated.
oh no "they don't ship it cuz they don't need it" copium, great.
not like AMD ever needed to ship X3D.
but that was not the case when NV first introduced it
Xilinx has been shipping CoWoS since 2012.
lol
 
When referring to multiple compute and graphics dies working as a single instance, are we talking professional or consumer? I don't believe for a second AMD has or will have it working efficiently for games in the next few years.
 
1. oh no "they don't ship it cuz they don't need it" copium, great.
not like AMD ever needed to ship X3D.

2. Xilinx has been shipping CoWoS since 2012.
lol
1. Sales figures strongly disagree with you. Get some copium yourself.
2. For any packaging tech, production volume has always been the limiting factor and still is right now (incl. TSV tools for HBM and substrate supply). Niche products can go into mini batches early because of low volume... Sad that I have to mention that here in order to counter nonsense arguments...
lol too I guess :rolleyes:
 
or consumer?
Consumer.
Sales figures strongly disagree with you
wat.
AMD never needed to ship X3D to maintain Milan/Genoa sales; they're just cute little parts/technology demonstrators.
If NV could pull off something like that, they would.
But they can't.
All they do is post paperware about their future 2.5D links (and even that was ~a year ago).
Everyone else is shipping, even Intel (MDIFs are eh, but they still count).
For any packaging tech, production volume has always been the limiting factor and still is right now
CoWoS starts are still a rounding error in TSMC's output and will forever be so unless it gets used in mainstream mobile/laptop (just, no; FOWLP city forever); it's only used for higher-margin products like, yes, FPGAs and GPUs.
Which is why there's a capacity crunch now: high-end GPUs became slightly less meme-volume.
Niche products can go into mini batches early because of low volume...
Yes like GPGPU.
Low volume, high margin parts.
 
I don't even know why I reply...
30k wafers is basically nothing and even H100 bubble-numbers are a rounding error versus phone/laptop volumes.
Just stop trying, you know you have no argument besides saying "nvidia best".
If Nvidia best, they would ship SoIC.
They can't, hence why AMD best. got it?
but forget to mention $10B revenue this quarter
Yes, when you sell things at a ~$45k ASP a pop, you earn a lot.
Even Intel will dip that shit with Gaudi 2.
which is more than 2 times the DC revenue of AMD+INTC combined
Units.
Units are what count with an advanced packaging flow.
Intel took years to turn EMIB from a niche FPGA packaging addition into something more mainstream-ish (SPR will hit 1M units this Q).
Then there's MTL.
FOWLP is the most mainstream adv-pkg flow and will remain so forever since, well, it ships in mobile, and in huge volumes.
 
30k wafers is basically nothing and even H100 bubble-numbers are a rounding error versus phone/laptop volumes.
Just stop trying, you know you have no argument besides saying "nvidia best".
If Nvidia best, they would ship SoIC.
They can't, hence why AMD best. got it?
You are the best advocate against your own arguments! Following your own logic, if H100 is a "rounding error", then AMD Instinct doesn't even exist. So why do you talk about it? Kindergarten "AMD best" rumble is nothing but theory and intentions. Companies are judged by real hard money results, not by R&D effort. Wake me up when AMD is more than a "rounding error".
 
then AMD Instinct doesn't even exist.
Duhhh.
MI300 does fancypants packaging to this degree only because the final volumes are meme, yet ASPs are sky-high.
Kindergarten "AMD best" rumble is nothing but theory and intentions
It's a product with SoIC, just like Milan-X or Genoa-X or Vermeer-X or Raphael-X.
Companies are judged by real hard money results, not by R&D effort
what the.
Does this mean BK-era Intel was da best?
 
Why couldn't they present multiple GCDs to the operating system as one GPU?

CPUs are much more complex, and yet AMD has managed exactly this with CPUs.

Also, pretty sure MI300 is not going to be viewed by the OS as multiple GPUs.

I wouldn't think the issue is actually presenting it as one GPU, but rather the performance considerations that would need to be factored in to do so.

There is, for instance, a pretty significant latency penalty (and a bandwidth one as well) when communicating across chiplets on AMD's CPUs, and an associated performance penalty that manifests if the workload causes that cross-chiplet traffic.

That will need to be mitigated in hardware or in software (such as with Explicit Multi GPU), or a combination of both.
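
To make the locality point concrete, here is a toy model with purely made-up numbers (the latencies, bandwidths and traffic splits below are illustrative assumptions, not measured figures for any AMD product) showing how the memory performance one chiplet sees degrades as more of its traffic has to cross the die-to-die link instead of staying local:

Code:
# Toy model of cross-chiplet traffic cost. All numbers are placeholders.
LOCAL_LATENCY_NS = 100.0   # assumed latency to locally attached memory
CROSS_LATENCY_NS = 160.0   # assumed latency when hopping over the link
LOCAL_BW_GBS = 800.0       # assumed local memory bandwidth
LINK_BW_GBS = 300.0        # assumed die-to-die link bandwidth

def effective_latency(cross_fraction):
    # Weighted average of local vs. off-die accesses.
    return (1 - cross_fraction) * LOCAL_LATENCY_NS + cross_fraction * CROSS_LATENCY_NS

def effective_bandwidth(cross_fraction):
    # Crude harmonic blend: off-die traffic is limited by the link.
    if cross_fraction == 0:
        return LOCAL_BW_GBS
    return 1.0 / ((1 - cross_fraction) / LOCAL_BW_GBS + cross_fraction / LINK_BW_GBS)

for f in (0.0, 0.1, 0.3, 0.5):
    print(f"{f:.0%} off-die -> ~{effective_latency(f):.0f} ns, ~{effective_bandwidth(f):.0f} GB/s")

The exact numbers don't matter; the point is that either the hardware keeps the link fat enough that the blend barely moves, or the software keeps the off-die fraction small (which is what explicit multi-GPU pushes onto the application).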
 
Those are tiny organic links, not something GPUs would ever use.

That would be an example of a hardware solution to mitigate the issue. It also shows there are more considerations in moving to GPU chiplets compared to what we currently have with CPUs.
 
'Upper 200s' CUs, on a 500 mm2 die? o_O

200+ CUs running at 3.2+ GHz should deliver ~3000 GT/s and ~200 TFLOPS, or 3x the RX 7900 XTX (if counting 4 ops per clock, as with the dual-issue FP32 ALUs in RDNA3); and a ~100 CU part would be ~25% faster.
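The same back-of-the-envelope in code, assuming RDNA3-style CUs (64 FP32 lanes, dual-issue, FMA counted as 2 FLOPs) and 4 texture units per CU; the 240 CU / 3.25 GHz inputs are just an illustrative pick within "200+ CUs at 3.2+ GHz" that lands on ~200 TFLOPS, not a confirmed configuration:

Code:
# Rough throughput math under RDNA3-style assumptions.
def fp32_tflops(cus, clock_ghz):
    # 64 lanes/CU * 2 (dual-issue) * 2 (FMA = 2 FLOPs)
    return cus * 64 * 2 * 2 * clock_ghz / 1000.0

def texel_rate_gts(cus, clock_ghz):
    # 4 TMUs per CU
    return cus * 4 * clock_ghz

print(fp32_tflops(96, 2.5))      # RX 7900 XTX ballpark: ~61 TFLOPS
print(fp32_tflops(240, 3.25))    # ~200 TFLOPS for a 200+ CU part
print(texel_rate_gts(240, 3.2))  # ~3072 GT/s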
That's great, but I don't think I'd believe those CU counts even for RDNA5 unless there were massive arch changes in between; then again, a massively complex die/chiplet approach fits the bill. A 6x low-end CU increase in two gens (16 CU 6500 XT -> 96 CU 8500, matching a last-gen flagship) and a >2.5x flagship increase in one gen sounds borderline like fanfiction. I'd love to see a few generations of progress that'd make the 90s/00s feel slow in comparison, but healthy scepticism until it materialises.

However, let's entertain the idea for RDNA5 (?): something small and very scalable like 32-40 CUs per SED, with >=1TB/s bridges connecting the AIDs (>=2TB/s bi-directional)? How would binning and overclocking work? Try to match each part from the same bin as closely as you can, or restrict each part to the lowest common denominator so they all run the same speed? Independent AIDs/sets of SED bins that can run at different speeds? Completing work at different rates across each AID/SED sounds like something that might throw random errors because it wasn't expected. Would hardware monitors report all frequencies like CPUs? In reality it'll probably be handled like CPUs: they'll work seamlessly in the background (if it works as intended), both collectively and independently to an extent, and I've been overthinking it.
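A trivial sketch of the two clocking options I mean, with invented per-SED bin limits purely for illustration:

Code:
# Hypothetical max stable clock per SED in MHz; made-up values.
sed_bins = [3250, 3180, 3300, 3210]

# Option 1: lowest common denominator - lock every SED to the weakest die.
lockstep_clock = min(sed_bins)

# Option 2: independent clocking - each SED runs at its own limit,
# like per-CCD boost behaviour on chiplet CPUs.
independent_clocks = sed_bins

print("lockstep:", lockstep_clock, "MHz on every SED")
print("independent:", independent_clocks, "MHz")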
 
That's great, but I don't think I'd believe those CU counts even for RDNA5 unless there were massive arch changes in between; then again, a massively complex die/chiplet approach fits the bill. A 6x low-end CU increase in two gens (16 CU 6500 XT -> 96 CU 8500, matching a last-gen flagship) and a >2.5x flagship increase in one gen sounds borderline like fanfiction.
No, it just costs a lot of money and watts.
For a proxy, MI300X is 304 CUs.
And the low end will not change any CU counts; this was an ultra-halo part.
>=1TB/s bridges connecting the AIDs (>=2TB/s bi-directional)?
Ehhh 3 times that.
restrict each part to the lowest common denominator so they all run the same speed?
That's how MI300 works, so yea.
Would hardware monitors report all frequencies like CPUs?
N3x already clocks shader engines individually, so yea.
 
Knowing that, and back to AMD, it seems very reasonable that software was the reason to cancel this top-SKU multi-GCD design. Making a development plan and a proof of concept is easy, but supporting the corner cases of API subtleties and software oddities takes a huge amount of validation effort. And it's well known that AMD doesn't have the resources to innovate faster than Nvidia in software... I'm not surprised that AMD's managers don't feel confident about getting software ready at the same pace as the hardware...
maybe they should buy some Nvidia AI hardware to do the hard work for them ;)
 
When referring to multiple compute and graphics dies working as a single instance, are we talking professional or consumer? I don't believe for a second AMD has or will have it working efficiently for games in the next few years.

The “cancelled” high-end RDNA 4 gaming chips were supposedly using multiple compute dies. Given how radical an idea that is, you would expect the design to go through the wringer in the lab before making it onto a product roadmap. If the rumors are true, it would be interesting to know whether it's a design or a manufacturing issue. The latter is more understandable.
 
Those engaging with juvenile language will be ejected from the thread if they persist. If a point isn't worth your effort to make in a respectful, intelligent fashion, it isn't worth making at all.
 