Bondrewd
Veteran
Supposedly there's 24 "compute units" in the high end APU.
No?
Phoenix is 12CU aka 6WGP RDNA3 config.
Couldn't we just go with more channels?
Board costs and baseline memory capacity go way up.
Possible but not mainstream.
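A rough sketch of why the baseline capacity goes up: if every channel must be populated with at least one DIMM, the minimum memory a board can ship with scales with the channel count. The 16GB smallest-DIMM figure is an assumption for illustration, and "channel" here means one 64-bit desktop DIMM channel.

```python
# Minimum installed memory when every channel must be populated.
# Assumption (hypothetical, for illustration): 16 GB is the smallest DDR5 DIMM
# an OEM can realistically buy in volume.
SMALLEST_DIMM_GB = 16


def baseline_capacity_gb(channels: int, dimms_per_channel: int = 1,
                         dimm_gb: int = SMALLEST_DIMM_GB) -> int:
    """Smallest capacity a board with `channels` populated channels can ship with."""
    if channels < 1 or dimms_per_channel < 1:
        raise ValueError("need at least one channel and one DIMM per channel")
    return channels * dimms_per_channel * dimm_gb


if __name__ == "__main__":
    # Dual, tri and quad channel, one DIMM each.
    for ch in (2, 3, 4):
        print(f"{ch} channels -> at least {baseline_capacity_gb(ch)} GB")
```

Under that assumption, quad channel means a 64GB floor, before any board routing or slot costs are counted.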
Couldn't we just go with more channels? Intel used to have a tri-channel RAM setup. What is preventing them from doing so again, or from going with a quad-channel solution?
Cost, especially as DDR5 has increased density: 8GB DDR5 DIMMs are actually pretty rare compared to 16GB ones. So you would need ~4 slots of memory (64GB?) just to get what may still be less bandwidth than an RTX 3050.
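To put rough numbers on that: peak theoretical bandwidth is bus width × transfer rate ÷ 8. The sketch assumes DDR5-6000 (a hypothetical speed) with each DIMM counted as one 64-bit "channel", and assumes the desktop RTX 3050's 128-bit GDDR6 bus at 14 Gbit/s per pin.

```python
# Peak theoretical bandwidth = bus width (bits) * transfer rate / 8.
# Assumptions (hypothetical): one 64-bit desktop "channel" per DIMM, DDR5-6000,
# and an RTX 3050 with a 128-bit GDDR6 bus at 14 Gbit/s per pin.


def ddr_peak_gbs(channels: int, mt_per_s: int, bits_per_channel: int = 64) -> float:
    """Peak DRAM bandwidth in GB/s (decimal) for `channels` x 64-bit channels."""
    return channels * bits_per_channel * mt_per_s / 8 / 1000


def gddr_peak_gbs(bus_bits: int, gbps_per_pin: float) -> float:
    """Peak GDDR bandwidth in GB/s for a bus that is `bus_bits` pins wide."""
    return bus_bits * gbps_per_pin / 8


if __name__ == "__main__":
    rtx3050 = gddr_peak_gbs(128, 14)
    for ch in (2, 4):
        bw = ddr_peak_gbs(ch, 6000)
        print(f"{ch}ch DDR5-6000: {bw:.0f} GB/s ({bw / rtx3050:.0%} of an RTX 3050)")
```

Even the four-DIMM configuration lands below the entry-level card, and that is peak bandwidth shared with the CPU.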
However, as I wrote here, there are a lot of differences and challenges with the open PC market as compared to the closed market of consoles/Apple with respect to marketing a UMA. The big problem still remains: memory bandwidth. Infinity Cache may provide a glimpse into how this problem can be mitigated for APUs, but there's a big difference between it providing that extra boost for a discrete card that already has 500GB/s+ to main memory and for an APU with 80GB/s, and that's with fast DDR5. Apple 'solves' it by just creating ridiculously large chips that only they can afford to make, and consoles solve it by using GDDR6, which isn't affordable in the small quantities PC OEMs could order, vs., say, Sony locking down supply agreements for years because it knows it will order 10+M chips a year.
So it's still that memory problem. I knew DDR5 is somewhat meh with bandwidth, but thought DDR6 should be almost good then. Obviously I've interpreted the numbers quite wrong, and all it gives will be PS4 bandwidth.
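One simplified way to see why a cache helps an APU less than the gap suggests: if a fraction h of memory traffic hits in the cache, DRAM only serves the remaining (1 - h), so deliverable bandwidth is bounded by DRAM_bw / (1 - h) and by cache_bw / h. This first-order model is our assumption, not a vendor formula; the 80 GB/s and 500 GB/s figures come from the post, while the 1000 GB/s cache bandwidth and the hit rates are hypothetical.

```python
import math


def effective_bandwidth_gbs(dram_gbs: float, cache_gbs: float, hit_rate: float) -> float:
    """Upper bound on bandwidth the GPU can consume at a given cache hit rate.

    Hits are served by the cache and misses by DRAM; whichever saturates first
    limits the total. Simplified model: ignores latency, write-backs and contention.
    """
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be in [0, 1]")
    dram_limit = dram_gbs / (1.0 - hit_rate) if hit_rate < 1.0 else math.inf
    cache_limit = cache_gbs / hit_rate if hit_rate > 0.0 else math.inf
    return min(dram_limit, cache_limit)


if __name__ == "__main__":
    # 80 GB/s APU and 500 GB/s discrete card per the post; cache bandwidth is hypothetical.
    for h in (0.0, 0.3, 0.5, 0.7):
        apu = effective_bandwidth_gbs(80, 1000, h)
        dgpu = effective_bandwidth_gbs(500, 1000, h)
        print(f"hit rate {h:.0%}: APU ~{apu:.0f} GB/s, discrete ~{dgpu:.0f} GB/s")
```

The same hit rate multiplies both DRAM figures, so under this model the APU stays far behind the discrete card.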
Big APUs have all those compounding problems that make die size, power consumption and costs spiral just to make them work, and the performance targets keep moving as time advances: the 6600XT is 1080Ti performance, with N33 probably being >=1.4x faster than that, etc. As has been mentioned, if you have a big 32 CU APU you still need to feed it: the 6600XT has 32MB of Infinity Cache + 256GB/s of memory bandwidth, which is quad-channel 8000MT/s or dual-channel 16000MT/s. The 6900HX is 10.2b transistors, 208mm² with 12 CUs, and has 16MB of L3 exclusive to the CPU, 2MB of L2 for the GPU, etc. You'd need at least double the caches, with the L3 made shared to help with bandwidth, more memory PHYs for bandwidth, and massive GPU transistor increases going from 12 CUs to, for example, 32 CUs, not forgetting future CPU transistor increases (Zen 3's 4.15b transistors per CCD -> Zen 4's 6.57b, or 1.58x; assume similar increases here), etc. etc.
Or you forgo DIMMs and have a very wide memory pool on the board to feed it, like consoles/Apple. With GDDR7 doubling bandwidth, you can have 512-576GB/s on a 128-bit bus (32Gbit/s, 36Gbit/s), which is amazing, but we aren't there yet and that'd be expensive anyway.
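A quick check of the arithmetic above, using bus width × per-pin rate ÷ 8. The 6600XT line assumes its commonly quoted 128-bit GDDR6 bus at 16 Gbit/s; for DDR, MT/s per pin equals Mbit/s per pin, so a 64-bit channel at 8000MT/s is 8 Gbit/s per pin.

```python
# Peak bandwidth in GB/s = bus width in bits * per-pin data rate in Gbit/s / 8.
def peak_gbs(bus_bits: int, gbit_per_pin: float) -> float:
    return bus_bits * gbit_per_pin / 8


configs = {
    "6600XT: 128-bit GDDR6 @ 16 Gbit/s": peak_gbs(128, 16),
    "quad-channel DDR @ 8000MT/s (4 x 64-bit)": peak_gbs(4 * 64, 8.0),
    "dual-channel DDR @ 16000MT/s (2 x 64-bit)": peak_gbs(2 * 64, 16.0),
    "GDDR7 128-bit @ 32 Gbit/s": peak_gbs(128, 32),
    "GDDR7 128-bit @ 36 Gbit/s": peak_gbs(128, 36),
}

# Zen 3 -> Zen 4 transistors per CCD, figures as given in the post (billions).
zen4_vs_zen3 = 6.57 / 4.15

if __name__ == "__main__":
    for name, bw in configs.items():
        print(f"{name}: {bw:.0f} GB/s")
    print(f"Zen 4 / Zen 3 CCD transistors: {zen4_vs_zen3:.2f}x")
```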
Thanks for disappointing me. Now I know why I feel alone with my naive APU optimism.
OK then. So I'll have no choice but to stand proud on the Titanic too, hoping the iceberg ahead won't be that bad... :/
and they claim 1.7-1.8x efficiency for N31/5nm parts in raster
Good bet.
Big APUs have all those compounding problems that make die size, power consumption and costs spiral to make them work
Which is why you tile them.
A100 has a chiplet design
Split on-die L2 does not make it a chiplet design; it's still one die.
So about a day to go, where are you all putting your money for claimed perf/efficiency? I'll go for them sandbagging a bit with that 1.5x perf/W, and they claim 1.7-1.8x efficiency for N31/5nm parts in raster
Non-RT game performance between the 4080 and 4090. RT-heavy game performance at 3090 Ti level. 350 watts. So closer to 50% for raster games, closer to 100% for RT.
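Turning a perf-per-watt claim into a performance estimate is just the perf/W ratio times the power ratio. A sketch, assuming the baseline is an RX 6950 XT at 335 W board power (our assumption; the post does not name the baseline) and taking the post's 350 W figure for N31:

```python
def perf_uplift(perf_per_watt_gain: float, new_watts: float, old_watts: float) -> float:
    """Relative performance = (new perf/W / old perf/W) * (new power / old power)."""
    return perf_per_watt_gain * (new_watts / old_watts)


if __name__ == "__main__":
    # 350 W is from the post; the 335 W baseline is an assumption.
    for claim in (1.5, 1.7, 1.8):
        print(f"{claim}x perf/W at 350 W vs 335 W -> {perf_uplift(claim, 350, 335):.2f}x perf")
```

With only ~4% more power, a 1.5x perf/W gain lands a little above 1.5x performance, roughly in line with the "closer to 50% for raster" guess.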
There are leaked slides of the RDNA3 presentation?
Leaked? Nooooo, people would be lobotomized if they did leak.
I can't find them.
And you won't; AMD opsec is great.
Maybe this is the next-gen Infinity Cache
Mate, it is a patent on a cache replacement policy…
AMD's secret weapon.. Lisa hinted at it with this quote.. virtual tensor cores
That makes no sense. Plus, emulation of fixed-function units is nothing new. How do you think games from the DX7 era run on current GPUs? T&L fixed functions must have been emulated for quite a while, as GPUs are mostly compute now. They might be using the normal shader cores for AI inferencing, though.
Does the lower-bandwidth variant use the same 6 MCD package?
Almost surely yes (less logistics involved). Reference models will probably use the very same PCB with a different number of RAM chips, maybe a different number of VRM phases, and maybe less beefed-up cooling.
Does the lower-bandwidth variant use the same 6 MCD package?
I'm not sure.
There's nothing to 'emulate' in tensor/XMX/matrix cores; they just accelerate normal matrix math. As for precision, RDNA2 goes all the way down to 8:1 INT4, so yeah, you can pick lower precisions for inferencing if you want.
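To illustrate what "8:1 INT4" means: eight signed 4-bit values fit in one 32-bit register, so a packed dot-product operation can do eight multiply-accumulates per lane where FP32 does one. The Python below emulates that packing in software; the function names are hypothetical and it does not model RDNA2's actual instruction semantics (e.g. clamping or saturation behaviour).

```python
def pack_int4(values: list[int]) -> int:
    """Pack eight signed 4-bit integers (-8..7) into one 32-bit word."""
    if len(values) != 8:
        raise ValueError("expected exactly 8 values")
    word = 0
    for i, v in enumerate(values):
        if not -8 <= v <= 7:
            raise ValueError(f"{v} does not fit in signed 4 bits")
        word |= (v & 0xF) << (4 * i)  # two's-complement nibble in slot i
    return word


def unpack_int4(word: int) -> list[int]:
    """Inverse of pack_int4: sign-extend each 4-bit nibble."""
    out = []
    for i in range(8):
        nibble = (word >> (4 * i)) & 0xF
        out.append(nibble - 16 if nibble >= 8 else nibble)
    return out


def dot8_int4(a_word: int, b_word: int, acc: int = 0) -> int:
    """One emulated packed op: 8 INT4 multiplies summed into a wide accumulator."""
    return acc + sum(x * y for x, y in zip(unpack_int4(a_word), unpack_int4(b_word)))


def matmul_int4(a: list[list[int]], b_t: list[list[int]]) -> list[list[int]]:
    """C = A @ B with B passed transposed; the inner dimension must be a multiple of 8."""
    k = len(a[0])
    if k % 8:
        raise ValueError("inner dimension must be a multiple of 8")
    result = []
    for row in a:
        out_row = []
        for col in b_t:
            acc = 0
            for s in range(0, k, 8):  # one packed op per 8 elements of the dot product
                acc = dot8_int4(pack_int4(row[s:s + 8]), pack_int4(col[s:s + 8]), acc)
            out_row.append(acc)
        result.append(out_row)
    return result


if __name__ == "__main__":
    a = [[1, -2, 3, -4, 5, -6, 7, -8]]
    b_t = [[1] * 8]
    print(matmul_int4(a, b_t))  # [[-4]]
```

It is ordinary matrix math throughout; the only trick is how many low-precision products one register-wide operation covers.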