AMD and Samsung Announce Strategic Partnership in Mobile IP

Looks like they fixed most of the performance issues they had with the Exynos 2200 / Xclipse 920 (RDNA 2) in the Exynos 2400 / Xclipse 940 (RDNA 3).

In initial tests the CPU side is slightly behind Snapdragon 8 Gen 3, especially in nT (multi-threaded) workloads, but the GPU matches or even surpasses the latest Adreno.
In Geekbench GPU, Xclipse has a slight lead. 3DMark Solar Bay (with RT) shows Xclipse slightly ahead, with better result stability over long test runs and a clearly better worst result. The older 3DMark Wild Life Extreme has Adreno in a slight lead, but Xclipse has better stability and a nearly matching worst score.
In actual games (Genshin Impact, PUBG Mobile, Mobile Legends and Fortnite) performance is pretty much equal. In CoD: Mobile, Adreno beats Xclipse silly, but Xclipse was capped at 60 FPS for an unknown reason while Adreno was capped at 120 FPS. In War Thunder, which supports RT, Xclipse holds 100 FPS while Adreno throttles down to 40-ish FPS.

 
Part of the reason is also the process improvement. Seemingly Samsung's newest 4nm process, 4LPP+, has finally caught up to TSMC 4nm.

Next year's Exynos is going to be on their 2nd-gen 3nm, SF3 (3GAP). Could it have RDNA 4? The IP is ready and should be shipping in GPUs by Q4.
 

Samsung mentions the Exynos 2500, though specs are unknown at this point and will likely be revealed by September-October. I think it's likely to have the same RDNA 3.5 GPU IP as Strix Point, as AMD mentioned that they leveraged their collaboration with their mobile partners and optimized for power efficiency.
 

Seemingly a 16 CU RDNA 3.5 GPU. Crazy that it's the same configuration as in Strix Point, though obviously at much lower clocks (and more memory bandwidth/system cache?).

What this guy states about the GPU having 16 CUs is very, very unlikely (most likely it's a typo and it should be 6 CUs). The chip would be huge, starved for bandwidth and running far, far below its optimal frequency.
Plus the numbers don't match when compared to the Xclipse 940. They say +30%... The Xclipse 940 is supposed to have 6 CUs. So you either go with better CUs and clock them a bit higher or, maybe (just maybe), you go with 8 CUs.
But 16? And why focus on a GPU that large when the marketing craze is all about AI?

Anyway, if the +30% figure is correct, this is not bad at all. It remains to be seen whether that's peak or sustained, given that it is still an unproven Samsung node.
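
To put rough numbers on the CU-count argument, here is a back-of-the-envelope sketch. It assumes throughput scales linearly with CU count times clock, which is a deliberate simplification (real chips hit bandwidth and power limits long before that). The function names are made up for illustration, and the 6 CU baseline is just the figure assumed in this post.

```python
def relative_throughput(cus_new, cus_old, clock_ratio=1.0):
    """Naive estimate: throughput ~ CU count * clock (an assumption, not a measurement)."""
    return (cus_new / cus_old) * clock_ratio


def clock_ratio_for_uplift(cus_new, cus_old, uplift):
    """Clock ratio (new/old) needed to land on a given uplift under the same naive model."""
    return (1.0 + uplift) * cus_old / cus_new


if __name__ == "__main__":
    # 6 -> 8 CUs at the same clock
    print(f"8 vs 6 CUs, same clock: {relative_throughput(8, 6) - 1:+.0%}")
    # 6 -> 16 CUs at the same clock
    print(f"16 vs 6 CUs, same clock: {relative_throughput(16, 6) - 1:+.0%}")
    # How far clocks would have to drop for 16 CUs to give only +30%
    print(f"clock ratio for +30% with 16 CUs: {clock_ratio_for_uplift(16, 6, 0.30):.4f}")
```

Under that naive model, going from 6 to 8 CUs at equal clocks already gives about a third more throughput, while 16 CUs would have to run at roughly half the clock to end up at only +30%.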
 
The Xclipse 920 had 6 CUs; the Xclipse 940 most definitely has more. All the leaks/specs say 6 WGPs/12 CUs - https://forums.anandtech.com/threads/samsung-exynos-2400-soc.2617187/

GPU performance is also important. They have to market gaming performance in addition to AI performance. Given that it's SF3, one would hope the power consumption is reasonable, so it could well be sustained performance.
 
Your answer made me double check this, and you're 100% correct. We now have definitive proof it's indeed 16 CUs thanks to a die shot. It still puzzles me a bit how big smartphone SoCs are today (Snapdragon 8 Gen 3, Tensor G3 and Exynos 2400 are all 135mm²+... and Tensor G3 does not even have an integrated baseband...)

Now I need to investigate how exactly AMD achieved this. I've seen some sources stating that those CUs have their shader unit count halved. I guess the caches are smaller as well. Interesting stuff to check this weekend.
Thx for your reply!
 
AMD HW designs are able to save a lot of die space through a combination of factors like doing some API emulation, minimizing redundant hardware paths, and applying cross-stage/static pipeline state optimizations ...

If you have all sorts of programmable geometry pipeline shader stages like vertex/tessellation/geometry/mesh shaders in your API, the *obvious* (or is it?) thing to do is to implement separate HW shader stages for all of them. AMD instead opts to implement those API shader stages on top of its surface/primitive HW shader stages. The fewer shader stages implemented, the less hardware needed ...

Reading the Mantle API guide, you'll notice that the API does not explicitly expose the concept of vertex buffers, nor does it advertise support for a fixed-function stream-out graphics pipeline stage. You may wonder how this can be when standard gfx APIs like Direct3D or the Khronos APIs expose both, but not to worry, AMD has those features covered. Modern AMD gfx architectures don't have fixed-function vertex fetch hardware, so shaders read vertex data manually from a buffer, and stream-out is emulated with the help of global ordered append ...
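
To make the "shaders fetch their own vertices" idea concrete, here is a toy sketch in Python. It is only an illustration of the concept, not how any AMD driver or shader compiler actually does it. The interleaved layout (3 position floats + 2 UV floats) and all function names are assumptions picked for the example.

```python
import struct

# Hypothetical interleaved vertex layout: position (3 floats) + uv (2 floats),
# little-endian with no padding.
VERTEX_FORMAT = "<3f2f"
VERTEX_STRIDE = struct.calcsize(VERTEX_FORMAT)


def make_vertex_buffer(vertices):
    """Pack (x, y, z, u, v) tuples into one raw byte buffer."""
    return b"".join(struct.pack(VERTEX_FORMAT, *v) for v in vertices)


def fetch_vertex(buffer, vertex_index, base_offset=0):
    """No fixed-function fetch unit: the 'shader' computes the address and loads."""
    offset = base_offset + vertex_index * VERTEX_STRIDE
    x, y, z, u, v = struct.unpack_from(VERTEX_FORMAT, buffer, offset)
    return (x, y, z), (u, v)


def vertex_shader(buffer, vertex_index):
    """Toy vertex shader: pull the attributes, then apply a trivial 2x scale."""
    position, uv = fetch_vertex(buffer, vertex_index)
    return tuple(2.0 * c for c in position), uv


if __name__ == "__main__":
    buf = make_vertex_buffer([
        (0.0, 0.0, 0.0, 0.0, 0.0),
        (1.0, 0.0, 0.0, 1.0, 0.0),
        (0.0, 1.0, 0.0, 0.0, 1.0),
    ])
    print(vertex_shader(buf, 1))
```

The point is that the vertex layout lives in the shader code (here, the format string) instead of in a dedicated hardware block that has to be configured per draw.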

MSAA? No modern games use that feature anymore, so supporting hardware bits like CMask (fast clears for non/compressed MSAA surfaces) or FMask (accelerated resolve for compressed MSAA surfaces) is redundant these days, and they were removed just recently ...

Another general trend in hardware design is the move towards more monolithic hardware graphics pipelines, so that rendering state gets statically baked/compiled into software (emulated/static states) as opposed to sending hardware commands to set up register states that enable/disable special hardware (dynamic states or other graphics shader states). Some modern hardware designs with fewer HW shader stages can exploit the PSO API design even more by generating binaries for these combined/inlined shader stages ...
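
A toy Python sketch of the static vs dynamic state distinction, for anyone who wants something concrete. Everything here is invented for illustration (the state names `alpha_test`, `alpha_ref`, `invert`, and the functions themselves). It only shows the idea of baking state into a specialized "binary" once versus checking state on every draw, not any real driver behaviour.

```python
def draw_dynamic(pixels, state):
    """Dynamic-state style: every draw consults the state 'registers' per pixel."""
    out = []
    for p in pixels:
        if state["alpha_test"] and p < state["alpha_ref"]:
            continue  # pixel rejected by the alpha test
        if state["invert"]:
            p = 1.0 - p
        out.append(p)
    return out


_pso_cache = {}


def compile_pipeline(alpha_test, alpha_ref, invert):
    """PSO style: bake the state into a specialized function once, then cache it."""
    key = (alpha_test, alpha_ref, invert)
    if key in _pso_cache:
        return _pso_cache[key]

    # Only the steps this state combination needs end up in the "binary".
    steps = []
    if alpha_test:
        steps.append(lambda ps: [p for p in ps if p >= alpha_ref])
    if invert:
        steps.append(lambda ps: [1.0 - p for p in ps])

    def pipeline(pixels):
        for step in steps:
            pixels = step(pixels)
        return list(pixels)

    _pso_cache[key] = pipeline
    return pipeline


if __name__ == "__main__":
    pixels = [0.25, 0.5, 0.75]
    state = {"alpha_test": True, "alpha_ref": 0.5, "invert": True}
    print(draw_dynamic(pixels, state))
    print(compile_pipeline(True, 0.5, True)(pixels))
```

Both paths give the same result. The difference is that the baked version carries no branches for features that are switched off, which is the software analogue of not needing the special hardware that a dynamic state would toggle.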
 
Very (very) insightful info, thx.
This means that modern architectures can be smaller by ditching a lot of legacy stuff. But this also applies to discrete GPUs and larger APU parts, right?
We have recent rumors stating that the Z2 Extreme will still be a 12 CU APU (mostly because those Z2 variants are binned chips), but still! In 2025 we'll have the "high end" portable gaming PC APU sporting 12 RDNA 3.5 CUs, and this Exynos smartphone chip will supposedly have 16 of those (a cut-down variant for sure, and with much lower clocks, but still...)
 