AMD and Samsung Announce Strategic Partnership in Mobile IP

Lurkmass · Jun 7, 2020

@mfaisalkemal

In their article, one of their slides shows a 'filtering' pass, which suggests they are using 1 sample per pixel or less and denoising the image afterwards. That could explain the wins against cascaded shadow maps that Imagination are seeing. Ray-traced shadows start losing once you use multiple samples per pixel.

The biggest advantage to cascaded shadow maps is that they're temporally stable compared to ray traced shadows.
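The slides don't say what Imagination's filtering pass actually does. As a generic illustration of why 1-spp ray-traced shadows need a filter, and where temporal instability comes from, here is a minimal sketch assuming a static scene and camera. Each frame, one shadow ray per pixel returns binary visibility, and an exponential moving average (a common temporal-accumulation technique, not necessarily the one in the slides) converges toward the true fractional visibility. The function names and the `alpha` blend factor are hypothetical.

```python
import random


def one_spp_visibility(true_visibility: float, rng: random.Random) -> float:
    """One shadow ray per pixel: 1.0 if it reaches the light, 0.0 if blocked.

    true_visibility is the fraction of an area light visible from the pixel,
    so a single ray is a Bernoulli sample of it.
    """
    return 1.0 if rng.random() < true_visibility else 0.0


def temporal_accumulate(true_visibility: float, frames: int,
                        alpha: float = 0.1, seed: int = 0) -> list[float]:
    """Exponential moving average of 1-spp visibility (static scene and camera)."""
    rng = random.Random(seed)
    history = one_spp_visibility(true_visibility, rng)
    trace = [history]
    for _ in range(frames - 1):
        sample = one_spp_visibility(true_visibility, rng)
        # Blend the new noisy sample into the accumulated history.
        history = (1.0 - alpha) * history + alpha * sample
        trace.append(history)
    return trace


if __name__ == "__main__":
    # A penumbra pixel that is 30% lit.
    trace = temporal_accumulate(0.3, frames=200)
    print(f"raw first frame: {trace[0]:.2f}, after 200 frames: {trace[-1]:.2f}")
```

A smaller `alpha` trades noise for lag. Once the camera or light moves, the history has to be reprojected and rejected where it is no longer valid, which is where the temporal instability mentioned above tends to show up.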

JoeJ · Jun 7, 2020

Lurkmass said:
The biggest advantage to cascaded shadow maps is that they're temporally stable compared to ray traced shadows.

So, temporally stable... artifacts. Not that much of an advantage.

Edit: They do not mention whether they render all 4 cascades each frame. The cascades have 2K resolution.
For a proper comparison we would need to factor in how many cascades an average game updates per frame. I would guess two of them? I'm not sure about the distances, so I can only guess.
Raster would probably win then.
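To put the "two cascades per frame" guess in numbers, here is a minimal sketch assuming 4 square 2048x2048 cascades and a hypothetical staggered schedule: the nearest cascade is re-rendered every frame, and the far cascades take turns. This schedule is an illustration, not what any particular game is known to do.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CascadeConfig:
    count: int = 4          # cascades in the comparison discussed above
    resolution: int = 2048  # "2K" per cascade (square map assumed)


def cascades_to_update(frame: int, count: int) -> list[int]:
    """Hypothetical schedule: nearest cascade every frame, one far cascade round-robin."""
    if count <= 1:
        return [0]
    far = 1 + frame % (count - 1)
    return [0, far]


def texels_per_frame(cfg: CascadeConfig, updated: int) -> int:
    """Depth texels written per frame if `updated` cascades are re-rendered."""
    return updated * cfg.resolution * cfg.resolution


if __name__ == "__main__":
    cfg = CascadeConfig()
    for frame in range(4):
        print(frame, cascades_to_update(frame, cfg.count))
    print("all cascades:", texels_per_frame(cfg, cfg.count))
    print("two cascades:", texels_per_frame(cfg, 2))
```

Updating two cascades instead of four halves the depth-texel fill (about 16.8M vs 8.4M texels), with each far cascade refreshed every third frame. The sketch ignores how much geometry lands in each cascade, which also drives raster cost.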

But if there are more lights, RT can select just one of them per pixel and denoise... Good enough? RT wins?
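Selecting just one light per pixel is stochastic light selection. You pick one light with some probability, trace a single shadow ray to it, and divide by that probability so the estimate is correct on average. The denoiser then has to clean up the remaining noise. Here is a minimal sketch; the selection weights (for example, unshadowed intensity) are a hypothetical choice, and `contributions` is assumed to already include each light's shadow-ray visibility.

```python
import random
from typing import Sequence


def pick_light(weights: Sequence[float], u: float) -> tuple[int, float]:
    """Choose one light with probability proportional to its weight.

    `u` is a uniform random number in [0, 1). Weights must be non-negative
    with a positive sum; lights with zero weight are never chosen, so every
    light that can contribute needs a positive weight.
    Returns (light index, probability of that choice).
    """
    total = sum(weights)
    target = u * total
    running = 0.0
    for i, w in enumerate(weights):
        running += w
        if target < running:
            return i, w / total
    # Floating-point round-off: fall back to the last light with non-zero weight.
    last = max(i for i, w in enumerate(weights) if w > 0)
    return last, weights[last] / total


def one_light_estimate(contributions: Sequence[float], weights: Sequence[float],
                       rng: random.Random) -> float:
    """Shade using a single selected light (one shadow ray instead of one per light).

    Dividing by the selection probability makes the expected value equal to
    the sum of all lights' contributions.
    """
    i, pdf = pick_light(weights, rng.random())
    return contributions[i] / pdf


if __name__ == "__main__":
    rng = random.Random(1)
    contributions = [0.5, 0.2, 0.1]   # exact sum: 0.8
    samples = [one_light_estimate(contributions, [1.0, 1.0, 1.0], rng)
               for _ in range(10000)]
    print(sum(samples) / len(samples))  # near 0.8, but each sample is noisy
```

If the weights happen to be proportional to the true contributions, every sample returns exactly the sum and there is no noise. Real renderers can only approximate that, which is why the denoiser is still needed.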

Seems difficult to compare, but interesting to see it's not a huge performance difference.

Deleted member 13524 · Jun 8, 2020

Ailuros said:
Fake slide aside, I can't figure out what theoretically speaks against implementing RT on a hypothetical 358 GFLOP GPU which is meant for a low power mobile SoC.

Not enough compute power for real-time raytracing at any credible resolution.

Besides, the Adreno 650 in Qualcomm's current 865 flagship is a >1TFLOPs GPU, and honestly 4 CUs is just too little for a 5nm chip.
Each RDNA WGP is supposed to be less than 5 mm^2 on 7nm. Two WGPs (4 CUs) would be less than 10mm^2 on 7nm. At 5nm that would be ~7 mm^2.

Samsung would only dedicate 7mm^2 to the GPU compute units on its 80-120mm^2 SoC? I find that hard to believe.
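Here is a quick sanity check of the numbers in this post. It assumes the slide's 358 GFLOPs refer to 4 RDNA CUs with the standard 64 FP32 lanes each, and counts an FMA as two FLOPs. Under those assumptions the GPU would be clocked at roughly 700 MHz, and ~7 mm^2 would be well under a tenth of an 80-120 mm^2 die.

```python
RDNA_FP32_LANES_PER_CU = 64      # two SIMD32 units per RDNA compute unit
FLOPS_PER_LANE_PER_CLOCK = 2     # a fused multiply-add counted as two FLOPs


def implied_clock_mhz(gflops: float, cus: int) -> float:
    """Clock needed to reach a peak FP32 rate with a given number of RDNA CUs."""
    return gflops * 1000.0 / (cus * RDNA_FP32_LANES_PER_CU * FLOPS_PER_LANE_PER_CLOCK)


def area_share(gpu_mm2: float, soc_mm2: float) -> float:
    """Fraction of the SoC die spent on GPU compute units."""
    return gpu_mm2 / soc_mm2


if __name__ == "__main__":
    print(f"{implied_clock_mhz(358, 4):.0f} MHz for 358 GFLOPs on 4 CUs")
    # ~7 mm^2 at 5nm from <10 mm^2 at 7nm implies roughly a 10 / 7 = 1.43x density gain.
    for soc in (80, 120):
        print(f"7 mm^2 of a {soc} mm^2 SoC: {area_share(7, soc):.1%}")
```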

Entropy · Jun 8, 2020

mfaisalkemal said:
I think dedicated ray tracing on mobile space is very suitable.

Cell phones are one-device-does-all digital assistants, sold in volumes of a billion+ yearly. Since they perform all digital duties for their owners, they strive to contain a viable compromise of features. This is in contrast to, for instance, assembling a workstation as a professional tool, where tailoring to very specific tasks is possible. There is a lot of competition over which features go into a cell phone and which don't, because all users care about cost and battery life (and security, though not everyone is aware of it). (Example: Nvidia's recently unveiled "Orin" contains four 10GbE links, making a mockery of the solutions offered to consumers. But even though such functionality is clearly possible, it doesn't make the feature-set cut for phone SoCs, for obvious reasons: low utilisation.)

Dedicating hardware resources to RTRT on cell phones is an example of tech-in-a-vacuum reasoning. How can dedicating hardware to the minor minutiae of 3D shadow rendering on jack-of-all-trades devices with 5" screens be a good idea?
Playing games on cell phones comes a fair bit down in a long list of communication, shopping, browsing, banking, photographing, filming, GPSing and so on that phones do. I have yet to see anyone play a 3D-game on cell phones with ANY kind of dynamic shadows. I know such games exist, I'm just stating I never saw anyone play such a game on a cell phone, ever, myself included. Blob shadows would be a step forward, and honestly a sufficient one given the limitations of the display.
Take a long, hard look at the games that people actually play on their cell phones. To what extent would their enjoyment of Solitaire, Candy Crush lookalikes and Mario Run be enhanced by RTRT? What real-life problem does dedicated RTRT hardware address for cell phone users?

It's a waste of gates. And waste is bad engineering.

stiftl · Jun 8, 2020

I think an 8 CU setup would make the most sense size-wise, and they could reuse this setup for the Renoir successor.

Nebuchadnezzar · Jun 8, 2020

Entropy said:
Cell phones are one-device-does-all digital assistants, sold in volumes of a billion+ yearly. Since they perform all digital duties for their owners, they strive to contain a viable compromise of features. This is in contrast to, for instance, assembling a workstation as a professional tool, where tailoring to very specific tasks is possible. There is a lot of competition over which features go into a cell phone and which don't, because all users care about cost and battery life (and security, though not everyone is aware of it). (Example: Nvidia's recently unveiled "Orin" contains four 10GbE links, making a mockery of the solutions offered to consumers. But even though such functionality is clearly possible, it doesn't make the feature-set cut for phone SoCs, for obvious reasons: low utilisation.)

Dedicating hardware resources to RTRT on cell phones is an example of tech-in-a-vacuum reasoning. How can dedicating hardware to the minor minutiae of 3D shadow rendering on jack-of-all-trades devices with 5" screens be a good idea?
Playing games on cell phones comes a fair bit down in a long list of communication, shopping, browsing, banking, photographing, filming, GPSing and so on that phones do. I have yet to see anyone play a 3D-game on cell phones with ANY kind of dynamic shadows. I know such games exist, I'm just stating I never saw anyone play such a game on a cell phone, ever, myself included. Blob shadows would be a step forward, and honestly a sufficient one given the limitations of the display.
Take a long, hard look at the games that people actually play on their cell phones. To what extent would their enjoyment of Solitaire, Candy Crush lookalikes and Mario Run be enhanced by RTRT? What real-life problem does dedicated RTRT hardware address for cell phone users?

It's a waste of gates. And waste is bad engineering.

I think you're far off the mark here. An RTRT GPU on a mobile device is a huge marketing point. A similar comparison would be the AI accelerators that take up quite a bit of silicon on various SoCs out there. They're not really needed, as the same work could just as well have been accelerated by GPUs, yet vendors integrated them because they saw a path forward. In the same way, I can fully see Samsung being willing to bet on an AMD GPU with RTRT; it's probably a key reason they licensed it in the first place.

In any case, the rumour is bullshit and that AMD SoC doesn't exist.

Kaotik · Jun 8, 2020

stiftl said:
I think an 8 CU setup would make the most sense size-wise, and they could reuse this setup for the Renoir successor.

Their first RDNA2 APU probably won't be a mere 8 CUs; Intel is pulling ahead.

Entropy · Jun 8, 2020

Nebuchadnezzar said:
I think you're far off the mark here. An RTRT GPU on a mobile device is a huge marketing point. A similar comparison would be the AI accelerators that take up quite a bit of silicon on various SoCs out there. They're not really needed, as the same work could just as well have been accelerated by GPUs, yet vendors integrated them because they saw a path forward. In the same way, I can fully see Samsung being willing to bet on an AMD GPU with RTRT; it's probably a key reason they licensed it in the first place.

In any case, the rumour is bullshit and that AMD SoC doesn't exist.

RTRT may or may not actually be a viable buzzword in PC tech-enthusiast circles. In the cell phone segment, only a tiny (TINY!) minority of consumers even know what it means, and it doesn't necessarily hold any relevance at all to those who do. Even though the phone market is maturing, there are still quite a number of worthwhile improvements that a large percentage of users can appreciate. Again, dedicating chip area to a particular (and, in the target market, unnecessary) way of computing some parts of the lighting in a very, very small subset of games is a terrible way to budget chip area.
Most action in phone space today seems focussed on camera tech, which makes sense since ten billion images are currently uploaded per day, implying at least some level of interest on the consumer end. Meanwhile, even among the relatively fewer (but still huge) number of people who spend money on games, that money doesn't go to the graphically ambitious titles.

We all know this. It is just that saying it out loud in a place like this is socially awkward.

Nebuchadnezzar · Jun 9, 2020

Entropy said:
RTRT may or may not actually be a viable buzzword in PC tech-enthusiast circles. In the cell phone segment, only a tiny (TINY!) minority of consumers even know what it means, and it doesn't necessarily hold any relevance at all to those who do.

A SoC vendor doesn't sell chips to consumers, it sells chips to OEM vendors. I think it's a pretty great distinguishing feature for Samsung to gain an edge over Qualcomm in that battle, and it's exactly the kind of thing that would work. Being able to claim some sort of feature parity with next-gen consoles is an insane marketing point.

Again, dedicating chip area to a particular (and, in the target market, unnecessary) way of computing some parts of the lighting in a very, very small subset of games is a terrible way to budget chip area.

Samsung's been wasting 30% die area for crap custom CPUs and huge noncompetitive Mali GPUs for years and years. You underestimate their willingness to spend die area on something.

Ailuros · Jun 10, 2020

ToTTenTranz said:
Not enough compute power for real-time raytracing at any credible resolution.

Besides, the Adreno 650 in Qualcomm's current 865 flagship is a >1TFLOPs GPU, and honestly 4 CUs is just too little for a 5nm chip.
Each RDNA WGP is supposed to be less than 5 mm^2 on 7nm. Two WGPs (4 CUs) would be less than 10mm^2 on 7nm. At 5nm that would be ~7 mm^2.

Samsung would only dedicate 7mm^2 to the GPU compute units on its 80-120mm^2 SoC? I find that hard to believe.

The slide is fake for plenty of other reasons long before that. Other than that, the discontinued GR6500 was at roughly 150 GFLOPs at its projected frequency: https://www.anandtech.com/show/7870...vr-wizard-gpu-family-rogue-learns-ray-tracing

Ailuros · Jun 10, 2020

Entropy said:
RTRT may or may not actually be a viable buzzword in PC tech-enthusiast circles. In the cell phone segment, only a tiny (TINY!) minority of consumers even know what it means, and it doesn't necessarily hold any relevance at all to those who do. Even though the phone market is maturing, there are still quite a number of worthwhile improvements that a large percentage of users can appreciate. Again, dedicating chip area to a particular (and, in the target market, unnecessary) way of computing some parts of the lighting in a very, very small subset of games is a terrible way to budget chip area.
Most action in phone space today seems focussed on camera tech, which makes sense since ten billion images are currently uploaded per day, implying at least some level of interest on the consumer end. Meanwhile, even among the relatively fewer (but still huge) number of people who spend money on games, that money doesn't go to the graphically ambitious titles.

I can't bring myself to dig up Rys' comment here on the forum, but from what I recall he said the added area for the dedicated RT unit in the above-mentioned GR6500 wasn't large. I cannot imagine any IHV nowadays that would NOT integrate RT into the existing pipeline and would instead go for dedicated RT units within a GPU block. Beyond that, I wouldn't expect the added space to make that much difference. By your very same reasoning, I could argue that ULP mobile GPUs wouldn't need more than one rasterizing unit (which made up around 5% of the die area back when desktop GPUs still had only one rasterizer?), even though many of them support tessellation, at least on paper. I'm not aware of any ULP mobile GPU right now that can process more than 1 triangle/clock, which, among other reasons, makes tessellation on such a GPU just as moot a point as some mandatory DX11 functionality, which costs a sizeable percentage of added area for the ALUs alone compared to the DX10.x level actually supported by today's Apple ULP SoC GPUs. On the other hand, if we take Adreno 5xx and later as an example, those DX11.x ULP GPUs are comparatively tiny these days next to their direct competitors.

No idea if it's true, but from what I've heard Apple is very keen on adding RT in the long run, meaning that, even beyond pure marketing, ray tracing can hardly be something for only a rare minority. Still, we haven't even a clue how much added die area the full integration of RT into a GPU pipeline would cost, but so far I haven't heard any hints that would suggest large percentages.

mfaisalkemal · Jun 10, 2020

Ailuros said:
I can't bring myself to dig up Rys' comment here on the forum, but from what I recall he said the added area for the dedicated RT unit in the above-mentioned GR6500 wasn't large. I cannot imagine any IHV nowadays that would NOT integrate RT into the existing pipeline and would instead go for dedicated RT units within a GPU block. Beyond that, I wouldn't expect the added space to make that much difference. By your very same reasoning, I could argue that ULP mobile GPUs wouldn't need more than one rasterizing unit (which made up around 5% of the die area back when desktop GPUs still had only one rasterizer?), even though many of them support tessellation, at least on paper. I'm not aware of any ULP mobile GPU right now that can process more than 1 triangle/clock, which, among other reasons, makes tessellation on such a GPU just as moot a point as some mandatory DX11 functionality, which costs a sizeable percentage of added area for the ALUs alone compared to the DX10.x level actually supported by today's Apple ULP SoC GPUs. On the other hand, if we take Adreno 5xx and later as an example, those DX11.x ULP GPUs are comparatively tiny these days next to their direct competitors.

No idea if it's true, but from what I've heard Apple is very keen on adding RT in the long run, meaning that, even beyond pure marketing, ray tracing can hardly be something for only a rare minority. Still, we haven't even a clue how much added die area the full integration of RT into a GPU pipeline would cost, but so far I haven't heard any hints that would suggest large percentages.

Maybe this post?
@Rys said the GPU IP portion is substantially smaller than the 100mm2 PowerVR Wizard SoC @ 28nm.

The GPU IP is a 4-cluster Series6XT, and the iPhone 6 uses a modified version of it measuring 19.1mm2 @ 20nm according to AnandTech. So @ 28nm the size roughly doubles, to around 38.2mm2.

Let's assume the original version is around 35mm2, so the Wizard IP per core is around 65mm2 @ 28nm.

With 5nm this year and after, the Wizard IP core size would be around 5.3mm2 (from 28nm to 16nm divide by 2, from 16nm to 7nm divide by 3.33, and from 7nm to 5nm divide by 1.84), and I think that size is suitable for a smartphone SoC.
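The scaling chain above can be reproduced in a few lines. The divisors (2, 3.33, 1.84) and the 65 mm2 starting point are the post's own assumptions, not foundry figures. Their product is about 12.25x, which puts the Wizard IP at roughly 5.3 mm2 at 5nm.

```python
def scale_area(area_mm2: float, divisors: list[float]) -> float:
    """Apply successive node-to-node area divisors (the post's assumptions)."""
    for d in divisors:
        area_mm2 /= d
    return area_mm2


IPHONE6_GPU_20NM_MM2 = 19.1               # AnandTech figure quoted in the post
GPU_28NM_MM2 = IPHONE6_GPU_20NM_MM2 * 2   # "doubled" going back to 28nm
WIZARD_28NM_MM2 = 65.0                    # the post's estimate for the Wizard IP
NODE_DIVISORS = [2.0, 3.33, 1.84]         # 28->16nm, 16->7nm, 7->5nm

if __name__ == "__main__":
    print(f"Series6XT at 28nm: {GPU_28NM_MM2:.1f} mm2")
    print(f"Wizard at 5nm: {scale_area(WIZARD_28NM_MM2, NODE_DIVISORS):.2f} mm2")
```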

I hope Apple will use Wizard IP this year in the A14, with 3 cores @ 1333MHz, and that the marketing slogan will be "2 billion rays in your hands."
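Working backwards from the hoped-for slogan: 2 billion rays per second from 3 cores at 1333 MHz means each core would need to sustain about 0.5 rays per clock. Nothing in this thread establishes whether Wizard IP reaches that rate. The sketch below only does the division.

```python
def rays_per_clock_per_core(rays_per_second: float, cores: int, clock_mhz: float) -> float:
    """Ray throughput each core would need per clock to hit a given total rate."""
    return rays_per_second / (cores * clock_mhz * 1e6)


if __name__ == "__main__":
    # Numbers from the post: 3 cores at 1333 MHz, "2 billion rays".
    print(f"{rays_per_clock_per_core(2e9, 3, 1333):.3f} rays/clock/core")
```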

Entropy · Jun 10, 2020

Nebuchadnezzar said:
A SoC vendor doesn't sell chips to consumers, it sells chips to OEM vendors. I think it's a pretty great distinguishing feature for Samsung to gain an edge over Qualcomm in that battle, and it's exactly the kind of thing that would work. Being able to claim some sort of feature parity with next-gen consoles is an insane marketing point.

We can agree on the "insane" part. How many sigmas do you have to move from the average S20 buyer until you find the ones who fulfil the requirements of 1: knowing what it is, 2: caring, plus 3: being naive enough to believe that such a connection makes a difference to the Android games she plays? Mobile is inevitably moving from mature towards stale, but investing those gates into buffers, caches, prefetchers, what have you would at least ensure that they got used and provided some kind of benefit. (The stuff you do an excellent, and much appreciated, job of researching, I might add.) Or put the money towards some other aspect of the phone that people interact with, preferably something that makes a tangible difference to a prospective buyer when she picks it up and compares it to 10-15 other handsets in a shop.
Would a sales person in a shop pitch the Samsung to that customer with "it has dedicated RTRT hardware"? It would never happen.

Samsung's been wasting 30% die area for crap custom CPUs and huge noncompetitive Mali GPUs for years and years. You underestimate their willingness to spend die area on something.

That’s completely different. They haven’t done that to attract customers, they have done it for (pretty good) corporate strategy reasons. It gave them potential independence (which they might have used) and safety, they gained a bargaining chip vs. Qualcomm, they could fill up their foundry capacity when orders faltered and so on. In my country, which is "Exynos" territory, they have never marketed that as a feature towards consumers.

Nebuchadnezzar · Jun 11, 2020

Entropy said:
We can agree on the "insane" part. How many sigmas do you have to move from the average S20 buyer until you find the ones who fulfil the requirements of 1: knowing what it is, 2: caring, plus 3: being naive enough to believe that such a connection makes a difference to the Android games she plays? Mobile is inevitably moving from mature towards stale, but investing those gates into buffers, caches, prefetchers, what have you would at least ensure that they got used and provided some kind of benefit. (The stuff you do an excellent, and much appreciated, job of researching, I might add.) Or put the money towards some other aspect of the phone that people interact with, preferably something that makes a tangible difference to a prospective buyer when she picks it up and compares it to 10-15 other handsets in a shop.
Would a sales person in a shop pitch the Samsung to that customer with "it has dedicated RTRT hardware"? It would never happen.

That’s completely different. They haven’t done that to attract customers, they have done it for (pretty good) corporate strategy reasons. It gave them potential independence (which they might have used) and safety, they gained a bargaining chip vs. Qualcomm, they could fill up their foundry capacity when orders faltered and so on. In my country, which is "Exynos" territory, they have never marketed that as a feature towards consumers.

All those points are completely irrelevant because, again, they don't sell to the customer. They sell to the vendor. Samsung heavily marketed their custom cores as a differentiating advantage before things inarguably went south in terms of performance and efficiency. The same goes for the NPUs in current SoCs, which have been around for 2 years now: by your definition that's completely useless silicon and investment, as hardly anything uses it and the work could very well have just been done on the GPU. I see it as an exact parallel to RTRT GPUs.

And no, Samsung Mobile would never advertise that as long as they're dual-sourcing, but you sure as hell will have SLSI advertising it to hell. Apple, in a similar situation, will also sure as hell advertise their new iPhone "with advanced ray-tracing technology GPU" when it comes out.

Ailuros · Jun 19, 2020

mfaisalkemal said:
Maybe this post?
I hope Apple will use Wizard IP this year in the A14, with 3 cores @ 1333MHz, and that the marketing slogan will be "2 billion rays in your hands."

I don't know if Apple really wants RT IP. If they do, using something like dedicated RT units sounds weird to me at this stage. IF they opt for it, it sounds far more reasonable to get an architectural license for Alborix B (or whenever IMG decides to integrate RT into the pipeline) and write their own GPU code based on that for their own needs.

Deleted member 90741 · Jun 19, 2020

Radeon ProRender libraries/plugins for Metal has Navi2x RT acceleration. You can find this info on GPUOpen and video by Harada on AMD YT channel.
So Apple would definitely have a use for it.

Ailuros · Jun 24, 2020

ethernity said:
Radeon ProRender libraries/plugins for Metal has Navi2x RT acceleration. You can find this info on GPUOpen and video by Harada on AMD YT channel.
So Apple would definitely have a use for it.

Interesting. Do you have by chance any clue what they are using it for?

Arnold Beckenbauer · Jan 12, 2021

Next flagship Exynos with AMD GPU?

Deleted member 13524 · Jan 12, 2021

Arnold Beckenbauer said:
Next flagship Exynos with AMD GPU?

Mali G76 announced at 5:13.

Not yet

Arnold Beckenbauer · Jan 12, 2021

Samsung Confirms AMD RDNA GPU In Next Exynos Flagship (anandtech.com)
