Hardware details about Adreno 740

dr_ribit

Newcomer
Does anyone have some information about the Adreno 740? I couldn't find any details on Qualcomm's website, and the only source I've seen that mentions any (Wikipedia) lists quite ridiculous specs (like 2560 ALUs and 3.5 TFLOPS FP32) which don't make much sense to me.
 
Cute. I hope the time spent writing this very useful post at least made you feel like you are contributing.
The entire GPU architecture section was recently locked down because of umm behavioral difficulties.

Qualcomm seems to keep pretty quiet about the details of their GPU tech these days. And the hardware site scene isn't what it used to be.
 
I found this site useful to look up mobile specs: https://chipguider.com/?gpu=adreno-740-980mhz
No idea how accurate their data is, but at least it's hard numbers.

Thanks! Yeah, I’ve seen those figures too, and they are just as mysterious because the math simply doesn’t check out (2138 is not any multiple of 2048 SIMD ALUs times 980 MHz). Assuming these are FP16 ALUs, or that FP32 is rate-throttled, would at least put the numbers in the right ballpark, but it’s still weird.

The main reason why I’m asking is that all these specs suggest a humongous GPU, and it seems to do fairly well in synthetic gaming tests, but its compute benchmarks are comparable to an A14 with its 512 ALUs running at around 1 GHz. So either the drivers are bust for compute or Qualcomm’s architecture is inherently inefficient. At the very least this makes me wonder about their claims of using scalar SIMD.
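
The mismatch mentioned in this post can be checked with a few lines of arithmetic. This is only a sketch: it assumes the 2138 figure listed on that site is GFLOPS (the post does not state the unit), and the variable names are made up for illustration.

```python
# Figures as quoted in the post; treating 2138 as GFLOPS is an assumption.
lanes = 2048
clock_ghz = 0.980
listed_gflops = 2138

# Throughput at exactly one operation per lane per clock, in G-ops/s.
one_op_rate = lanes * clock_ghz

# FLOPs per lane per clock that the listed figure would imply.
implied_factor = listed_gflops / one_op_rate

print(f"1 op/lane/clock: {one_op_rate:.0f} Gops/s")
print(f"implied FLOPs per lane per clock: {implied_factor:.3f}")
```

The implied factor lands slightly above 1 instead of on a whole number such as 1 or 2 (the latter being the usual count for a fused multiply-add), which is why the listed figures look inconsistent with each other.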
 
In a strict sense any such GPU would have more like 2048 SIMD lanes. There's another scenario floating around the net, in which the Adreno 730 supposedly has 768 SPs and the Adreno 740 a total of 1024 SPs, at a boost clock of around 720 MHz for the latter; hypothetically, if each SP is capable of 5 FLOPs FP32 per clock, you actually get 3.7 TFLOPs FP32 on paper. The next question would be whether that hypothetical 5th FLOP is of any use under normal conditions or just in some weird corner cases. Going down to 4 FLOPs @ 0.72 GHz immediately sets it back to 2.9 TFLOPs, and from there it comes down to what percentage of the ALUs can be utilized.

I recall that in the past QCOM hired a third-party company to write its GPU compilers, but I have no idea if that's still the case. I would think that by now they are developing their own compilers. https://codeplay.com/company/collaborations/

Either way, if the 740 ends up with roughly comparable performance to the Apple M1 GPU in a shader-limited synthetic benchmark, both should have comparable real-time FP32 throughput.
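
For reference, the paper numbers in this post follow from SPs x FLOPs per clock x clock. A minimal sketch of that calculation, using the rumoured figures from the post; the function names and the 60% utilization value are hypothetical and only there to illustrate the last point about ALU utilization:

```python
def peak_tflops(sps: int, flops_per_clock: float, clock_ghz: float) -> float:
    """Theoretical peak FP32 throughput in TFLOPs."""
    return sps * flops_per_clock * clock_ghz / 1000.0


def effective_tflops(peak: float, utilization: float) -> float:
    """Scale a peak figure by the fraction of ALUs kept busy (0.0 to 1.0)."""
    return peak * utilization


# Rumoured Adreno 740 scenario from the post: 1024 SPs at ~720 MHz.
with_5th_flop = peak_tflops(1024, 5, 0.72)
without_5th_flop = peak_tflops(1024, 4, 0.72)

print(f"5 FLOPs/SP/clock: {with_5th_flop:.1f} TFLOPs")
print(f"4 FLOPs/SP/clock: {without_5th_flop:.1f} TFLOPs")

# Hypothetical utilization, not a measured value.
print(f"at 60% utilization: {effective_tflops(without_5th_flop, 0.6):.2f} TFLOPs")
```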
 
Thanks! Do you happen to know more about the architecture of Adreno SPs? I’ve found various claims around (there is a paper that depicts Adreno 630 using 3—wide SIMD like AMD or Apple).

As I said, what surprises me is that the compute benchmarks of the Adreno 740 are not very good: https://browser.geekbench.com/v6/compute/search?utf8=✓&q=SM-S916N

This is comparable to the A14 across various benchmarks, and that GPU has 512 SIMD lanes with half-rate FP32. It’s as you say, I’d expect 2048 SIMD lanes to be at least competitive with the 1024 lanes of the M1 (with the adjustment that the M1 runs a higher clock). But this is definitely not the case.
 
If Apple hasn't changed its ALU layout lately, each lane should still be capable of 2 MADDs. I was only able to follow things somewhat up to the Adreno 650, which has 512 SPs, also with a theoretical throughput of 4 FLOPs/lane, at 587 MHz. I usually use Kishonti's GFXBench for my guesswork, and in a quite stressful 4K test like Aztec Ruins it looks roughly like this: https://gfxbench.com/result.jsp?benchmark=gfx50&test=778&order=median&base=gpu&ff-check-desktop=0

These are median scores, and while you can set their website to show best scores, I don't think those are as representative as the median scores. Also, some SoCs get tested and are never updated from some point on, which in some cases could mean outdated drivers etc. Here you can see an Adreno 730 being roughly in the A14 performance ballpark, and I wouldn't be in the least surprised if the peak FP32 throughputs (marketing rubbish aside) of those two are roughly the same.

Without any Adreno 740 results yet in that specific benchmark suite, I'd rather not rely on Geekbench or any other benchmark suite which doesn't specialize in measuring GPU performance. I'd like to stand corrected, but I'd say the 740 has 2048 lanes about as much as the A14 GPU is clocked at 3.1 GHz (which you can also find claimed in many spots across the internet). I'd rather suggest half the lanes for the former and roughly 1/3rd of the frequency for the latter, compared to what questionable sources on the internet suggest.

My gut feeling tells me that for the 730/740 GPUs QCOM has gradually gone wider on throughput per SIMD compared to the 6th generation, and that the 740 should have at least twice the lanes of the Adreno 650 with a somewhat higher peak frequency. I would expect the 740 to end up somewhere over 20 fps, and the closer it gets to 30 fps, the closer it would be to M1 GPU (~1.25 GHz, 2.6 TFLOPs FP32) performance in that test. Assuming we see up to 3x higher GPU performance for the 740 compared to the 650, that would be roughly in line with the FP32 TFLOPs each of them has.
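
The "roughly in line" remark can be sanity-checked on paper. The sketch below uses the Adreno 650 figures given in this post (512 SPs, 4 FLOPs/lane, 587 MHz) and, as an assumption, the rumoured Adreno 740 scenario quoted earlier in the thread (1024 SPs, 5 FLOPs, 720 MHz); neither set of numbers is confirmed by Qualcomm, and the helper name is made up.

```python
def peak_gflops(lanes: int, flops_per_lane_per_clock: float, clock_mhz: float) -> float:
    """Theoretical peak FP32 throughput in GFLOPs."""
    return lanes * flops_per_lane_per_clock * clock_mhz / 1000.0


adreno_650 = peak_gflops(512, 4, 587)    # figures from the post
adreno_740 = peak_gflops(1024, 5, 720)   # rumoured scenario, unconfirmed

ratio = adreno_740 / adreno_650
print(f"Adreno 650: {adreno_650:.0f} GFLOPs")
print(f"Adreno 740: {adreno_740:.0f} GFLOPs")
print(f"paper ratio: {ratio:.2f}x")
```

Under those assumptions the paper ratio comes out a little above 3x, which matches the "up to 3x" expectation; a different lane count or clock for the 740 would shift it accordingly.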

Now, comparing anything Android to anything Apple/iOS has its own pitfalls, especially for GPU drivers, since in the second case Apple runs on its own very safe software platform. Any Android developer is lucky if the GPU driver doesn't backfire in his face.
 
Based on this video (Link) at 2:53, the Adreno 740 has 1536 ALUs @ 680 MHz. @Nebuchadnezzar please correct it

So roughly 2 TFLOPs, assuming the usual scalar ALUs. This definitely sounds more realistic than the other claims and, combined with the faster RAM, also explains why it does better than the A15/A16 in some synthetic GPU benchmarks. I'm still wondering about the poor Geekbench compute scores. Maybe it is indeed driver issues, as Ailuros suggests. Probably not optimised for GPGPU.

Anyway, that's a humongous GPU for a mobile phone! Definitely dwarfs Apple's 640-ALU (5*4*32) designs.
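
A quick sketch of where "roughly 2 TFLOPs" and the size comparison come from. Counting one fused multiply-add per ALU per clock as 2 FLOPs is an assumption (the post only says "the usual scalar ALUs"), and the variable names are illustrative.

```python
# Assumption: each scalar ALU retires one FMA per clock, counted as 2 FLOPs.
FLOPS_PER_FMA = 2

# Adreno 740 figures taken from the video slide mentioned in the thread.
adreno_alus = 1536
adreno_clock_ghz = 0.680
adreno_tflops = adreno_alus * FLOPS_PER_FMA * adreno_clock_ghz / 1000.0

# Apple ALU count, written as the same product used in the post.
apple_alus = 5 * 4 * 32

alu_ratio = adreno_alus / apple_alus

print(f"Adreno 740 peak: {adreno_tflops:.2f} TFLOPs FP32")
print(f"Apple ALUs: {apple_alus}, Adreno/Apple ALU ratio: {alu_ratio:.1f}x")
```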
 
Cute. I hope the time spent writing this very useful post at least made you feel like you are contributing.

Ah the old sarcasm based reply, you asked for information about the Adreno 740 and my Google search link provided that for you in abundance.

So you got what you asked for, maybe next time construct a thread with a bit more detail of exactly what you're looking for, where you've already looked and what you're hoping to gain so people know how best to help you.
 
Specifications, architecture details, execution model, ISA information, that kind of stuff. I was under the impression that this was a GPU enthusiast community? I didn't think that I had to specify this in more detail.

If you'd follow your own Google recommendation, you'd quickly see that the only information available consists of some superfluous benchmarks and what seem like arbitrary statements about the GPU's capabilities. The link provided by @mfaisalkemal at least makes sense to me. What makes less sense is the very poor performance we see in compute benchmarks like Geekbench.
 
If QCOM had its own OS/software platform, its GPU drivers would show much higher stability, predictability and whatnot. IMHO it's a common problem for anything Android and not exclusive to Adreno/QCOM.

Kishonti also has a low-level driver overhead test in its benchmark suite, for which the results for comparable ULP SoCs for anything tablet/smartphone look like this: https://gfxbench.com/result.jsp?benchmark=gfx50&test=639&order=median&base=gpu&ff-check-desktop=0

edit: since the slides in the aforementioned video link look genuine, I'm noting for myself: Adreno 730/1024 SPs @ 900 MHz boost, Adreno 740/1536 SPs @ 680 MHz boost.
 
Qualcomm has usually delivered their own fork of Linux and (I assume here) Mesa. What else should they have?

IMO, if Qualcomm valued stability and predictability they should do what Intel and AMD do for their platforms, i.e. work with upstream Linux (and Mesa) to get stuff included there as early as possible.
 
Based on this video (Link) at 2:53, the Adreno 740 has 1536 ALUs @ 680 MHz. @Nebuchadnezzar please correct it
I can't see a video for some reason, but obviously I can't give any input here in my position either way.

You could estimate the ALU growth by die shot comparison, and you also see the exposed "core count" in OpenCL info such as OpenCL-Z along with the GFlops measured as well as frequency from the device kernel. Based on what a commercial S23 is doing, those numbers could work out.
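
The kind of cross-check described here can be sketched as an inversion of the peak-throughput formula: given a measured GFLOPs figure and the clock reported by the device kernel, solve for the lane count. The function name, the 2 FLOPs per lane per clock default, and the input numbers below are all hypothetical; no actual OpenCL-Z readings are given in the thread.

```python
def implied_lanes(measured_gflops: float, clock_mhz: float,
                  flops_per_lane_per_clock: float = 2.0) -> float:
    """Lane count that would produce the measured throughput at peak.

    Measured throughput normally falls short of the theoretical peak, so the
    result is better read as a lower bound than as an exact ALU count.
    """
    return measured_gflops * 1000.0 / (clock_mhz * flops_per_lane_per_clock)


# Hypothetical inputs for illustration only, not real measurements.
estimate = implied_lanes(measured_gflops=2000.0, clock_mhz=680.0)
print(f"implied lanes: {estimate:.0f}")
```

The estimate can then be compared against the "core count" the driver exposes and against die shot area to see whether a claimed ALU figure is plausible.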
 
Qualcomm has usually delivered their own fork of Linux and (I assume here) Mesa. What else should they have?

IMO, if Qualcomm valued stability and predictability they should do what Intel and AMD do for their platforms, i.e. work with upstream Linux (and Mesa) to get stuff included there as early as possible.
Obviously none of the above is a viable replacement for any smartphone and Android, which is their actual core market for their SoCs right now.
 
What exactly do you mean? What have I claimed is a "replacement for any smartphone and Android"?
 