The entire GPU architecture section was recently locked down because of umm behavioral difficulties.
Qualcomm seems to keep pretty quiet about the details of their GPU tech these days. And the hardware site scene isn't what it used to be.
I found this site useful to look up mobile specs: https://chipguider.com/?gpu=adreno-740-980mhz
No idea how accurate their data is, but at least it's hard numbers.
Thanks! Yeah, I’ve seen those figures too, and they are just as mysterious, as the math simply doesn’t check out (2138 is not any multiple of 2048 SIMD ALUs times 980 MHz). Assuming these are FP16 ALUs or that FP32 is rate-throttled would at least put the numbers in the right ballpark, but it’s still weird.
The main reason why I’m asking is that all these specs suggest a humongous GPU, and it seems to do fairly well in synthetic gaming tests, but its compute benchmark results are comparable to an A14 with its 512 ALUs running at around 1 GHz. So either the drivers are bust for compute or Qualcomm’s architecture is inherently inefficient; at least this makes me wonder about their claims of using scalar SIMD.
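To make the "doesn't check out" point concrete, here is a rough sketch that works out how many FLOPs each lane would have to retire per clock to hit the listed figure. It assumes the 2138 number is in GFLOPS (the post doesn't state the unit), and the function name is just made up for illustration.

```python
def implied_flops_per_lane_per_clock(claimed_gflops: float, lanes: int, clock_mhz: float) -> float:
    """FLOPs each lane would need to retire per clock to reach the claimed figure."""
    lane_cycles_per_second = lanes * clock_mhz * 1e6
    return claimed_gflops * 1e9 / lane_cycles_per_second

# ASSUMPTION: 2138 is GFLOPS; 2048 lanes and 980 MHz are the figures quoted in the post.
ratio = implied_flops_per_lane_per_clock(2138, 2048, 980)
print(f"{ratio:.3f} FLOPs per lane per clock")
```

The result lands a little above 1 rather than on a clean 1, 2 or 4, which is why the listed numbers don't look like a simple lanes × clock × FLOPs-per-clock product.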
In a strict sense any such GPU would have more like 2048 SIMD lanes. There's another scenario floating around the net, where the Adreno 730 supposedly has 768 SPs and the Adreno 740 a total of 1024 SPs, at a boost clock of around 720 MHz for the latter; hypothetically, if each SP is capable of 5 FP32 FLOPs per clock, you get 3.7 TFLOPs FP32 on paper. Now the next question would be whether that hypothetical 5th FLOP is of any use under normal conditions or just in some weird corner cases. Going down to 4 FLOPs @ 0.72 GHz immediately sets it back to 2.9 TFLOPs, and from there it comes down to what percentage of the ALUs can be utilized.
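For anyone who wants to replay the paper math above, a minimal sketch: peak throughput is just lanes × FLOPs per lane per clock × clock. The function name is made up for illustration, and the inputs are the rumoured figures from this post, not confirmed specs.

```python
def peak_tflops(lanes: int, flops_per_lane_per_clock: float, clock_ghz: float) -> float:
    """Theoretical peak in TFLOPS: lanes x FLOPs per lane per clock x clock in GHz, / 1000."""
    return lanes * flops_per_lane_per_clock * clock_ghz / 1000.0

# Rumoured Adreno 740 layout from the post: 1024 SPs at ~720 MHz (unconfirmed).
five_flops = peak_tflops(1024, 5, 0.72)
four_flops = peak_tflops(1024, 4, 0.72)

print(f"5 FLOPs/SP/clock: {five_flops:.2f} TFLOPS")
print(f"4 FLOPs/SP/clock: {four_flops:.2f} TFLOPS")
```

These are upper bounds on paper only; how much of it shows up in practice depends on how well the ALUs can actually be kept busy.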
I recall that in the past QCOM had hired a third-party company to write its GPU compilers, but I have no idea if that's still the case. I would think that by now they are developing their own compilers. https://codeplay.com/company/collaborations/
Either way, if the 740 ends up with roughly comparable performance to the Apple M1 GPU in a shader-limited synthetic benchmark, both should have comparable real-world FP32 throughput.
If Apple hasn't changed its ALU layout lately, each lane should still be capable of 2 MADDs. I was able to follow along somewhat up to the Adreno 650, which has 512 SPs, also with a theoretical throughput of 4 FLOPs/lane, at 587 MHz. I usually use Kishonti's GFXBench for my guesswork, and in a quite stressful 4K test like Aztec Ruins it looks roughly like this: https://gfxbench.com/result.jsp?benchmark=gfx50&test=778&order=median&base=gpu&ff-check-desktop=0
Thanks! Do you happen to know more about the architecture of Adreno SPs? I’ve found various claims around (there is a paper that depicts the Adreno 630 using 32-wide SIMD like AMD or Apple).
As I said, what surprises me is that the compute benchmarks of the Adreno 740 are not very good: https://browser.geekbench.com/v6/compute/search?utf8=✓&q=SM-S916N
This is comparable to the A14 across various benchmarks, and that GPU has 512 SIMD lanes with half-rate FP32. It’s as you say, I’d expect 2048 SIMD lanes to be at least competitive with the 1024 lanes of the M1 (with the adjustment that the M1 runs a higher clock). But this is definitely not the case.
Does anyone have some information about the Adreno 740? I couldn't find any details on Qualcomm's website, and the only source I've seen that mentions any (Wikipedia) lists quite ridiculous specs (like 2560 ALUs and 3.5 TFLOPS FP32), which don't make much sense to me.
Based on this video (Link, at 2:53), the Adreno 740 has 1536 ALUs @ 680 MHz. @Nebuchadnezzar please correct it
Cute. I hope the time spent writing this very useful post at least made you feel like you are contributing.
Ah, the old sarcasm-based reply. You asked for information about the Adreno 740, and my Google search link provided that for you in abundance.
So you got what you asked for, maybe next time construct a thread with a bit more detail of exactly what you're looking for, where you've already looked and what you're hoping to gain so people know how best to help you.
I was under the impression that this was a GPU enthusiast community? Didn't think that I had to specify this in more detail.
So roughly 2 TFLOPs, assuming the usual scalar ALUs. This definitely sounds more realistic than other claims and, combined with the faster RAM, also explains why it does better than the A15/A16 in some synthetic GPU benchmarks. Still wondering about the poor Geekbench compute scores. Maybe it is indeed driver issues, like Ailuros suggests. Probably not optimised for GPGPU.
Anyway, that's a humongous GPU for a mobile phone! Definitely dwarfs Apple's 640-ALU (5*4*32) designs.
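A quick sanity check of the two numbers above, as a sketch. It assumes each scalar ALU retires one FMA (2 FLOPs) per clock, which is my reading of "the usual scalar ALUs" and not a confirmed spec; the 1536 ALU / 680 MHz figures come from the video cited in the thread.

```python
# ASSUMPTION: one FMA (= 2 FLOPs) per ALU per clock.
FLOPS_PER_ALU_PER_CLOCK = 2

# Figures quoted in the thread (from the linked video), not confirmed by Qualcomm.
adreno_alus = 1536
adreno_clock_ghz = 0.68
adreno_tflops = adreno_alus * FLOPS_PER_ALU_PER_CLOCK * adreno_clock_ghz / 1000

# Apple ALU count, using the factors exactly as written in the post: 5*4*32.
apple_alus = 5 * 4 * 32
alu_ratio = adreno_alus / apple_alus

print(f"Adreno 740 estimate: {adreno_tflops:.2f} TFLOPS")
print(f"Apple ALUs: {apple_alus}, Adreno/Apple ALU ratio: {alu_ratio:.1f}x")
```

So on ALU count alone the Adreno would be about 2.4 times the size of the 640-ALU design, before clocks and utilization come into it.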
If QCOM had its own OS/software platform, its GPU drivers would show much higher stability, predictability and whatnot. IMHO it's a common problem for anything Android and not exclusive to Adreno/QCOM etc.
I can't see a video for some reason, but obviously I can't give any input here in my position either way.
Qualcomm has usually delivered their own fork of Linux and (I assume here) Mesa. What else should they have?
IMO, if Qualcomm valued stability and predictability they should do what Intel and AMD do for their platforms, i.e. work with upstream Linux (and Mesa) to get stuff included there as early as possible.
Obviously none of the above is a viable replacement for any smartphone and Android, which is their actual core market for their SoCs right now.