Adreno 430 performance preview at Anandtech

Discussion in 'Mobile Graphics Architectures and IP' started by Rys, Feb 12, 2015.

  1. Rys

    Rys AMD RTG
    Moderator Veteran Alpha

    Joined:
    Oct 9, 2003
    Messages:
    4,138
    Likes Received:
    1,337
    Location:
    Beyond3D HQ
  2. orangpelupa

    orangpelupa Elite Bug Hunter
    Legend Veteran

    Joined:
    Oct 14, 2008
    Messages:
    6,835
    Likes Received:
    1,147
    -___- no comparison with intel HD graphics....
     
  3. Nebuchadnezzar

    Legend

    Joined:
    Feb 10, 2002
    Messages:
    949
    Likes Received:
    98
    Location:
    Luxembourg
    It was mainly Josh's work, I just helped out finishing the article and on Qualcomm's EAS and energy idle drivers.

    Anyway, connect the dots between the resulting memory performance and the rumours:

    http://browser.primatelabs.com/geekbench3/compare/1874253?baseline=1887333
    http://www.androidauthority.com/snapdragon-810-overheating-issues-579284/

    Tom's calls it outright:
     
    #3 Nebuchadnezzar, Feb 12, 2015
    Last edited: Feb 12, 2015
  4. Ailuros

    Ailuros Epsilon plus three
    Legend Subscriber

    Joined:
    Feb 7, 2002
    Messages:
    9,408
    Likes Received:
    172
    Location:
    Chania
    So far so good, but where's the battery life/power consumption/throttling part of the review?

    ***edit: don't answer that...I was just told that you had around an hour with the device? Gosh....
     
    #4 Ailuros, Feb 12, 2015
    Last edited: Feb 12, 2015
  5. Nebuchadnezzar

    Legend

    Joined:
    Feb 10, 2002
    Messages:
    949
    Likes Received:
    98
    Location:
    Luxembourg
    It's a preview. Josh/the media only had a few hours with the device.
     
  6. ToTTenTranz

    Legend Veteran Subscriber

    Joined:
    Jul 7, 2008
    Messages:
    9,407
    Likes Received:
    4,057
    So any result could be bandwidth-limited...
     
  7. Entropy

    Veteran

    Joined:
    Feb 8, 2002
    Messages:
    2,912
    Likes Received:
    774
    In the sense that it could perform better in bandwidth-limited scenarios, sure, although LPDDR4 ensures that in an absolute sense bandwidth doesn't show much change versus its predecessor. But the more general issue is that main memory latency is awful.
    This doesn't show up all that much in a benchmarking environment dominated by largely cache-resident core tests and relatively latency-insensitive graphics benches. It would have a larger impact in everyday use, where there is a lot more going on than in controlled benchmarking and the data sets are more realistic.
     
  8. Erinyes

    Regular

    Joined:
    Mar 25, 2010
    Messages:
    647
    Likes Received:
    92
    Any idea if this is the original or the supposedly "fixed" revision of the S810?
    We saw the same thing with the Exynos 5433 vs 5430 (A57 vs A15): higher latency but also higher bandwidth (both have 64-bit, 825 MHz LPDDR3 though). Since we're seeing a similar situation here with A57 vs Krait, I wonder if this has something to do with the architecture of the A57 itself?

    Higher bandwidth does not seem to be helping performance all that much even in benchmarks though. Looking at the Exynos 7420 vs 5433 (7420 is LPDDR4), the Geekbench scores are ~15% and ~10% higher for single and multicore respectively. If you normalize for clocks (2.1 vs 1.9 GHz) this reduces to ~5% and 0%. Link - 5433 vs 7420 on Geekbench.
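
    A minimal sketch of the clock normalisation above, assuming it simply divides the score ratio by the clock ratio. The function name is hypothetical, and the inputs are the rounded figures quoted in this post rather than the raw Geekbench results.

```python
def clock_normalized_gain(score_gain: float, new_clock_ghz: float, old_clock_ghz: float) -> float:
    """Return the fractional gain left over once the clock increase is divided out.

    score_gain is fractional, e.g. 0.15 for a score that is 15% higher.
    """
    clock_ratio = new_clock_ghz / old_clock_ghz
    return (1.0 + score_gain) / clock_ratio - 1.0


if __name__ == "__main__":
    # Exynos 7420 (2.1 GHz) vs Exynos 5433 (1.9 GHz), gains as quoted above.
    for label, gain in (("single-core", 0.15), ("multi-core", 0.10)):
        residual = clock_normalized_gain(gain, 2.1, 1.9)
        print(f"{label}: {residual:+.1%} per clock")
```

    With these rounded inputs the residual comes out at roughly +4% single-core and slightly below zero multi-core, in line with the ~5% and 0% estimate.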
     
  9. Nebuchadnezzar

    Legend

    Joined:
    Feb 10, 2002
    Messages:
    949
    Likes Received:
    98
    Location:
    Luxembourg
    Given the memory performance, the initial reports pointing to a broken memory controller, and the continued overheating reports from the media on the Flex2, I doubt it's the fixed version.
     
  10. CrayonHiphop

    Newcomer

    Joined:
    May 4, 2012
    Messages:
    9
    Likes Received:
    0
    Is Adreno 3xx/4xx a scalar architecture? What is its detailed architecture? Thx very much
     
  11. Ailuros

    Ailuros Epsilon plus three
    Legend Subscriber

    Joined:
    Feb 7, 2002
    Messages:
    9,408
    Likes Received:
    172
    Location:
    Chania
    I'm not the best to reply for Adrenos, but yes it has so called "scalar" ALUs. The Adreno 330 has 8*SIMD16 and after that with 4xx I lost track; the 420 could be a 12*SIMD16 config at a slightly lower clock than the peak Adreno330 frequency, but that's just my own speculation.
     
  12. ToTTenTranz

    Legend Veteran Subscriber

    Joined:
    Jul 7, 2008
    Messages:
    9,407
    Likes Received:
    4,057
    I thought the Adrenos were Vec4+Scalar like the X360, hence the former Imageon nickname being mini-xenos?
     
  13. Ailuros

    Ailuros Epsilon plus three
    Legend Subscriber

    Joined:
    Feb 7, 2002
    Messages:
    9,408
    Likes Received:
    172
    Location:
    Chania
    Afaik up to Adreno2xx yes; starting from Adreno3xx though they moved to SIMD. ARM Mali and Vivante GPU IP still have vector ALUs.
     
  14. CrayonHiphop

    Newcomer

    Joined:
    May 4, 2012
    Messages:
    9
    Likes Received:
    0
    Thx Ailuros, so it's more like AMD GCN, i.e. the SIMD16 in the CU? And as far as I know the Mali Midgard ALU pipeline is "vec4 + madd scalar alu with a big scalar alu (madd and sfu)", is that correct? Thx very much
     
  15. Ailuros

    Ailuros Epsilon plus three
    Legend Subscriber

    Joined:
    Feb 7, 2002
    Messages:
    9,408
    Likes Received:
    172
    Location:
    Chania
    If you oversimplify things yes you could say that Adreno =/>3xx ALUs are closer to today's desktop architectures.

    Adreno330 is afaik 8*SIMD16 meaning at 600MHz = 8 * SIMD16 * 2 FLOPs/SIMD lane * 0.6 GHz = 153.60 GFLOPs FP32

    A recent Mali is a wee bit more complicated than even past Vec4 ALUs in other GPUs. For each cluster you have 2 pipelines (and 1 TMU); in each pipeline you have 2 Vec4 + SFU (special function unit), or 17 FLOPs theoretical peak per pipeline and 34 FLOPs per cluster. To be a bit more realistic, since the SFUs are busy with special function ops more often than they sit idle, count 16 FLOPs/pipeline or 32 FLOPs/cluster.

    For a Mali T760 MP6 @ 700MHz you have:

    6 * [2 * (4*4)] * 0.7 GHz = 134.40 GFLOPs FP32

    Other GPUs have SFUs too so it's rather silly to count those.
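
    The two peak figures in this post can be re-derived with a short sketch. It assumes the configurations exactly as given above (8*SIMD16 for Adreno 330, 2 pipelines of 16 FLOPs per Mali cluster with SFUs left out); the names are hypothetical and only illustrate the arithmetic.

```python
def peak_gflops(flops_per_clock: int, clock_ghz: float) -> float:
    """Theoretical FP32 peak: FLOPs issued per clock times clock in GHz."""
    return flops_per_clock * clock_ghz


# Adreno 330 as described in this post: 8 SIMD16 units, 2 FLOPs (multiply-add) per lane.
ADRENO_330_FLOPS_PER_CLOCK = 8 * 16 * 2

# Mali T760 MP6 as described in this post: 6 clusters, 2 pipelines each,
# 2 Vec4 multiply-adds per pipeline (4 * 4 = 16 FLOPs), SFUs not counted.
MALI_T760_MP6_FLOPS_PER_CLOCK = 6 * 2 * (4 * 4)

if __name__ == "__main__":
    print(f"Adreno 330 @ 600 MHz:    {peak_gflops(ADRENO_330_FLOPS_PER_CLOCK, 0.6):.2f} GFLOPs")
    print(f"Mali T760 MP6 @ 700 MHz: {peak_gflops(MALI_T760_MP6_FLOPS_PER_CLOCK, 0.7):.2f} GFLOPs")
```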
     
  16. Rys

    Rys AMD RTG
    Moderator Veteran Alpha

    Joined:
    Oct 9, 2003
    Messages:
    4,138
    Likes Received:
    1,337
    Location:
    Beyond3D HQ
    Those aren't good descriptions of the Mali or Adreno shader cores, unfortunately. For FP32:

    Adreno 330 is 4*SIMD32 multiply-add.
    Midgard in T760 is vec4 MADD + scalar ADD, plus a 4-wide dot product and another scalar flop. 9 flops in the first part of the pipe, 8 in the second.

    Peak is reasonably easy to get close to on Adreno. Only Cthulhu himself knows how to get the Midgard shader compiler to emit something at peak utilisation.
     
  17. Ailuros

    Ailuros Epsilon plus three
    Legend Subscriber

    Joined:
    Feb 7, 2002
    Messages:
    9,408
    Likes Received:
    172
    Location:
    Chania
    *keeps notes* thank you :)
    Malis still have 1 TMU per cluster; for Adreno330 it's 2 TMUs/SIMD or am I wrong again?
     
  18. Rys

    Rys AMD RTG
    Moderator Veteran Alpha

    Joined:
    Oct 9, 2003
    Messages:
    4,138
    Likes Received:
    1,337
    Location:
    Beyond3D HQ
    Yep, that's right. I should point out, for those new to embedded GPUs who are trying to follow what's going on, that the pipeline I describe for Midgard is present twice in a T760 core and there are 6 of those cores in a T760MP6.
     
  19. Ailuros

    Ailuros Epsilon plus three
    Legend Subscriber

    Joined:
    Feb 7, 2002
    Messages:
    9,408
    Likes Received:
    172
    Location:
    Chania
    I'll never manage to recall that config from memory :p As long as I know that they get 32 FLOPs FP32 (SFUs aside) per clock per cluster it's good enough for me. That said, architectural differences aside, it seems that Adreno 330, Mali Midgard and PowerVR Rogue all have roughly 1 TMU for every 32 FLOPs FP32.

    Funny coincidence (?) would be that GK20A in K1 is at 48 FLOPs/TMU, while the Maxwell grandchild in X1 goes down to 32 FP32 FLOPs/TMU. I'm not even sure if such ratios exist on a technical level, but I'll skip the FP16 FLOPs/TMU ratio as they're the same in =/>Series6XT and the X1 GPU :runaway:
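
    A rough sketch of how those FLOPs/TMU ratios fall out. The Adreno and Mali entries use the per-block figures from this thread (one SIMD32 multiply-add block with 2 TMUs, one Midgard cluster at 32 FLOPs with 1 TMU). The GK20A and X1 unit counts (192 and 256 FP32 lanes, 8 and 16 TMUs) are not stated in the thread and are an assumption taken from commonly quoted specs. All names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ShaderBlock:
    name: str
    fp32_flops_per_clock: int
    tmus: int

    @property
    def flops_per_tmu(self) -> float:
        """FP32 FLOPs per clock available for each texture unit."""
        return self.fp32_flops_per_clock / self.tmus


BLOCKS = [
    # From the thread: one SIMD32 multiply-add block, 2 TMUs per SIMD.
    ShaderBlock("Adreno 330 (per SIMD32)", 32 * 2, 2),
    # From the thread: 32 FLOPs per cluster with SFUs aside, 1 TMU per cluster.
    ShaderBlock("Mali T760 (per cluster)", 32, 1),
    # Assumed unit counts, not given in the thread.
    ShaderBlock("GK20A in K1", 192 * 2, 8),
    ShaderBlock("Maxwell in X1", 256 * 2, 16),
]

if __name__ == "__main__":
    for block in BLOCKS:
        print(f"{block.name}: {block.flops_per_tmu:.0f} FP32 FLOPs/TMU")
```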
     