ARM Midgard Architecture

Discussion in 'Mobile Graphics Architectures and IP' started by arjan de lumens, Nov 10, 2010.

  1. ToTTenTranz

    Legend Veteran

    Joined:
    Jul 7, 2008
    Messages:
    12,144
    Likes Received:
    7,109
    Not really, unless the SGX543MP4 is clocked at >400MHz.
    Let's not mistake the 500MHz Cedar in E-350 with the 280MHz version in C-50/Z-01.

    (EDIT: wrong codename)
     
    #121 ToTTenTranz, Jul 27, 2011
    Last edited by a moderator: Jul 27, 2011
  2. Lazy8s

    Veteran

    Joined:
    Oct 3, 2002
    Messages:
    3,100
    Likes Received:
    19
    ARM are moving to a yearly cycle for new GPU IP (another MP capable Mali T-6xx core will follow in 2013.)

    They're all still a part of the Midgard architecture. A totally new architecture is planned to follow on the standard ~4 year cadence.
     
    #122 Lazy8s, Jul 27, 2011
    Last edited by a moderator: Jul 27, 2011
  3. rpg.314

    Veteran

    Joined:
    Jul 21, 2008
    Messages:
    4,298
    Likes Received:
    0
    Location:
    /
    Let us not mistake the mem bw of Zacate for the mem bw of any SoC that might be expected to have a T604.
    What would be the clock for T604 to reach 68Gflops in the first place?
     
  4. Ailuros

    Ailuros Epsilon plus three
    Legend Subscriber

    Joined:
    Feb 7, 2002
    Messages:
    9,468
    Likes Received:
    187
    Location:
    Chania
    28nm high-end SoC GPUs aren't going to be clocked lower than 500MHz (in fact they'll start at even higher frequencies than that), as you can see from the 2.0 GPixels/s ARM states for the T604MP4.

    Let's not go to the Rogue generation (which would be more fair in all honesty due to that one and T6x0 belonging to the same generation) and let's do the Series5XT trickery again at just 500MHz:

    SGX554MP2@500MHz = 72 GFLOPs/s

    As for past-generation parts, Exophase just picked a 543MP4, which would need a far higher frequency to exceed the T604 rate. Again though, that's not particularly fair since T604 belongs to the next generation (don't forget that native FP64 support eats up a shitload of die area); it still remains a fact that T604 ALU throughput sounds low, especially if the Rogue in the A9600 turns out to be an MP2@667; if so, that would be >105 GFLOPs per core.

    Just for the record by the time T604 and Rogue will ship in actual devices, C50 and any of today's lower end SoCs will be quite old news. You wouldn't imagine that AMD will increase significantly in performance (at least by a factor of 2.0x for each market segment) under 28nm now would you?
     
  5. ToTTenTranz

    Legend Veteran

    Joined:
    Jul 7, 2008
    Messages:
    12,144
    Likes Received:
    7,109
    Zacate is single-channel 64bit DDR3 1066/1333MHz.
    Exynos 4210 (Mali 400MP4) and OMAP4 are already dual-channel 32bit LPDDR2.
    You really think the T604 is expected to be in a SoC with less bandwidth than that?


    I wouldn't know. I just compared the 68GFLOPS in the slide with the 80GFLOPs in a 500MHz Cedar.
    Exophase said the T604 would be beaten by a SGX543MP4, which is not true at the rumoured ~200MHz clock in PSVita.
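
A rough sketch of the clock scaling behind these comparisons, assuming peak FLOPs scale linearly with clock for an unchanged ALU configuration. The reference figures (80 GFLOPs for a 500MHz Cedar, 72 GFLOPs for SGX554MP2 at 500MHz, and the 543MP4 matching the 554MP2 at equal clocks) are taken from posts in this thread; the function names are made up for illustration.

```python
def scale_gflops(gflops_at_ref: float, ref_mhz: float, new_mhz: float) -> float:
    """Scale a peak GFLOPs figure to another clock (assumes linear scaling)."""
    return gflops_at_ref * new_mhz / ref_mhz


def clock_for_gflops(target_gflops: float, gflops_at_ref: float, ref_mhz: float) -> float:
    """Clock in MHz needed to hit a target peak GFLOPs figure (same assumption)."""
    return ref_mhz * target_gflops / gflops_at_ref


# Cedar: 80 GFLOPs at 500MHz (E-350), scaled to the 280MHz C-50 clock.
cedar_at_280 = scale_gflops(80.0, 500.0, 280.0)

# SGX543MP4: 72 GFLOPs at 500MHz per the thread, scaled to the rumoured ~200MHz.
sgx_at_200 = scale_gflops(72.0, 500.0, 200.0)

# Clock at which the 543MP4 would match the 68 GFLOPs quoted for T604.
breakeven_mhz = clock_for_gflops(68.0, 72.0, 500.0)

print(f"Cedar @ 280MHz:    {cedar_at_280:.1f} GFLOPs")
print(f"543MP4 @ 200MHz:   {sgx_at_200:.1f} GFLOPs")
print(f"543MP4 break-even: {breakeven_mhz:.0f} MHz")
```

Under these assumptions the 280MHz Cedar lands at about 44.8 GFLOPs, a 200MHz 543MP4 at about 28.8 GFLOPs, and the 543MP4 would need roughly 472MHz to match 68 GFLOPs. These are peak numbers only and say nothing about real-world efficiency.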



    Nope, I'm hoping for Krishna and Wichita to have at least a Caicos GPU in them (160sp, 8 TMUs, 4 ROPs), since the difference in transistor count versus Cedar (80sp) is only ~21%, which should be negligible given the smaller node (Brazos is still made on 40nm).
     
    #125 ToTTenTranz, Jul 27, 2011
    Last edited by a moderator: Jul 27, 2011
  6. Exophase

    Veteran

    Joined:
    Mar 25, 2010
    Messages:
    2,406
    Likes Received:
    430
    Location:
    Cleveland, OH
    Wouldn't ALU rate on the 543MP4 be the same as 554MP2 at the same clock? But yes, I was considering it the same clock speed.

    Of course throwing around FLOPs counts as a means for comparison is pretty limited, when you have no idea of where those FLOPs are allocated.

    As for native FP64, do you think ARM would really be allocating more than the bare minimum of necessary resources (i.e., FP64 throughput at 1/4th of FP32)? Although that'd still necessitate somewhat wider FP multipliers.
     
  7. Ailuros

    Ailuros Epsilon plus three
    Legend Subscriber

    Joined:
    Feb 7, 2002
    Messages:
    9,468
    Likes Received:
    187
    Location:
    Chania
    Well a Caicos GPU would possibly be from the unit count alone in the ST A9600 Rogue region, where ST is rating at over 210 GFLOPs/s and over 5.2GTexels/s (w/o overdraw). Still quite a distance to 68 GFLOPs/s and 2.0GTexels/s.
     
  8. Ailuros

    Ailuros Epsilon plus three
    Legend Subscriber

    Joined:
    Feb 7, 2002
    Messages:
    9,468
    Likes Received:
    187
    Location:
    Chania
    Yes it would; but if anyone were going purely for sterile ALU processing power, I'd assume that a 554MP2 could be a better idea (fewer TMUs, z/stencil units and what not). From the sound of it, T604 still sounds like a 1 TMU design like Mali400, unless I'm reading incorrectly into those performance figures (2.0 GPixels / 4 = 500MHz).

    In the case of the 543/4MP4 you'd have 8 TMUs at 500MHz and not just 4 ;)

    Without knowing what each of them looks like in real-time throughput/efficiency, it's just a game of theoretical vs. theoretical numbers. However, regardless of how bad the ALU efficiency might be on next generation GPU IP, 17 GFLOPs/core is still low.

    Don't know if it's technically accurate but I'd argue that it's not even a 4:1 ratio since the whole thing sounds like 34 FLOPs throughput which doesn't evenly divide by 4. As you say it most likely has some sort of VecN+1 ALUs where the "1" might stand for an additional ADD or MUL. In other words it could be 34 FLOPs single precision and 8 FLOPs double precision.
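
The per-core arithmetic in this post can be laid out as a short sketch. It uses only the figures quoted in the thread (68 GFLOPs and 2.0 GPixels/s for the T604MP4) and assumes one pixel per clock per core, which is the post's own reading of the numbers rather than a confirmed specification; the variable names are illustrative.

```python
T604_MP4_GFLOPS = 68.0    # peak figure quoted in the thread
T604_MP4_GPIXELS = 2.0    # fill rate quoted in the thread
CORES = 4

# Per-core ALU throughput.
gflops_per_core = T604_MP4_GFLOPS / CORES

# Assumption: each core outputs 1 pixel per clock, so fill rate / cores = clock.
implied_clock_mhz = T604_MP4_GPIXELS * 1000.0 / CORES

# FLOPs issued per clock in each core at that implied clock.
flops_per_clock_per_core = gflops_per_core * 1000.0 / implied_clock_mhz

# A strict 4:1 FP32:FP64 ratio would need this to be a whole number.
quarter_rate = flops_per_clock_per_core / 4

print(f"{gflops_per_core:.0f} GFLOPs/core at {implied_clock_mhz:.0f} MHz")
print(f"{flops_per_clock_per_core:.0f} FLOPs/clock/core, quarter rate = {quarter_rate}")
```

With these inputs the sketch gives 17 GFLOPs per core, an implied 500MHz clock, and 34 FLOPs per clock per core. A quarter of 34 is 8.5, which is why a clean 4:1 split does not fall out of the numbers and a figure such as 8 FP64 FLOPs per clock is only a guess.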

    Frankly I doubt any of the future architectures has gone to any lengths to achieve anything lower than 4:1. ARM itself has stated in one of their public writeups regarding T604 that FP64 might get used rarely in real time conditions in the embedded space, but when it does it's needed badly. I don't have anything to object to that, and no, I personally wouldn't want valuable transistors invested just yet in something like FP64 to a much higher degree at the cost of raw performance for the vast majority of cases.
     
  9. rpg.314

    Veteran

    Joined:
    Jul 21, 2008
    Messages:
    4,298
    Likes Received:
    0
    Location:
    /
    There is still some difference in mem bw. Just run the numbers. Arguably, these things are vastly more mem limited than ALU limited.

    T604 will need 500MHz to reach 68 gflops.
     
  10. ToTTenTranz

    Legend Veteran

    Joined:
    Jul 7, 2008
    Messages:
    12,144
    Likes Received:
    7,109
    I wouldn't be so sure. LPDDR2 right now is around 800MHz for the dual-channel 32-bit implementations IIRC.

    A T604 in a Cortex A15 SoC might go dual-channel 1333MHz LPDDR2 and beyond.
    Lower-power Wichita might never pass 1600MHz DDR3, so they may end up pretty close.

    And where did that number come from? Honest question.
     
    #130 ToTTenTranz, Jul 28, 2011
    Last edited by a moderator: Jul 28, 2011
  11. Ailuros

    Ailuros Epsilon plus three
    Legend Subscriber

    Joined:
    Feb 7, 2002
    Messages:
    9,468
    Likes Received:
    187
    Location:
    Chania
    That's fairly easy to answer. When they state themselves that a MP4 reaches 2.0GPixels/s I wonder what meows on a hot tin roof (tip: it ain't me) :lol:
     
  12. rpg.314

    Veteran

    Joined:
    Jul 21, 2008
    Messages:
    4,298
    Likes Received:
    0
    Location:
    /
  13. Rys

    Rys Graphics @ AMD
    Moderator Veteran Alpha

    Joined:
    Oct 9, 2003
    Messages:
    4,174
    Likes Received:
    1,545
    Location:
    Beyond3D HQ
    LPDDR2 stops at 1066 MHz effective frequency, so you need LPDDR3 for 1333+. There are other solutions to high bandwidth in SoCs that don't just chase DRAM frequency, which might be worth looking at.

    An aside, but it's quite grating to read "T604 in a Cortex A15" :razz:
     
  14. ToTTenTranz

    Legend Veteran

    Joined:
    Jul 7, 2008
    Messages:
    12,144
    Likes Received:
    7,109
    Extra channels? More cache?


    Fixed. :p
     
  15. Gipsel

    Veteran

    Joined:
    Jan 4, 2010
    Messages:
    1,620
    Likes Received:
    264
    Location:
    Hamburg, Germany
    Isn't FP64 in this context just meaning the support of 64bit HDR frame buffer formats and textures with four 16 bit components, so 4*FP16 RGBA? So what is needed to support that is to beef up the TMUs and ROPs, not the multipliers in the ALUs. It's not about double precision at all.
     
  16. Rys

    Rys Graphics @ AMD
    Moderator Veteran Alpha

    Joined:
    Oct 9, 2003
    Messages:
    4,174
    Likes Received:
    1,545
    Location:
    Beyond3D HQ
    No, T6xx supports IEEE754 double precision computation.
     
  17. Exophase

    Veteran

    Joined:
    Mar 25, 2010
    Messages:
    2,406
    Likes Received:
    430
    Location:
    Cleveland, OH
    I think you guys might be confusing DDR clock speed and transfer rate. Current high-end ARM SoCs only support LPDDR2 up to 400MHz, and if you look at Micron's product pages for instance you'll see that's also the fastest they sell. OMAP4470 announced support for 466MHz, which should give you an idea of the roadmap. JEDEC currently specifies up to 533MHz.
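
To make the clock versus transfer rate distinction concrete, here is a minimal sketch of the peak-bandwidth arithmetic. It assumes double data rate signalling (two transfers per I/O clock) and ignores real-world efficiency; the memory configurations are the ones mentioned earlier in this thread, and the function names are made up for illustration.

```python
def ddr_transfer_rate_mts(io_clock_mhz: float) -> float:
    """Effective transfer rate in MT/s for a DDR-style interface (2 transfers per clock)."""
    return 2 * io_clock_mhz


def peak_bandwidth_gbs(transfer_rate_mts: float, bus_bits: int, channels: int = 1) -> float:
    """Theoretical peak bandwidth in GB/s (decimal), ignoring all overheads."""
    bytes_per_transfer = bus_bits / 8
    return transfer_rate_mts * 1e6 * bytes_per_transfer * channels / 1e9


# Dual-channel 32-bit LPDDR2 at a 400MHz clock (800 MT/s effective).
lpddr2_400 = peak_bandwidth_gbs(ddr_transfer_rate_mts(400), bus_bits=32, channels=2)

# Same layout at the 533MHz JEDEC ceiling (1066 MT/s effective).
lpddr2_533 = peak_bandwidth_gbs(ddr_transfer_rate_mts(533), bus_bits=32, channels=2)

# Zacate: single-channel 64-bit DDR3 at 1066 and 1333 MT/s effective.
zacate_1066 = peak_bandwidth_gbs(1066, bus_bits=64)
zacate_1333 = peak_bandwidth_gbs(1333, bus_bits=64)

for name, gbs in [("LPDDR2-800 2x32", lpddr2_400), ("LPDDR2-1066 2x32", lpddr2_533),
                  ("DDR3-1066 1x64", zacate_1066), ("DDR3-1333 1x64", zacate_1333)]:
    print(f"{name:18s} {gbs:.2f} GB/s")
```

On these theoretical figures, dual-channel 32-bit LPDDR2 at a 400MHz clock gives 6.4 GB/s against roughly 8.5 to 10.7 GB/s for Zacate's single 64-bit DDR3 channel, and the two only draw level once LPDDR2 reaches its 533MHz ceiling.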

    LPDDR2 is only 1.2V vs 1.8V for normal DDR2, so it's going to have more limited clocks. I wonder if maybe Brazos (or its successors) supports 1.35V DDR3L, which should help reduce power consumption a little, although relative to the consumption of the SoC it's probably pretty minor.

    Right now Tegra 2 is hamstrung not only by having only one channel but by being limited to 300MHz for LPDDR2 (or 333MHz for DDR2, which of course consumes much more power), but it seems to be doing okay without much bandwidth.

    Exynos supports DDR3 (and incidentally, so does i.MX53 of all things), and it looks like Tegra 3 and OMAP5 will as well. This might be the better choice for tablets. No idea how high the bus and memory speeds will actually go on these SoCs. I imagine that Wichita will still have the advantage for a good while. But its GPU will inherently need more bandwidth.
     
  18. Rys

    Rys Graphics @ AMD
    Moderator Veteran Alpha

    Joined:
    Oct 9, 2003
    Messages:
    4,174
    Likes Received:
    1,545
    Location:
    Beyond3D HQ
    That's why I said effective frequency :wink:
     
  19. Exophase

    Veteran

    Joined:
    Mar 25, 2010
    Messages:
    2,406
    Likes Received:
    430
    Location:
    Cleveland, OH
    Ah, yeah. And thanks to bad marketing I thought that with DDR3 it was the real frequency ~_~

    Nonetheless, DDR3 can achieve clocks double those of DDR2 due to having double the prefetch width. It does sound like ARM SoCs are approaching it after all, but won't be hitting 667MHz any time soon.
     
  20. Gipsel

    Veteran

    Joined:
    Jan 4, 2010
    Messages:
    1,620
    Likes Received:
    264
    Location:
    Hamburg, Germany
    Yep, should have looked it up. :oops:
     