ARM Midgard Architecture

Discussion in 'Mobile Graphics Architectures and IP' started by arjan de lumens, Nov 10, 2010.

  1. arjan de lumens

    Veteran

    Joined:
    Feb 10, 2002
    Messages:
    1,274
    Likes Received:
    50
    Location:
    gjethus, Norway
    #1 arjan de lumens, Nov 10, 2010
    Last edited by a moderator: Nov 10, 2010
  2. roninja

    Regular

    Joined:
    Feb 9, 2002
    Messages:
    268
    Likes Received:
    0
    So at first glance this appears to be ARM's answer to Series 5 XT in the MP space. Doesn't sound like they've licenced it yet either. Let's see.
     
  3. Exophase

    Veteran

    Joined:
    Mar 25, 2010
    Messages:
    2,406
    Likes Received:
    429
    Location:
    Cleveland, OH
    Looks like this one can leverage cache coherence with Cortex-A15, a feature I don't expect IMG to provide. It's interesting that ARM is now positioning architectural advantages by pairing the two.

    Also looks like they're offering everything up to double-precision operations, hopefully in the fragment shaders, which in Mali-400 are limited to FP16. I don't really see how GPGPU-capable they can be if it's only offered in the vertex shaders, after all.
     
    #3 Exophase, Nov 10, 2010
    Last edited by a moderator: Nov 10, 2010
  4. argor

    Newcomer

    Joined:
    Nov 25, 2008
    Messages:
    96
    Likes Received:
    0
    They are lousy at Norse mythology; I expected more from a design team based in Norway.
    Miðgarður is the human realm in Norse mythology,
    while Ásgarður is the home of the gods.
    Nice to see what they did with cache coherence with Cortex-A15.
     
  5. wishiknew

    Regular

    Joined:
    May 19, 2004
    Messages:
    332
    Likes Received:
    6
    Is this what's in the Samsung Orion SoC, with that "5 times the 3D graphics performance over the previous processor generation"?
     
  6. iwod

    Newcomer

    Joined:
    Jun 3, 2004
    Messages:
    179
    Likes Received:
    1
    Is there anything special about their Mali GPU? Compared to the PowerVR GPU, which is better?

    And is there any reason why cache coherence won't work on PowerVR gen 6? (After all, SoCs like the A4 are custom designs anyway.)
     
  7. Ailuros

    Ailuros Epsilon plus three
    Legend Subscriber

    Joined:
    Feb 7, 2002
    Messages:
    9,408
    Likes Received:
    172
    Location:
    Chania
    Yeah that's definitely interesting.

    Well, I only skimmed the documents quickly, but I couldn't find anything suggesting that fragment and geometry processing are still in separate units. In fact the diagram mentions up to 4 shader cores (whatever that stands for) and shows no sign of a geometry unit like in past diagrams.

    Maybe arjan can shed some light into that part and explain if those are USC units now. I doubt they'd advertise GPGPU without being >FP16 in the fragment shaders.
     
  8. JohnH

    Regular

    Joined:
    Mar 18, 2002
    Messages:
    586
    Likes Received:
    2
    Location:
    UK
    Why would you "expect" that?
     
  9. rpg.314

    Veteran

    Joined:
    Jul 21, 2008
    Messages:
    4,298
    Likes Received:
    0
    Location:
    /
    OCL requires full fp32 precision.
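
    To get a feel for the gap between FP16 and FP32 being discussed here, the following is a minimal sketch that is not taken from the thread or from any ARM material. It assumes plain IEEE 754 binary16 and binary32 storage, as exposed by Python's struct module, and the helper names (round_trip, to_fp16, to_fp32) are made up for illustration. It says nothing about how Mali hardware actually rounds.

```python
import struct


def round_trip(value: float, fmt: str) -> float:
    """Store value in the given IEEE 754 format and read it back as a Python float."""
    return struct.unpack(fmt, struct.pack(fmt, value))[0]


def to_fp16(value: float) -> float:
    # "e" is the struct format code for IEEE 754 binary16 (half precision).
    return round_trip(value, "<e")


def to_fp32(value: float) -> float:
    # "f" is the struct format code for IEEE 754 binary32 (single precision).
    return round_trip(value, "<f")


if __name__ == "__main__":
    for x in (0.1, 4097.0):
        print(x, "fp16:", to_fp16(x), "fp32:", to_fp32(x))
```

    With an 11-bit significand, fp16 cannot represent every integer above 2048, so a value such as 4097 gets rounded to a neighbour, while fp32 still holds it exactly. That is one reason general-purpose compute code tends to need at least fp32.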
     
  10. JohnH

    Regular

    Joined:
    Mar 18, 2002
    Messages:
    586
    Likes Received:
    2
    Location:
    UK
    As you say, the absence of a separate vertex processor in the diagram suggests a unified design. They probably don't want to shout too loudly about this, as it flies in the face of their previous marketing.

    To be honest, imo this is all fluff at the moment; it would be nice to see some numbers put beside the thing ;)
     
  11. arjan de lumens

    Veteran

    Joined:
    Feb 10, 2002
    Messages:
    1,274
    Likes Received:
    50
    Location:
    gjethus, Norway
    Yes, it's unified. As for floating-point data types support, it supports fp16, fp32 and fp64.
     
  12. Exophase

    Veteran

    Joined:
    Mar 25, 2010
    Messages:
    2,406
    Likes Received:
    429
    Location:
    Cleveland, OH
    Seems like a pretty different design compared to Mali-400. One of the diagrams suggests a two ALU to TMU/ROP layout, making it similar to SGX540 per-core. I guess now vertex processing will be load balanced between the multiple cores, maybe that's a problem for drivers to deal with.

    At least it's more than IMG has said about Series 6 ;p
     
  13. JohnH

    Regular

    Joined:
    Mar 18, 2002
    Messages:
    586
    Likes Received:
    2
    Location:
    UK
    :) :)
     
  14. MfA

    MfA
    Legend

    Joined:
    Feb 6, 2002
    Messages:
    6,634
    Likes Received:
    436
    They might take a little longer, but the coherent link almost certainly just comes down to a ring bus with snooping ... connecting it to L2 texture cache isn't rocket science (and L1 will probably just be flushed when necessary). GPU writes are probably write through, if they get cached at all, so those stay coherent with the CPUs automatically.

    Just because there is cache coherency doesn't mean there is a fully coherent multi-level fully read-write cache hierarchy inside the Mali ... there almost certainly isn't (if NVIDIA didn't do it, I don't see Mali doing it).
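
    The write-through point above can be illustrated with a toy model. This is only a sketch under loud assumptions: ToyCache is a hypothetical class, memory is a plain dict, and the CPU is assumed to read straight from memory with no stale copy of its own. It is not a description of what Mali, or any real interconnect, does.

```python
class ToyCache:
    """Toy single-level cache in front of a shared memory (a plain dict)."""

    def __init__(self, memory: dict, write_through: bool):
        self.memory = memory
        self.write_through = write_through
        self.lines = {}      # address -> cached value
        self.dirty = set()   # addresses modified but not yet written back

    def write(self, addr: int, value: int) -> None:
        self.lines[addr] = value
        if self.write_through:
            # Write-through: memory is updated on every write.
            self.memory[addr] = value
        else:
            # Write-back: memory stays stale until the line is flushed.
            self.dirty.add(addr)

    def flush(self) -> None:
        """Write all dirty lines back to memory."""
        for addr in sorted(self.dirty):
            self.memory[addr] = self.lines[addr]
        self.dirty.clear()


if __name__ == "__main__":
    wt_memory = {0x100: 0}
    ToyCache(wt_memory, write_through=True).write(0x100, 42)
    print("write-through, CPU sees:", wt_memory[0x100])

    wb_memory = {0x100: 0}
    wb = ToyCache(wb_memory, write_through=False)
    wb.write(0x100, 42)
    print("write-back before flush, CPU sees:", wb_memory[0x100])
    wb.flush()
    print("write-back after flush, CPU sees:", wb_memory[0x100])
```

    In this model the write-through cache never leaves memory out of date, which is the sense in which such writes "stay coherent automatically". The write-back variant needs an explicit flush (or snooping) before another agent can see the data.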
     
    #14 MfA, Nov 11, 2010
    Last edited by a moderator: Nov 11, 2010
  15. Exophase

    Veteran

    Joined:
    Mar 25, 2010
    Messages:
    2,406
    Likes Received:
    429
    Location:
    Cleveland, OH
    It's still a matter of interfacing it specifically to an ARM coherency link, although I really have no idea how interested IMG is or isn't in doing something like this (you could say having standard parts that can interface with AMBA isn't different, or maybe this is just a glue implementation detail that can be handled by something else or the SoC implementers)

    Coherency with writes out from the GPU probably barely matters. Isn't the real point of this to get parameter data directly to the GPU instead of crossing into main memory first?

    Well I don't know what nVidia's coherency is in this context, but at the very least Tegra 2 is "coherent" by default on account of sharing the L2 cache.
     
  16. Ailuros

    Ailuros Epsilon plus three
    Legend Subscriber

    Joined:
    Feb 7, 2002
    Messages:
    9,408
    Likes Received:
    172
    Location:
    Chania
    A-HA!!!

    See above :p
     
  17. JohnH

    Regular

    Joined:
    Mar 18, 2002
    Messages:
    586
    Likes Received:
    2
    Location:
    UK
    Even the current SGX supports cache coherency; whether this gets hooked up to the CPU or not is a function of the SoC (and CPU support).
    The last thing you want to be doing is streaming GPU input parameters through the CPU cache; the volume of this data is likely to just flush all the stuff you do want in your cache out of it. You also don't want the GPU constantly snooping the CPU cache, as it will kill the performance of both. This type of data is best streamed directly to memory using write combiners to maximise throughput (as it normally is in desktop space); doing otherwise is likely to hurt overall perf.

    John.
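
    The cache-pollution argument in this post can be sketched with a toy LRU model. Everything here is an assumption made for illustration: the LRUCache class, the surviving_working_set helper, the 64-line capacity, the 32-line working set and the 1024-line stream are invented numbers, and real CPU caches are set-associative rather than fully associative LRU.

```python
from collections import OrderedDict


class LRUCache:
    """Toy fully associative cache with least-recently-used replacement."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.lines = OrderedDict()  # address -> True, oldest entry first

    def access(self, addr: int) -> bool:
        """Touch addr; return True on a hit, False on a miss."""
        if addr in self.lines:
            self.lines.move_to_end(addr)
            return True
        self.lines[addr] = True
        if len(self.lines) > self.capacity:
            self.lines.popitem(last=False)  # evict the least recently used line
        return False


def surviving_working_set(stream_through_cache: bool,
                          capacity: int = 64,
                          working_set: int = 32,
                          stream_len: int = 1024) -> int:
    """Count how many hot lines remain cached after streaming GPU input data."""
    cache = LRUCache(capacity)
    hot = range(working_set)
    for addr in hot:
        cache.access(addr)
    for i in range(stream_len):
        if stream_through_cache:
            cache.access(10_000 + i)
        # Otherwise the data bypasses the cache, as with write combining.
    return sum(1 for addr in hot if addr in cache.lines)


if __name__ == "__main__":
    print("streamed through cache:", surviving_working_set(True))
    print("streamed past the cache:", surviving_working_set(False))
```

    In this model, pushing a long one-shot stream through the cache evicts the entire working set, while bypassing the cache leaves it untouched.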
     
  18. MfA

    MfA
    Legend

    Joined:
    Feb 6, 2002
    Messages:
    6,634
    Likes Received:
    436
    If you are trying to do a low-latency, double-buffered OpenCL operation on the GPU, you might want to have the input or output bypass memory altogether.

    Although it seems a bit far-fetched for the moment.
     
  19. JohnH

    Regular

    Joined:
    Mar 18, 2002
    Messages:
    586
    Likes Received:
    2
    Location:
    UK
    The data ultimately has to go back to memory (it's still a WB cache we're talking about here), but it's true that you might be able to avoid some round trips if you're doing a lot of passing data backwards and forwards in OpenCL.

    Personally I'd be looking very hard at my algorithm if this is what's happening!

    Interestingly some might argue that these scenarios would be better handled by extending the local memory hierarchy another level...

    Anyway as I say, SGX already largely supports all this ;)

    John.
     
  20. MfA

    MfA
    Legend

    Joined:
    Feb 6, 2002
    Messages:
    6,634
    Likes Received:
    436
    Doesn't ARM have page table attributes to determine caching behaviour BTW?
     
