ARM Mali-400 MP

Discussion in 'Mobile Graphics Architectures and IP' started by Rob Evans, Jun 2, 2008.

  1. Ailuros

    Ailuros Epsilon plus three
    Legend Subscriber

    Joined:
    Feb 7, 2002
    Messages:
    9,408
    Likes Received:
    172
    Location:
    Chania
    Is Mali400 a single or a dual core config? I had the impression it's the latter.
     
  2. Laurent06

    Regular

    Joined:
    Dec 14, 2007
    Messages:
    711
    Likes Received:
    32
    If we are to believe these documents
    - ARM site
    - cbinews
    Mali-400 can have one to four cores.

    Also according to the document from ARM site above, it looks like Mali-400 with 1 core has the same pixel performance as Mali-200 but twice its geometry performance.
     
  3. tangey

    Veteran

    Joined:
    Jul 28, 2006
    Messages:
    1,406
    Likes Received:
    149
    Location:
    0x5FF6BC
    Here's a presentation that ARM did on Mali at the ARM DEVCON 2008

    http://library.corporate-ir.net/library/19/197/197211/items/310754/Mike_Dimelow.pdf

    A lot of interesting info and assertions; some highlights are:

    Mali 400MP "beats 6 SGX cores 520/530/535/540/545/550"

    Hmmm... comparing Mali 400MP to 520/530/535?? What's that about?
    And the 550 never existed?


    Mali 400MP dual core fill rate 550Mpix, 30M tri; SGX545 does 1000Mpix and 40M tri?

    http://www.imgtec.com/factsheets/powervr/POWERVR_SGX_Series5_IP_Core_Family_[2.3].pdf


    They claim 13 GPU licenses and state the areas as being PMP, PND, wireless and STB. Lead partner for Mali-400 is ST Micro. Other stated licensees (for various Mali cores) are Zoran, ST, Micronas, Ericsson (note: not Sony Ericsson), Cisco Systems, NXP, Telechips, RMI and Broadcom. The remaining licensees are "private".

    Another OpenGLES1.x core coming out in 2009, higher end multi-core Vithar coming in 2010, and multi-core top end THOR in 2012. (slide 13)

    Slide 15 shows target silicon will be shipping to 8 customers in 2008/9

    Launch dates for various Mali-ed products are on slide 13.
     
    #103 tangey, Feb 27, 2009
    Last edited by a moderator: Feb 27, 2009
  4. Arun

    Arun Unknown.
    Moderator Legend Veteran

    Joined:
    Aug 28, 2002
    Messages:
    5,021
    Likes Received:
    292
    Location:
    UK
    Cheers.

    Surely they meant 555?

    Tsk tsk, marketing ftw? PowerVR's numbers include a 2.5x multiplier because of the claimed higher efficiency of TBDRs when there is overdraw. That number is a tad excessive, but it is fair to say that if you designed a game for both IMRs and TBDRs then you could save on a Z-Pass which does increase your effective bandwidth.

    Mali 400 has 1 TMU per core, which is similar to SGX 530. Their dual-core variant would, TMU-wise, be equivalent to a SGX 540/543, while their quad-core variant would be equivalent to a SGX 555. All per clock, ignoring any potential effective fillrate gain from SGX being a TBDR. One of the key reasons why they claim they 'beat' SGX performance-wise is that Mali is supposedly clockable at 240-275MHz, while PowerVR quotes 200MHz and in practice OMAP3 only delivers 100-133MHz (probably about 200MHz on 45nm though! ;))
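    The per-clock comparison above boils down to TMUs times clock. A minimal sketch of that arithmetic, assuming one pixel per TMU per clock; the function name and the `overdraw_factor` parameter are made up for illustration, and the 2.5x value is simply the multiplier mentioned above, not a measured figure:

```python
# Peak (theoretical) fillrate sketch: one pixel per TMU per clock.
# Assumption: the helper name and this simple TMUs x MHz model are illustrative only.

def peak_fillrate_mpix(tmus: int, clock_mhz: float, overdraw_factor: float = 1.0) -> float:
    """Return peak fillrate in MPixels/s for `tmus` texture units at `clock_mhz`.

    overdraw_factor models a marketing-style "effective" multiplier
    (2.5x is the figure mentioned for PowerVR's numbers); 1.0 means raw.
    """
    return tmus * clock_mhz * overdraw_factor


if __name__ == "__main__":
    # Dual-core Mali-400 MP (1 TMU per core) at 275 MHz and at 240 MHz.
    print(peak_fillrate_mpix(2, 275))
    print(peak_fillrate_mpix(2, 240))
    # A hypothetical 2-TMU part at 200 MHz, raw and with the 2.5x multiplier.
    print(peak_fillrate_mpix(2, 200))
    print(peak_fillrate_mpix(2, 200, 2.5))
    # A 1-TMU part across the 100-133 MHz range quoted for OMAP3.
    print(peak_fillrate_mpix(1, 100), peak_fillrate_mpix(1, 133))
```

    With a raw factor of 1.0, two TMUs at 275 MHz reproduce the 550Mpix dual-core figure quoted earlier in the thread, which suggests that number is a plain TMUs-times-clock peak.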

    Given that the same presentation mentions a benchmark done at 220MHz, I'm a little bit skeptical that even 240MHz is realistic for the first SoCs but we'll see what happens. The history of meeting clock speed targets in the industry is pretty damn awful.

    And of course, there's more to handheld GPUs than the number of TMUs and peak polygon throughput - although I guess many people in the industry often forget about that, or just don't care. The only two companies I'm aware of which have detailed (up to a certain extent) their shader pipeline are ATI and... Samsung (see my other post a minute ago). Oh well, maybe one day!

    Yes, it's interesting that ST-Micro seems to be their only Mali-400MP licensee at this point though! At least Mali 200 has a lot more momentum than I thought... :)
     
  5. Grumpy

    Newcomer

    Joined:
    May 4, 2006
    Messages:
    18
    Likes Received:
    0
    Location:
    A cold place
    Slide 7 there doesn't match http://www.arm.com/miscPDFs/21863.pdf on performance & area though (at least for M200).

    Lead licensee in ARM lingo usually means flagship or "driving customer" rather than "the only one", although I haven't seen any other announcements for M400 so far. Could it have anything to do with the "high-end 3-D graphics accelerator" mentioned here?
     
  6. Ailuros

    5mm2 vs. 4.1mm2 at 65nm and it doesn't have a ~ as with M400 estimated die sizes. Past estimates were a lot more optimistic:

    http://www.hardocp.com/article.html?art=ODAyLDEsLGhlbnRodXNpYXN0

    If you want to compare MP vs. MP (with 2 cores at a time) and always on what each of the two has officially announced:

    Mali MP dual core:

    275MHz / 550MPixels/s / 30M Tris/s = ~9mm2@65nm (which should actually read at 240MHz since that's their rated max frequency for 65nm)

    SGX543 dual core:

    200MHz / 800MPixels/s / 70M Tris/s = 16mm2@65nm

    SGX543 single core:

    200MHz / 400MPixels/s / 35M Tris/s = 8mm2@65nm

    Since diagrams show that MaliMP scales fragment processors but still has just one vertex processor (which has been confirmed by arjan in another thread) I'd rather give MaliMP 18M Tris/s than 30 after all. Besides the fact that you gain higher geometry efficiency on a USC in general, if you normalize both on the same frequency level (and since I haven't used any overdraw factor for SGX fillrates) I sure hope their die estimates for Mali MP are more accurate than in the past.
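    For anyone who wants to redo the normalization, here is a rough sketch using only the figures quoted in this post. It assumes throughput scales linearly with clock, which is a simplification, and the ~9mm2 Mali figure is itself an estimate; the table and function names are made up for illustration:

```python
# Figures as quoted in the post:
# (clock MHz, MPixels/s, MTris/s, die area in mm2 at 65nm).
QUOTED = {
    "Mali MP dual core": (275, 550, 30, 9),
    "SGX543 dual core": (200, 800, 70, 16),
    "SGX543 single core": (200, 400, 35, 8),
}


def normalize(clock_mhz, mpix, mtris, area_mm2, target_mhz=200):
    """Scale throughput linearly to target_mhz, then divide by die area.

    Assumption: linear scaling with clock; no overdraw factor is applied.
    """
    mpix_n = mpix * target_mhz / clock_mhz
    mtris_n = mtris * target_mhz / clock_mhz
    return {
        "mpix_per_s": mpix_n,
        "mtris_per_s": mtris_n,
        "mpix_per_mm2": mpix_n / area_mm2,
        "mtris_per_mm2": mtris_n / area_mm2,
    }


if __name__ == "__main__":
    for name, figures in QUOTED.items():
        row = normalize(*figures)
        print(name, {key: round(value, 2) for key, value in row.items()})
```

    Changing `target_mhz` to 240 gives the same comparison at the rated 65nm frequency mentioned above; the per-mm2 ratios between the parts do not change, only the absolute numbers.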
     
    #106 Ailuros, Feb 28, 2009
    Last edited by a moderator: Feb 28, 2009
  7. Arun

    Looking at that presentation, I can't help but notice the 300MP/s @ 200MHz number. Err, what? 1.5 TMUs or ROPs, really? Gives me a hunch it might be inflated in the same way TBDR numbers are (although in this case I can't figure out what the reasoning might be!) and so the 2-3mm2 was really for a half-pixel TMU configuration, while the only Mali200 config that still exists today is a full-pixel TMU. I could be horribly wrong, of course. Of course, in this context it is also interesting to look at ARM's claimed scaling numbers from 90GP to 65LP in the presentation tangey posted - they are pretty awful (1.5->1mm² for Mali55!) and can also help explain the numbers a bit.
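    A quick sketch of the two bits of arithmetic in the paragraph above: the pixels-per-clock implied by 300MP/s @ 200MHz, and how the quoted Mali55 shrink compares with a perfect geometric shrink. Treating 90GP to 65LP as an ideal (new/old) squared area scaling is an assumption made purely for illustration, since the two are different process flavors; the function names are hypothetical:

```python
def pixels_per_clock(mpix_per_s: float, clock_mhz: float) -> float:
    """Pixels delivered per clock cycle implied by a fillrate and a clock."""
    return mpix_per_s / clock_mhz


def ideal_area_ratio(old_nm: float, new_nm: float) -> float:
    """Area ratio for a perfect linear shrink: (new / old) squared.

    Assumption: ignores process flavor (GP vs LP), analog blocks and SRAM.
    """
    return (new_nm / old_nm) ** 2


if __name__ == "__main__":
    # 300MP/s at 200MHz, as quoted in the presentation.
    print(pixels_per_clock(300, 200))
    # Ideal 90nm -> 65nm shrink versus the quoted 1.5mm2 -> 1mm2 for Mali55.
    print(round(ideal_area_ratio(90, 65), 3))
    print(round(1.0 / 1.5, 3))
```

    The first value is where the "1.5 TMUs or ROPs" puzzle comes from, and the gap between the last two values is what makes the claimed scaling look poor.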

    BTW: I found this little gem in an ARM presentation. A die partitioning for the Mali55! ;) There's a fair bit of SRAM, but the biggest block is obviously the 'Texture Mapper', followed by the 'Tri setup master', and then a bunch of smaller ones that are much harder to read but which include a 'Framebuffer/Blenders' block and a 'MMU AMB' one.
    http://www.jp.arm.com/event/pdf/forum2007/t3-1.pdf - Page 11

    I actually also found ARM11, Cortex-A8 and Cortex-A9 partitionings in the past, although I can't remember where I saved them if I did at all... Might not be easy to find again sadly but it is out there.
     
  8. Ailuros

    That presentation was before ARM bought Falanx and it fairly sounds like an effective fillrate. It's the fillrate that struck you as weird? Try 20GFLOPs@200MHz from a mere 3mm2@90nm core. Damn creative math if you ask me.
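    To put the GFLOPs claim in per-clock terms, a back-of-the-envelope sketch using only the 20GFLOPs, 200MHz and 3mm2 figures from the paragraph above; the helper names are made up for illustration:

```python
def flops_per_clock(gflops: float, clock_mhz: float) -> float:
    """Floating-point operations per clock implied by a GFLOPs claim."""
    # 1 GFLOP/s = 1000 MFLOP/s, so MFLOP/s divided by MHz gives ops per cycle.
    return gflops * 1000 / clock_mhz


def gflops_per_mm2(gflops: float, area_mm2: float) -> float:
    """Claimed compute density for a given die area."""
    return gflops / area_mm2


if __name__ == "__main__":
    print(flops_per_clock(20, 200))
    print(round(gflops_per_mm2(20, 3), 2))
```

    That works out to 100 floating-point operations every clock from a 3mm2 core, which is why the number reads as something other than a raw figure.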

    Instead of trying to convince with their presentations that they consume X less bandwidth than the competition (which I severely doubt is even true), it would be nice for a change not to explain their final die sizes but each core's final capabilities in relation to the real final die size.

    Jebus, I never would have noticed without you pointing me at it.
     
  9. Arun

    Well, it's the fillrate that struck me as not plausibly being a raw number. The GFlops number, I'd find believable at 3mm2 if you have a large batch size and it's FP16. However it was my understanding that their PS is FP24 and the VS is FP32... So I'll admit it certainly wasn't believable as a raw number either.

    WRT bandwidth, I think many of their comparisons make perfect sense relative to a basic kind of tiler that doesn't really exist in the industry, although it'd probably be nearest what ATI did in the OpenGL ES 1.x generation (but not quite that either).

    Anyway, most of the bandwidth claims in the industry are even less credible than most of the spam e-mails I get in my mailbox. Probably at the same level as home fitness equipment marketing...

    Well, it helps that I saw similarly colored diagrams in much larger formats for other ARM cores in the past, so my brain instantly realized the similarity... :)
     
  10. Ailuros

    Bottom line is they're theoretically claiming higher raw fillrates, which they aren't on the same frequency basis. I won't exclude that their cores could end up more tolerant to higher frequencies, but those figures they're presenting don't even point that way.

    Or that the 1st generation GoForce had pixel shaders.... :p

    Don't get me wrong, I really like Falanx and I find Mali as an architecture very interesting, in fact more interesting than Tegra. I just don't see the reason for that type of marketing; you lose more than you gain after all, at least IMHLO.
     
  11. Arun

    Heh, the most terrifying thing is it did have pixel shaders; in a few ways they seemed in fact more advanced than the GF3's... Yet it didn't have true Early-Z; depth testing saved power through clock gating and preventing memory accesses, but it never improved shading performance one iota. Ugh... It's a pretty weird and not always very logical architecture.

    Here's the relevant patent: http://v3.espacenet.com/publication...=A2&FT=D&date=20070307&DB=EPODOC&locale=en_EP

    What is much more laughably primitive in the original GoForce is the transform engine, which just reuses the setup engine to do very basic transforms in HW instead of on the CPU. Honestly, I'm not sure why they even bothered... heh. And of course, the whole 'let's keep the framebuffer/textures in on-chip SRAM!' idea was insane. The original GoForce probably was awful at hiding memory latency therefore; I wonder how/if that evolved in the 4800/5500 when they started being dependent on external memory for textures.

    Yeah, it is definitely interesting. The basic rendering strategy is certainly much more interesting than Tegra's. I don't know how exciting/smart the low-level details are for either since that's basically unknown for everybody in the handheld world, but I can honestly say I'd love to know in both cases... ;)
     
  12. Ailuros

    I just skimmed through it, but it rather sounds like a generic scalar ALU, from which (unless I've missed something) I wouldn't necessarily conclude that it's capable of pixel shading.

    Didn't they also claim a Geometry engine?

    Anyway we're way OT with that kind of stuff.

    All in all I have the feeling that IMG announced 543MP just to take the wind out of their marketing sails; ok way too exaggerated but I think you get my point. Even if you'd get to linear scaling with multiple cores, there's always a portion of redundancy involved and in those particular markets die area and power consumption are way more critical than in any other market.

    The fact that die estimates were way off in the past isn't really what annoys me about Mali. What annoys me is that the estimated performance and featureset of that 2005 presentation are on quite a different level than the final result.

    Mali can through 4 passes yield 16xMSAA; under normal gaming conditions the resources for that are way too high. For anything OpenVG though (always depending on the amount of sub-paths in each path) it's certainly a sample amount that might be needed there.

    I've no idea if NV made any modifications to their CSAA algorithm for OpenVG; if not it might give some nasty side-effects with VG content.
     
  13. Arun

    So what else do you want? Don't you remember how incredibly basic DX8 Pixel Shading was? :)

    Yup, it's pretty depressing seeing how every single handheld chip/IP I've ever looked at, I've *always* overestimated its 3D performance by at least 2x until I had the real info. At least in Mali's case, I can claim it's not my fault... ;)

    Sigh, I really am a retard. See, Neil Trevett (president of Khronos/chair of OpenGL ES) was at the NV stand and I didn't realize it until it was too late, so I didn't ask him any questions except stuff obviously related to NV/Tegra. Bah! :( Heck, I even realized he was probably at the stand, but didn't realize that was him....
     
  14. Ailuros

    I as a layman have a hard time calling register combiners pixel shaders, but that's just me LOL :p
    It's never too late to find out ;)
     
  15. tangey

    RMI have recently launched a series of MIPS based app pros (Au1300), with some of them including Mali200 cores. That's the first I've seen of Mali being mated to a non-ARM processor.

    http://www.rmicorp.com/products/Au1300.htm

    Performance stats for the graphics core are stated as:-

    • OpenGL ES 1.1 and 2.0 and OpenVG 1.1 standards support.
    • Vertex and Fragment shaders.
    • 10M polygons per second.
    • 4x full-screen anti-aliasing with no impact on performance.
    • Up to 25x FSAA supported.
    • Alpha blending and texture caching.

    So in this implementation at least, they are getting 10M polys, which is quite different from the 16M stated here:
    http://www.arm.com/miscPDFs/21863.pdf



    But again it's hard to make comparisons, as there is nothing in the RMI data that hints at either the 3D graphics clock or the fab process used for the chip.
     
  16. Ailuros

    Polygon rates should be truly subject to core frequency used. I'm just a bit puzzled with the up to 25xFSAA odd sample amount.
     
  17. Grumpy

    Newcomer

    Joined:
    May 4, 2006
    Messages:
    18
    Likes Received:
    0
    Location:
    A cold place
  18. Arun

    A quick bump to point out that there are now two Mali development boards available on the ARM website, also confirming that the U8500 uses the Mali 400:
    - ST-Ericsson U8500: http://www.malideveloper.com/platforms/boards/st-e-mop500-development-platform.php (Mali 400)
    - Telechips TCC8900: http://www.malideveloper.com/platforms/boards/telechips-tcc8900-development-platform.php (Mali 200)

    There's no indication of the number of cores in the U8500, so I assume it's just one. No indications of MHz for either the A9 or the Mali400 in there, but two interesting tidbits there and on the new ST-Ericsson page about the U8500: it sports a 1080p H.264 *High Profile* camcorder but, unlike OMAP4, only 32-bit LPDDR2. Also has two camera ISPs: 18MP for the primary, 5MP for the secondary. Nice, I wonder how much a phone like that would cost in 1H11... Probably more than anyone sane would ever pay but heh ;)
     
  19. Laurent06

    That's a dual A9-based SoC.

    Ref: http://www.malideveloper.com/platforms/boards/st-e-mop500-development-platform.php

    EDIT: Hum, you link the same page as I do, and I clearly see mention of dual core, odd...
     
  20. Arun

    Err, gosh, I realize now my sentence was very ambiguous: I didn't mean the number of cores for the A9. I meant the number of cores for the Mali 400! :runaway:
    If it's a single core, then that's a 1 TMU design and, assuming it's clocked above 200MHz as ARM claims should be easy to do, would probably be most comparable to the SGX530 in the 45nm OMAP3 (although probably a bit faster on a good day, i.e. poly rate, and a bit slower on a bad one, i.e. overdraw). Frankly not very impressive 3D-wise for such an otherwise very powerful design, but we'll see.
     