ARM Mali-400 MP

Is Mali400 a single or a dual core config? I had the impression it's the latter.
If we are to believe these documents
- ARM site
- cbinews
Mali-400 can have one to four cores.

Also according to the document from ARM site above, it looks like Mali-400 with 1 core has the same pixel performance as Mali-200 but twice its geometry performance.
 
Here's a presentation that ARM did on Mali at the ARM DEVCON 2008

http://library.corporate-ir.net/library/19/197/197211/items/310754/Mike_Dimelow.pdf

A lot of interesting info and assertions; some highlights are:

Mali 400MP "beats 6 SGX cores 520/530/535/540/545/550"

Hmmm... comparing Mali 400MP to 520/530/535?? What's that about?
550 never existed?


Mali 400MP dual core fill rate 550Mpix 30M tri, SGX545 does 1000Mpix and 40M tri?

http://www.imgtec.com/factsheets/powervr/POWERVR_SGX_Series5_IP_Core_Family_[2.3].pdf


They claim 13 GPU licenses and state the areas as being PMP, PND, wireless and STB. Lead partner for Mali-400 is ST Micro. Other stated licensees (for various Mali cores) are Zoran, ST, Micronas, Ericsson (note: not Sony Ericsson), Cisco Systems, NXP, Telechips, RMI, Broadcom. The remaining licensees are "private".

Another OpenGLES1.x core coming out in 2009, higher end multi-core Vithar coming in 2010, and multi-core top end THOR in 2012. (slide 13)

Slide 15 shows target silicon will be shipping to 8 customers in 2008/9

Launch dates for various Mali-ed products are on slide 13.
 
Here's a presentation that ARM did on Mali at the ARM DEVCON 2008
Cheers.

Hmmm... comparing Mali 400MP to 520/530/535?? What's that about?
550 never existed?
Surely they meant 555?

Mali 400MP dual core fill rate 550Mpix 30M tri, SGX545 does 1000Mpix and 40M tri?
Tsk tsk, marketing ftw? PowerVR's numbers include a 2.5x multiplier because of the claimed higher efficiency of TBDRs when there is overdraw. That number is a tad excessive, but it is fair to say that if you designed a game for both IMRs and TBDRs then you could save on a Z-Pass which does increase your effective bandwidth.
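
To make that multiplier concrete, here is a rough back-of-the-envelope sketch in Python. It assumes the quoted SGX545 figure is meant as 1000Mpix/s and that it is stated at the 200MHz PowerVR usually quotes; the function names are purely illustrative and not from any vendor document.

```python
# Back-of-the-envelope: strip a claimed overdraw multiplier from a quoted fillrate.
# Assumptions: the quoted figure is 1000 Mpix/s at 200 MHz and the multiplier is 2.5x.
# Function names are made up for illustration.

def raw_fillrate_mpix(quoted_mpix: float, overdraw_factor: float) -> float:
    """Raw fillrate once the effective-fillrate multiplier is removed."""
    return quoted_mpix / overdraw_factor


def pixels_per_clock(raw_mpix: float, clock_mhz: float) -> float:
    """Pixels produced per clock cycle at the given core frequency."""
    return raw_mpix / clock_mhz


sgx545_raw = raw_fillrate_mpix(1000.0, 2.5)
print(sgx545_raw)                           # 400.0 Mpix/s raw
print(pixels_per_clock(sgx545_raw, 200.0))  # 2.0 pixels per clock
```

Under those assumptions the quoted number boils down to two pixels per clock, which is the figure to hold against Mali's raw numbers.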

Mali 400 has 1 TMU per core, which is similar to SGX 530. Their dual-core variant would, TMU-wise, be equivalent to a SGX 540/543, while their quad-core variant would be equivalent to a SGX 555. All per clock, ignoring any potential effective fillrate gain from SGX being a TBDR. One of the key reasons why they claim they 'beat' SGX performance-wise is that Mali is supposedly clockable at 240-275MHz, while PowerVR quotes 200MHz and in practice OMAP3 only delivers 100-133MHz (probably about 200MHz on 45nm though! ;))
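
A minimal sketch of the per-clock comparison, assuming one pixel per TMU per clock and nothing else in the way (no bandwidth limits, no TBDR gain). The clocks are the ones mentioned in this thread; the helper name is hypothetical.

```python
# Peak raw fillrate under the simplifying assumption of 1 pixel per TMU per clock.
# Ignores memory bandwidth, overdraw and any TBDR effective-fillrate gain.

def peak_fillrate_mpix(cores: int, tmus_per_core: int, clock_mhz: int) -> int:
    """Peak Mpix/s for a given core count, TMUs per core and clock in MHz."""
    return cores * tmus_per_core * clock_mhz


# Dual-core Mali 400 (1 TMU per core) at the clocks discussed above.
for clock in (220, 240, 275):
    print(f"Mali 400 dual @ {clock}MHz: {peak_fillrate_mpix(2, 1, clock)} Mpix/s")

# Single-TMU core at the clocks OMAP3 is said to deliver in practice.
for clock in (100, 133, 200):
    print(f"1 TMU @ {clock}MHz: {peak_fillrate_mpix(1, 1, clock)} Mpix/s")
```

Only the 275MHz case reproduces the 550Mpix figure from the presentation, so the headline number hinges entirely on the clock actually being reached.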

Given that the same presentation mentions a benchmark done at 220MHz, I'm a little bit skeptical that even 240MHz is realistic for the first SoCs but we'll see what happens. The history of meeting clock speed targets in the industry is pretty damn awful.

And of course, there's more to handheld GPUs than the number of TMUs and peak polygon throughput - although I guess many people in the industry often forget about that, or just don't care. The only two companies I'm aware of which have detailed (up to a certain extent) their shader pipeline is ATI and... Samsung (see my other post a minute ago). Oh well, maybe one day!

Launch dates for various Mali-ed products are on slide 13.
Yes, it's interesting that ST-Micro seems to be their only Mali-400MP licensee at this point though! At least Mali 200 has a lot more momentum than I thought... :)
 
Here's a presentation that ARM did on Mali at the ARM DEVCON 2008
Slide 7 there doesn't match http://www.arm.com/miscPDFs/21863.pdf on performance & area though (at least for M200).

Yes, it's interesting that ST-Micro seems to be their only Mali-400MP licensee at this point though!
Lead licensee in ARM lingo usually means flagship or "driving customer" rather than "the only one", although I haven't seen any other announcements for M400 so far. Could it have anything to do with the "high-end 3-D graphics accelerator" mentioned here?
 
Slide 7 there doesn't match http://www.arm.com/miscPDFs/21863.pdf on performance & area though (at least for M200).

5mm2 vs. 4.1mm2 at 65nm and it doesn't have a ~ as with M400 estimated die sizes. Past estimates were a lot more optimistic:

http://www.hardocp.com/article.html?art=ODAyLDEsLGhlbnRodXNpYXN0

Mali 400MP "beats 6 SGX cores 520/530/535/540/545/550"

Hmmm... comparing Mali 400MP to 520/530/535?? What's that about?
550 never existed?
If you want to compare MP vs. MP (with 2 cores at a time) and always on what each of the two has officially announced:

Mali MP dual core:

275MHz / 550MPixels/s / 30M Tris/s = ~9mm2@65nm (which should actually read 240MHz, since that's their rated max frequency for 65nm)

SGX543 dual core:

200MHz / 800MPixels/s / 70M Tris/s = 16mm2@65nm

SGX543 single core:

200MHz / 400MPixels/s / 35M Tris/s = 8mm2@65nm

Since the diagrams show that Mali MP scales fragment processors but still has just one vertex processor (which arjan has confirmed in another thread), I'd rather give Mali MP 18M Tris/s than 30 after all. Besides the fact that you generally gain higher geometry efficiency on a USC, if you normalize both to the same frequency (and note that I haven't used any overdraw factor for the SGX fillrates), I sure hope their die estimates for Mali MP are more accurate than in the past.
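
For anyone who wants to redo the normalization, here is a hedged sketch in Python. It takes the three sets of figures listed above at face value and assumes both fillrate and triangle rate scale linearly with clock, which is a simplification; the class and function names are just for illustration.

```python
from dataclasses import dataclass


@dataclass
class Core:
    name: str
    clock_mhz: float
    mpix: float      # quoted MPixels/s
    mtris: float     # quoted M Tris/s
    area_mm2: float  # quoted (estimated) area at 65nm

    def at_clock(self, target_mhz: float) -> "Core":
        """Rescale throughput to another clock, assuming linear scaling."""
        scale = target_mhz / self.clock_mhz
        return Core(self.name, target_mhz, self.mpix * scale,
                    self.mtris * scale, self.area_mm2)


# Figures as listed in the post above.
CORES = [
    Core("Mali MP dual core", 275, 550, 30, 9),
    Core("SGX543 dual core", 200, 800, 70, 16),
    Core("SGX543 single core", 200, 400, 35, 8),
]


def normalized(target_mhz: float = 200.0) -> list:
    """All cores rescaled to the same clock."""
    return [c.at_clock(target_mhz) for c in CORES]


for c in normalized():
    print(f"{c.name} @ {c.clock_mhz:.0f}MHz: {c.mpix:.0f} Mpix/s, "
          f"{c.mtris:.1f} Mtris/s, {c.mpix / c.area_mm2:.1f} Mpix/s per mm2")
```

Swapping 30 for 18 in the Mali entry gives the more conservative triangle rate suggested above.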
 
5mm2 vs. 4.1mm2 at 65nm and it doesn't have a ~ as with M400 estimated die sizes. Past estimates were a lot more optimistic:

http://www.hardocp.com/article.html?art=ODAyLDEsLGhlbnRodXNpYXN0
Looking at that presentation, I can't help but notice the 300MP/s @ 200MHz number. Err, what? 1.5 TMUs or ROPs, really? Gives me a hunch it might be inflated in the same way TBDR numbers are (although in this case I can't figure out what the reasoning might be!) and so the 2-3mm2 was really for a half-pixel TMU configuration, while the only Mali200 config that still exists today is a full-pixel TMU. I could be horribly wrong, of course. Of course, in this context it is also interesting to look at ARM's claimed scaling numbers from 90GP to 65LP in the presentation tangey posted - they are pretty awful (1.5->1mm² for Mali55!) and can also help explain the numbers a bit.

BTW: I found this little gem in an ARM presentation. A die partitioning for the Mali55! ;) There's a fair bit of SRAM, but the biggest block is obviously the 'Texture Mapper', followed by the 'Tri setup master', and then a bunch of smaller ones that are much harder to read but which include a 'Framebuffer/Blenders' block and a 'MMU AMB' one.
http://www.jp.arm.com/event/pdf/forum2007/t3-1.pdf - Page 11

I actually also found ARM11, Cortex-A8 and Cortex-A9 partitionings in the past, although I can't remember where I saved them if I did at all... Might not be easy to find again sadly but it is out there.
 
Looking at that presentation, I can't help but notice the 300MP/s @ 200MHz number. Err, what? 1.5 TMUs or ROPs, really? Gives me a hunch it might be inflated in the same way TBDR numbers are (although in this case I can't figure out what the reasoning might be!) and so the 2-3mm2 was really for a half-pixel TMU configuration, while the only Mali200 config that still exists today is a full-pixel TMU.

That presentation was before ARM bought Falanx and it fairly sounds like an effective fillrate. It's the fillrate that struck you as weird? Try 20GFLOPs@200MHz from a mere 3mm2@90nm core. Damn creative math if you ask me.
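
Just to put numbers on how creative that math is, a tiny sketch that divides the quoted rates by the quoted clock. The only inputs are the 300MP/s, 20GFLOPs and 200MHz figures from the old presentation; the helper name is made up.

```python
# Per-clock throughput implied by a quoted rate and a quoted clock.

def per_clock(rate_per_second: float, clock_hz: float) -> float:
    """Units of work per clock cycle implied by a rate and a frequency."""
    return rate_per_second / clock_hz


print(per_clock(300e6, 200e6))  # 1.5 pixels per clock
print(per_clock(20e9, 200e6))   # 100.0 FLOPs per clock
```

Neither 1.5 pixels nor 100 FLOPs per clock looks like a raw figure for a core of that size, which fits the idea that both are "effective" numbers.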

Instead of trying to convince with their presentations that they consume X less bandwidth than the competition (which I have severe doubts is even true), it would be nice for a change not to explain their final die sizes but each core's final capabilities in relation to the real final die size.

BTW: I found this little gem in an ARM presentation. A die partitioning for the Mali55! ;) There's a fair bit of SRAM, but the biggest block is obviously the 'Texture Mapper', followed by the 'Tri setup master', and then a bunch of smaller ones that are much harder to read but which include a 'Framebuffer/Blenders' block and a 'MMU AMB' one.
http://www.jp.arm.com/event/pdf/forum2007/t3-1.pdf - Page 11

Jebus, I never would have noticed without you pointing me at it.
 
That presentation was before ARM bought Falanx and it fairly sounds like an effective fillrate. It's the fillrate that struck you as weird? Try 20GFLOPs@200MHz from a mere 3mm2@90nm core. Damn creative math if you ask me.
Well, it's the fillrate that struck me as not plausibly being a raw number. The GFlops number, I'd find believable at 3mm2 if you have a large batch size and it's FP16. However it was my understanding that their PS is FP24 and the VS is FP32... So I'll admit it certainly wasn't believable as a raw number either.

Instead of trying to convince with their presentations that they consume X less bandwidth than the competition (which I have severe doubts it's even true)
WRT bandwidth, I think many of their comparisons make perfect sense relative to a basic kind of tiler that doesn't really exist in the industry, although it'd probably be nearest what ATI did in the OpenGL ES 1.x generation (but not quite that either).

Anyway, most of the bandwidth claims in the industry are even less credible than most of the spam e-mails I get in my mailbox. Probably at the same level as home fitness equipment marketing...

Jebus, I never would have noticed without you pointing me at it.
Well it helps that I saw similarly colored diagrams in much larger formats for other ARM cores in the past, so my brain instantly realized the similarity... :)
 
Well, it's the fillrate that struck me as not plausibly being a raw number.

Bottom line is they're theoretically claiming higher raw fillrates, which they don't have on the same frequency basis. I won't exclude that their cores could end up more tolerant to higher frequencies, but those figures they're presenting don't even point that way.

Anyway, most of the bandwidth claims in the industry are even less credible than most of the spam e-mails I get in my mailbox. Probably at the same level as home fitness equipment marketing...

Or that the 1st generation GoForce had pixel shaders.... :p

Don't get me wrong, I really like Falanx and I find Mali as an architecture very interesting, in fact more interesting than Tegra. I just don't see the reason for that type of marketing; you lose more than you gain after all, at least IMHLO.
 
Or that the 1st generation GoForce had pixel shaders.... :p
Heh, the most terrifying thing is it did have pixel shaders; in a few ways they seemed in fact more advanced than the GF3's... Yet it didn't have true Early-Z; depth testing saved power through clock gating and preventing memory accesses, but it never improved shading performance one iota. Ugh... It's a pretty weird and not always very logical architecture.

Here's the relevant patent: http://v3.espacenet.com/publication...=A2&FT=D&date=20070307&DB=EPODOC&locale=en_EP

What is much more laughably primitive in the original GoForce is the transform engine, which just reuses the setup engine to do very basic transforms in HW instead of on the CPU. Honestly, I'm not sure why they even bothered... heh. And of course, the whole 'let's keep the framebuffer/textures in on-chip SRAM!' idea was insane. The original GoForce probably was awful at hiding memory latency therefore; I wonder how/if that evolved in the 4800/5500 when they started being dependent on external memory for textures.

Don't get me wrong, I really like Falanx and I find Mali as an architecture very interesting, in fact more interesting than Tegra. I just don't see the reason for that type of marketing; you lose more than you gain after all, at least IMHLO.
Yeah, it is definitely interesting. The basic rendering strategy is certainly much more interesting than Tegra's. I don't know how exciting/smart the low-level details are for either since that's basically unknown for everybody in the handheld world, but I can honestly say I'd love to know in both cases... ;)
 
Heh, the most terrifying thing is it did have pixel shaders; in a few ways they seemed in fact more advanced than the GF3's... Yet it didn't have true Early-Z; depth testing saved power through clock gating and preventing memory accesses, but it never improved shading performance one iota. Ugh... It's a pretty weird and not always very logical architecture.

Here's the relevant patent: http://v3.espacenet.com/publicationDetails/description?CC=EP&NR=1759380A2&KC=A2&FT=D&date=20070307&DB=EPODOC&locale=en_E

I just skimmed through it, but it rather sounds like a generic scalar ALU, and (unless I've missed something) I wouldn't necessarily conclude that it's capable of pixel shading.

What is much more laughably primitive in the original GoForce is the transform engine, which just reuses the setup engine to do very basic transforms in HW instead of on the CPU. Honestly, I'm not sure why they even bothered... heh. And of course, the whole 'let's keep the framebuffer/textures in on-chip SRAM!' idea was insane. The original GoForce probably was awful at hiding memory latency therefore; I wonder how/if that evolved in the 4800/5500 when they started being dependent on external memory for textures.

Didn't they also claim a Geometry engine?

Anyway we're way OT with that kind of stuff.

Yeah, it is definitely interesting. The basic rendering strategy is certainly much more interesting than Tegra's. I don't know how exciting/smart the low-level details are for either since that's basically unknown for everybody in the handheld world, but I can honestly say I'd love to know in both cases... ;)

All in all I have the feeling that IMG announced 543MP just to take the wind out of their marketing sails; ok way too exaggerated but I think you get my point. Even if you'd get to linear scaling with multiple cores, there's always a portion of redundancy involved and in those particular markets die area and power consumption are way more critical than in any other market.

The fact that die estimates were way off in the past isn't really what annoys me with Mali. What annoys me is that the estimated performance and featureset of that 2005 presentation are on quite a different level than the final result.

Mali can through 4 passes yield 16xMSAA; under normal gaming conditions the resources for that are way too high. For anything OpenVG though (always depending on the amount of sub-paths in each path) it's certainly a sample amount that might be needed there.

I've no idea if NV made any modifications to their CSAA algorithm for OpenVG; if not it might give some nasty side-effects with VG content.
 
I just skimmed through it, but it rather sounds like a generic scalar ALU, and (unless I've missed something) I wouldn't necessarily conclude that it's capable of pixel shading.
So what else do you want? Don't you remember how incredibly basic DX8 Pixel Shading was? :)

The fact that die estimates were way off in the past isn't really what annoys me with Mali. What annoys me is that the estimated performance and featureset of that 2005 presentation are on quite a different level than the final result.
Yup, it's pretty depressing seeing how every single handheld chip/IP I've ever looked at, I've *always* overestimated its 3D performance by at least 2x until I had the real info. At least in Mali's case, I can claim it's not my fault... ;)

I've no idea if NV made any modifications to their CSAA algorithm for OpenVG; if not it might give some nasty side-effects with VG content.
Sigh, I really am a retard. See, Neil Trevett (president of Khronos/chair of OpenGL ES) was at the NV stand and I didn't realize it until it was too late, so I didn't ask him any questions except stuff obviously related to NV/Tegra. Bah! :( Heck, I even realized he was probably at the stand, but didn't realize that was him....
 
So what else do you want? Don't you remember how incredibly basic DX8 Pixel Shading was? :)

I as a layman have a hard time calling register combiners pixel shaders, but that's just me LOL :p
Sigh, I really am a retard. See, Neil Trevett (president of Khronos/chair of OpenGL ES) was at the NV stand and I didn't realize it until it was too late, so I didn't ask him any questions except stuff obviously related to NV/Tegra. Bah! :( Heck, I even realized he was probably at the stand, but didn't realize that was him....

It's never too late to find out ;)
 
RMI have recently launched a series of MIPS based app pros (Au1300) with some of them including Mali200 cores. That's the first I've seen of Mali being mated to a non-ARM processor.

http://www.rmicorp.com/products/Au1300.htm

Performance stats for the graphics core are stated as:-

• OpenGL ES 1.1 and 2.0 and OpenVG 1.1 standards support.
• Vertex and Fragment shaders.
• 10M polygons per second.
• 4x full-screen anti-aliasing with no impact on performance.
• Up to 25x FSAA supported.
• Alpha blending and texture caching.

So in this implementation at least, they are getting 10M polys, which is quite different from the 16M stated here:-
http://www.arm.com/miscPDFs/21863.pdf



But again it's hard to make comparisons, as there is nothing in the RMI data that hints at either the 3D graphics clock or the fab process used for the chip.
 
Polygon rates should be truly subject to core frequency used. I'm just a bit puzzled with the up to 25xFSAA odd sample amount.
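
As a rough illustration of that point, here is a sketch that assumes triangle throughput scales linearly with core clock (it ignores memory bandwidth and CPU-side limits). The 10M and 16M figures are the ones quoted above; the reference clock is a placeholder of mine, since neither document in this thread states one.

```python
# Hypothetical sketch: polygon rate assumed to scale linearly with core clock.

def clock_ratio(observed_mtris: float, reference_mtris: float) -> float:
    """Fraction of the reference clock that would explain the observed rate."""
    return observed_mtris / reference_mtris


def implied_clock_mhz(observed_mtris: float, reference_mtris: float,
                      reference_clock_mhz: float) -> float:
    """Clock implied for the observed rate, given a reference rate and clock."""
    return reference_clock_mhz * clock_ratio(observed_mtris, reference_mtris)


# RMI's 10M polys/s against the 16M polys/s in the ARM document.
print(clock_ratio(10.0, 16.0))  # 0.625

# PLACEHOLDER: not taken from any datasheet, substitute the real reference clock.
PLACEHOLDER_REFERENCE_MHZ = 200.0
print(implied_clock_mhz(10.0, 16.0, PLACEHOLDER_REFERENCE_MHZ))  # 125.0
```

So if clock were the only difference, the RMI part would be running its Mali200 at 62.5% of whatever frequency the 16M figure was quoted at.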
 
A quick bump to point out that there are now two Mali development boards available on the ARM website, also confirming that the U8500 uses the Mali 400:
- ST-Ericsson U8500: http://www.malideveloper.com/platforms/boards/st-e-mop500-development-platform.php (Mali 400)
- Telechips TCC8900: http://www.malideveloper.com/platforms/boards/telechips-tcc8900-development-platform.php (Mali 200)

There's no indication of the number of cores in the U8500, so I assume it's just one. No indications of MHz for either the A9 or the Mali400 in there, but two interesting tidbits there and on the new ST-Ericsson page about the U8500: it sports a 1080p H.264 *High Profile* camcorder but, unlike OMAP4, only 32-bit LPDDR2. Also has two camera ISPs: 18MP for the primary, 5MP for the secondary. Nice, I wonder how much a phone like that would cost in 1H11... Probably more than anyone sane would ever pay but heh ;)
 
There's no indication of the number of cores in the U8500, so I assume it's just one. No indications of MHz for either the A9 or the Mali400 in there, but two interesting tidbits there and on the new ST-Ericsson page about the U8500: it sports a 1080p H.264 *High Profile* camcorder but, unlike OMAP4, only 32-bit LPDDR2. Also has two camera ISPs: 18MP for the primary, 5MP for the secondary. Nice, I wonder how much a phone like that would cost in 1H11... Probably more than anyone sane would ever pay but heh ;)
That's a dual A9-based SoC.

Ref: http://www.malideveloper.com/platforms/boards/st-e-mop500-development-platform.php

EDIT: Hum, you link the same page as I do, and I clearly see mention of dual core, odd...
 
That's a dual A9-based SoC.
Err, gosh, I realize now my sentence was very ambiguous: I didn't mean the number of cores for the A9. I meant the number of cores for the Mali 400! :runaway:
If it's a single core, then that's a 1 TMU design and, assuming it's clocked above 200MHz as ARM claims should be easy to do, would probably be most comparable to the SGX530 in the 45nm OMAP3 (although probably a bit faster in a good day, i.e. poly rate, and a bit slower on a bad one, i.e. overdraw). Frankly not very impressive 3D-wise for such an otherwise very powerful design, but we'll see.
 