ARM Mali-400 MP

tangey · Oct 22, 2009

Are the current SE OMAP3 phones (Satio etc.) using the U380 platform, i.e. have they integrated their HSPA along with an OMAP3430 into a one-chip solution, or did they drop this platform and go with separate OMAP3 + HSPA?

http://www.ericsson.com/ericsson/press/releases/20080208-1189711.shtml

Also, whatever happened to the U500 platform, which was ARM11 + Mali200... did that just get completely dropped?

http://www.ericsson.com/ericsson/press/releases/20080206-1188885.shtml

Both platforms were announced Feb '08.

Arun · Oct 22, 2009

U380 was a lie; it was two chips in a package (I'd guess OMAP3430 and M340). I do believe it's used in the Satio, though I can't seem to find the presentation where they indicated that anymore. U500 probably got canned in favour of the U8500: it was a very weird architecture with 3 ARM11 cores (1 for apps, 1 for modem, 1 for multimedia iirc) and I honestly doubt it would have been very impressive.

roninja · Oct 22, 2009

Thought U8500 was due to be announced at MWC, but it did not make an appearance. Interesting if this part heads Nokia's way or OMAP4 beats 'em to it?

Grumpy · Oct 22, 2009

roninja said:
Interesting if this part heads Nokia's way or OMAP4 beats 'em to it?

This at least shows some Nokia interest:
http://zomgitscj.com/nokia-signs-with-st-ericsson-gets-chip-for-1080p-video-recording/

Lazy8s · Oct 26, 2009

A fun looking "UI playground" concept is presented, controlled by multi-touch where icons can be 3D objects and can interact under a physics model, running on a Mali 200 platform.

http://www.youtube.com/watch?v=g0bwMCe6IaA

rbaker · Apr 1, 2010

Nice demo!

Got a chance to see this at GDC in ARM's booth. Very nice demo!

Behind the curtain the panel was connected to a notebook PC.

Lazy8s · Apr 7, 2010

Was the Mali200 clocked around 300 MHz?

That's higher than a mobile phone implementation would use, I'd expect.

More footage of the Canvas demo is shown. Performance on the high resolution display is nice: the video-mapped-to-the-cube getting thrown around as a 3D object and colliding with the pins was a good touch.

http://www.youtube.com/watch?v=rAiK8jrI8CI&sns=em

Ailuros · Aug 10, 2010

It's the first time I've seen anything containing Mali200 listed:

http://www.glbenchmark.com/phonedetails.jsp?benchmark=glpro11&D=SmartQ V5&testgroup=lowlevel

uhhmm yikes it didn't pass even one quality test...crappy drivers?

rpg.314 · Aug 10, 2010

Where are the ES 2.0 results? I can't find them on a quick look. Or are they MIA?

Rys · Aug 10, 2010

Ailuros said:
uhhmm yikes it didn't pass even one quality test...crappy drivers?

Looks like a bug in their glReadPixels().

Exophase · Sep 15, 2010

Now that Nufront's chip is announced as having Mali 400MP and Samsung Orion is rumored to have it, I wanted to open this thread up again.

I had a bunch of stuff here, but I'm seeing more that it's basically superseded by this document: http://infocenter.arm.com/help/topic/com.arm.doc.dui0363d/DUI0363D_opengl_es_app_dev_guide.pdf

Apparently the ALUs are VLIW, SIMD, and 32-bit float. It also has hardware support for 16-bit float: it appears to suggest using these to save bandwidth post-geometry, not to improve computational throughput. One Mali400 ALU looks way more flexible/powerful than a USSE ALU (dunno as much about USSE2), more comparable to that of z430. So a Mali400MP should have pretty comparable computational performance.

JohnH · Sep 15, 2010

Exophase said:
Apparently the ALUs are VLIW, SIMD, and 32-bit float.

That's the geometry processor not the fragment shader. Frag shader is apparently FP24.

John.

Exophase · Sep 15, 2010

JohnH said:
That's the geometry processor not the fragment shader. Frag shader is apparently FP24.

John.

Yeah okay, I missed that it just said "geometry processor." Do you have a source on the fragment shader being FP24?

By the way, earlier in this thread you mentioned that 24-bit gives you 15 bits of fractional data. If this is the usual FP24 implementation, which is like IEEE-754 float, i.e. 1.8.15 for sign, exponent, and fractional portion, then the effective absolute resolution is really at least 16 bits. Normalized floats have an implicit higher-order 1 bit; effectively, this is encoded by the exponent. So with 11-bit texture addressing you should have an effective 5 fractional bits at the full index magnitude.

Xmas · Sep 15, 2010

That document actually says that the fragment shader uses FP16.

darkblu · Sep 15, 2010

Exophase said:
Yeah okay, I missed that it just said "geometry processor." Do you have a source on the fragment shader being FP24?

I don't have a link to a source handy, but I do remember Mali as having fp24 fragment shader ALUs from earlier presentations/workshop sessions.

Generally, both non-unified shader model, SoC-class GPUs I'm aware of follow the fp32-vertex / fp24-fragment scheme.

edit: I just noticed Xmas' remark. It appears I have a case of faulty memory cells.. Rats. *reports to factory for repair*

Exophase said:
By the way, earlier in this thread you mentioned that 24-bit gives you 15 bits of fractional data. If this is the usual FP24 implementation, which is like IEEE-754 float, i.e. 1.8.15 for sign, exponent, and fractional portion, then the effective absolute resolution is really at least 16 bits. Normalized floats have an implicit higher-order 1 bit; effectively, this is encoded by the exponent. So with 11-bit texture addressing you should have an effective 5 fractional bits at the full index magnitude.

Correct.

fp24 (15-bit mantissa) gives you 2^-16 relative precision (mandated by the GLSL ES specs for highp, btw);
fp16 (10-bit mantissa) gives you 2^-11 relative precision, etc.
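
The arithmetic behind those two figures can be sketched in a few lines. This is only an illustration, assuming an IEEE-754-like layout with an implicit leading 1 and round-to-nearest; the helper names `relative_precision` and `fractional_bits_left` are made up for this example and say nothing about how Mali actually stores its values.

```python
from fractions import Fraction

def relative_precision(explicit_mantissa_bits: int) -> Fraction:
    """Worst-case relative rounding error of a normalized float.

    Assumes an implicit leading 1 (so the significand is one bit wider
    than the stored mantissa) and round-to-nearest.
    """
    significand_bits = explicit_mantissa_bits + 1
    return Fraction(1, 2 ** significand_bits)

def fractional_bits_left(explicit_mantissa_bits: int, index_bits: int) -> int:
    """Significand bits left for the sub-texel fraction at full index magnitude."""
    return (explicit_mantissa_bits + 1) - index_bits

# Compare the two formats discussed above for 11-bit texture addressing.
for name, bits in (("fp24", 15), ("fp16", 10)):
    print(name, relative_precision(bits), fractional_bits_left(bits, 11))
```

Under these assumptions fp24 keeps 5 fractional bits when 11 bits are spent on the texel index, while fp16 keeps none.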

Exophase · Sep 15, 2010

Xmas said:
That document actually says that the fragment shader uses FP16.

Wow, you're right, I read the section and somehow completely misinterpreted it as meaning that FP16 is supported and should be used to save bandwidth between vertex shading and fragment shading.

That only gives effective 11-bits of guaranteed precision, not very good for HD texture coordinates...

I wonder if maybe the ALUs are FP16, but it uses FP24 internally and can access it for some purposes. Like if the TMUs can be addressed with it and varyings produce it. I guess even FP16 isn't that bad for texture coordinates (and on SGX you'd probably usually opt for it), since it still gives you 0.25 sub-texel precision at up to 512x512. 1024x1024 if somehow the texture coordinates can range from -1 to 1 (pretty sure it can't work that way but I don't really know for sure)
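
A rough check of the 0.25 sub-texel figure, assuming normalized texture coordinates in [0, 1) stored as IEEE-754 binary16 (which may or may not match what the hardware does). It uses Python's `struct` half-float format to read off the spacing of representable values just below 1.0, where precision is worst; `half_spacing_below_one` and `subtexel_step` are hypothetical helper names.

```python
import struct

def half_spacing_below_one() -> float:
    """Gap between adjacent binary16 values in [0.5, 1.0)."""
    one_bits = struct.unpack("<H", struct.pack("<e", 1.0))[0]
    # The bit pattern just below 1.0 is the largest half float under 1.0.
    prev = struct.unpack("<e", struct.pack("<H", one_bits - 1))[0]
    return 1.0 - prev

def subtexel_step(texture_size: int) -> float:
    """Coarsest coordinate step, in texels, for a texture of the given width."""
    return half_spacing_below_one() * texture_size

for size in (256, 512, 1024, 2048):
    print(size, subtexel_step(size))
```

With these assumptions the step is a quarter texel at 512, half a texel at 1024, and a whole texel at 2048.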

So ARM promotes a lot that Mali has very efficient bandwidth utilization, even compared to "traditional tile renderers." Are they claiming that they have better post-transform geometry data compression than IMG does?

Lazy8s · Sep 16, 2010

Exception has been taken to that characterization of PowerVR as the "traditional tile renderer".

Ailuros · Sep 16, 2010

Exophase said:
So ARM promotes a lot that Mali has very efficient bandwidth utilization, even compared to "traditional tile renderers." Are they claiming that they have better post-transform geometry data compression than IMG does?

Their webpage has recently stated:

Advanced tile-based deferred rendering and local buffering of intermediate pixel states.

http://www.arm.com/products/multimedia/mali-graphics-hardware/mali-400-mp.php

While in their dev_guide I read the following:

The Mali GPUs use tile-based immediate-mode rendering.

For this type of rendering, the framebuffer is divided into tiles of size 16 by 16 pixels. The Polygon List Builder (PLB) organizes input data from the application into polygon lists. There is a polygon list for each tile. When a primitive covers part of a tile, an entry, called a polygon list command, is added to the polygon list for the tile.

The pixel processor takes the polygon list for one tile and computes values for all pixels in that tile before starting work on the next tile. Because this tile-based approach uses a fast, on-chip tile buffer, the GPU only writes the tile buffer contents to the framebuffer in main memory at the end of each tile. Non-tiled-based, immediate-mode renderers generally require many more framebuffer accesses. The tile-based method therefore consumes less memory bandwidth, and supports operations such as depth testing, blending and anti-aliasing efficiently.

http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.dui0363d/index.html
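
The per-tile polygon lists described in that excerpt can be pictured with a small sketch. This is not ARM's PLB: it assumes screen-space triangles and bins each one into every 16x16 tile its bounding box touches, which is a conservative stand-in for real coverage testing. The function name `build_polygon_lists` and the data layout are invented for the example.

```python
import math
from collections import defaultdict

TILE = 16  # tile edge in pixels, as stated in the quoted guide

def build_polygon_lists(triangles, width, height):
    """Map (tile_x, tile_y) to the ids of triangles whose bounding box touches it.

    triangles: list of three (x, y) screen-space vertices per triangle.
    Bounding-box binning is an assumption; it may list a triangle in a
    tile it does not actually cover.
    """
    lists = defaultdict(list)
    max_tx = (width - 1) // TILE
    max_ty = (height - 1) // TILE
    for prim_id, tri in enumerate(triangles):
        xs = [x for x, _ in tri]
        ys = [y for _, y in tri]
        # Skip primitives that lie entirely outside the framebuffer.
        if max(xs) < 0 or max(ys) < 0 or min(xs) >= width or min(ys) >= height:
            continue
        tx0 = max(math.floor(min(xs)) // TILE, 0)
        tx1 = min(math.floor(max(xs)) // TILE, max_tx)
        ty0 = max(math.floor(min(ys)) // TILE, 0)
        ty1 = min(math.floor(max(ys)) // TILE, max_ty)
        for ty in range(ty0, ty1 + 1):
            for tx in range(tx0, tx1 + 1):
                lists[(tx, ty)].append(prim_id)
    return dict(lists)

tris = [
    [(2, 2), (10, 2), (2, 10)],     # fits inside one tile
    [(10, 10), (40, 10), (10, 20)], # spans several tiles
]
print(build_polygon_lists(tris, 64, 32))
```

A pixel processor in this picture would then walk one tile's list at a time, shade into an on-chip buffer, and write the finished tile out once.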

Since I'm having a hard time making out a difference, can I call the Mali a TBIDR (tile-based immediately deferred renderer)?

JohnH · Sep 16, 2010

Imo the correct description of Mali is "Early Z Tile based rendering"; this differs from the PowerVR method in that they don't do deferred shading/texturing.

Exophase - yes 15 bit mantissa is actually 16 bits of precision when you take into account the implied 1, assuming that's the representation used.

Note that FP16 is not generally sufficient precision to manipulate texture coordinates with the shader (think accumulated error). I still believe they're FP24, but I'm not sure why they wouldn't expose it as such if that really were the case.

John.

Exophase · Sep 16, 2010

Had this one out with aaronspink/darkblu/et al once.

It's deferred and not immediate in the sense that it's scene-grabbing instead of rendering primitives as they're issued.

It's not deferred and is immediate in the sense that it performs full rendering (minus early-Z elimination) within a tile as opposed to having a fast internal Z-path and then index based rendering.

An important distinction between it and z430 is that it appears to have more explicit tiling and a fixed small (16x16) tile size, and therefore probably has a hardware binning pass prior to geometry as opposed to binning with geometry with skips for already binned polygons. This probably saves a lot of bandwidth in comparison, but like IMG I imagine they employ some kind of compression on their post-transform data. PowerVR should be as "traditional" as you get, having been the only tile based renderers for years, but yeah, there are clearly holes in their statements.

What I've always wanted to see was a tile-based early-Z renderer with hardware binning that also performed some level of depth sorting. If it's going to bin per-tile anyway this can't be that much more expensive (or maybe it can, I don't really know the binning algorithms and haven't thought this through that thoroughly). That'd make early-Z much more effective, and if you also add in binning of opaque vs alpha primitives you'd get order-independent translucency too... seems like it'd bring the per-pixel savings much closer to TBDR (especially w/faster than fill early-Z) while not having highly costly alpha test and having order-independent translucency.
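
To make that idea concrete, here is a toy sketch of ordering one tile's bin: opaque primitives front to back so early-Z rejects as much as possible, then translucent ones back to front for blending. Everything in it is hypothetical, including the `BinnedPrimitive` record and the use of a single representative depth per primitive, which breaks down for intersecting or overlapping-in-depth geometry. It describes no existing hardware.

```python
from dataclasses import dataclass

@dataclass
class BinnedPrimitive:
    prim_id: int
    depth: float   # representative depth; smaller means nearer (assumption)
    opaque: bool

def order_tile(prims):
    """Return a draw order for one tile's bin.

    Opaque primitives go first, nearest first, to maximise early-Z rejection.
    Translucent primitives follow, farthest first, so blending composes correctly.
    """
    opaque = sorted((p for p in prims if p.opaque), key=lambda p: p.depth)
    translucent = sorted(
        (p for p in prims if not p.opaque), key=lambda p: p.depth, reverse=True
    )
    return opaque + translucent

tile_bin = [
    BinnedPrimitive(0, 0.9, True),
    BinnedPrimitive(1, 0.2, False),
    BinnedPrimitive(2, 0.1, True),
    BinnedPrimitive(3, 0.7, False),
]
print([p.prim_id for p in order_tile(tile_bin)])
```

Whether a per-tile sort like this is cheap enough in hardware is exactly the open question raised above.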
