Given that the only two IP vendors left in that space seem to be ARM and PowerVR, yes. Let's hope they give them a real run for their money.
It's not the bits you've got, it's what you do with them that counts.
Input and storage precision are not the same as intermediate result precision. There are ways of managing numerical computation in an architecture such that you don't need to maintain a complete FP24 pipe to maintain accuracy.
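To make that point concrete, here is a small Python sketch of my own (an illustration of the general principle, nothing to do with either vendor's actual pipeline): the inputs are stored at float32 precision either way, and the only thing that changes is whether every intermediate result is also rounded back to storage precision.

```python
import struct

def to_f32(x: float) -> float:
    """Round a Python double to the nearest IEEE-754 single (storage precision)."""
    return struct.unpack('f', struct.pack('f', x))[0]

# Inputs stored at reduced (float32) precision -- like texture/vertex data.
data = [to_f32(0.1)] * 10000

# Narrow pipe: every intermediate result is rounded to storage precision.
acc_narrow = 0.0
for v in data:
    acc_narrow = to_f32(acc_narrow + v)

# Wide intermediates: accumulate in double, round only the final result.
acc_wide = to_f32(sum(data))

exact = 1000.0
print(abs(acc_narrow - exact), abs(acc_wide - exact))
```

The stored values are identical in both runs; keeping the intermediates wide is what recovers the accuracy.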
Besides which, Mali200 is still the only IP core to have achieved Khronos conformance at 1080p resolution... so evidently it's not as much of a problem as people seem to think.
You'd think if SGX was capable of passing conformance at 1080p they'd have press released it by now (they press release every other bleeping thing).
I'm sure if a customer actually wanted a specific resolution tested, it would be done. FWIW, of those companies who do actually have conformant OpenGL ES 2.0 systems, most seem to have opted for a "box standard" VGA resolution when doing the test. <shrug>
Well, there are a range of MBX models and SOCs that they are put into, and clearly some perform better than others. I can't comment on an individual case as I don't know their "innards" in enough detail.
In terms of precision and its effect on texture sampling, it is the resolution of the source textures that has an impact, not the target resolution, i.e. sub-pixel resolution does not change with screen size, so it is unlikely that target resolution will have any impact on the results of the conformance tests.
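A toy model of that argument (the 16 fractional bits are an arbitrary illustrative figure, not the datapath width of SGX, Mali, or any real core): with a fixed number of fractional bits for a normalized texture coordinate, each doubling of the source texture's width costs one bit of sub-texel filter precision, while the render-target size never appears in the calculation.

```python
import math

FRACTION_BITS = 16  # hypothetical coordinate precision inside the filter pipe

def subpixel_bits(texture_width: int) -> int:
    """Bits left for the sub-texel filter weight after indexing the texel."""
    return FRACTION_BITS - int(math.log2(texture_width))

# Precision depends on the *source texture*, not the render target:
for w in (256, 1024, 4096):
    print(w, subpixel_bits(w))
# Switching the screen from VGA to 1080p changes none of these numbers.
```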
John.
IMG figures are actual synthesis figures, in the same way ARM's are claimed to be.

Two notes of caution here:
It's well known in the industry that ARM has a track record of conservatively estimating their core sizes; the PowerVR guys can be a little more, errr, "creative" shall we say.
It is well known that not all MBX systems are alike; some clearly do hit our performance claims, which suggests that the performance of MBX itself was exactly as stated.

TheArchitect said: "Similarly, don't just take it as read that the performance numbers are correct. I worked with one of the chips implementing the original MBX and it was nowhere near the performance envelope stated in their material."
Remember SGX is a unified shader. Ask yourself: are they quoting SGX peak fill rate with the core 100% dedicated to fragment processing? A similar question goes for vertex processing...
Lies, lies, damn lies and GPU marketing material and all that.
On the power consumption front there are a number of variables to take into account.
Total power efficiency for the GPU core will depend on the number of gates in the core, the number and area of the RAMs in the core, how many of those are active at any one time, and (this is the key bit you've missed so far) the amount of external BW consumed by the GPU core.
Not to trivialise it, though: the gate/RAM area is a big issue without power gating. Sub-65nm, static power consumption through leakage is a big deal, so SGX would seem to have the edge over Mali there. However, if the utilisation of the core is 100% during a rendering phase then there is no/limited opportunity to power gate (you need to keep the gates powered up to do the work), and this is where SGX gets let down.
SGX being a unified shader architecture, its compute core is shared between vertex and fragment processing (which, incidentally, is probably why it's smaller). It attempts to load balance using some hoopy hyper-threading system; this will likely have the effect that the core is active a lot more of the time, meaning aggressive power gating really may not buy you that much. Mali has the advantage that MaliGP can be completely power gated after it's finished processing (that's about 30% of the architecture powered off). That's gotta be worth something!
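Whether a separately gated geometry core beats idle-gating a unified core depends entirely on the duty cycles and leakage you assume, which is worth keeping in mind for the arguments on both sides. A back-of-envelope model (every milliwatt figure and duty cycle below is invented for illustration, not vendor data) makes that sensitivity visible:

```python
# Toy power model: a block burns dynamic power while active and leakage
# power whenever it is powered; gating an idle block saves its leakage.
# All numbers below are invented for illustration, not vendor data.

def avg_power_mw(dynamic_mw, leakage_mw, active_fraction, gated_when_idle):
    powered_fraction = active_fraction if gated_when_idle else 1.0
    return dynamic_mw * active_fraction + leakage_mw * powered_fraction

# Split design: a geometry core (~30% of area) fully gated after its phase,
# plus an always-powered fragment core.
split = avg_power_mw(70, 30, 0.5, True) + avg_power_mw(170, 70, 0.9, False)

# Unified design: one shared core, busy most of the frame, gated only
# when the whole GPU goes idle.
unified = avg_power_mw(240, 100, 0.95, False)

print(split, unified)
# Flip the duty cycles (e.g. a vertex-light workload) and the comparison
# can easily go the other way.
```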
You seem to be making assumptions about the SGX architecture which aren't based on reality.

TheArchitect said: "Another factor that plays in here is the number and size of the RAM instances in the design. I don't know the ins and outs of the implementation of SGX (I haven't seen any die shots I can analyse), but to keep a hyper-threaded unified shader architecture fed they probably have a big-ass cache RAM to context switch in and out of to keep the thing ticking over. That's gonna cost big on the power consumption front. As long as the core is active that RAM needs to stay powered up."
You seem to be assuming that SGX has an inherent context switch overhead; again, this is simply an incorrect assumption.

TheArchitect said: "Mali has some neat (and patent-protected) tricks up its sleeve in that regard. It doesn't have any context switch overhead thanks to a nifty trick of carrying the context with each thread. This means they have little or no pipeline flush overhead and no need for a munging great cache to store it."
The last thing you need to take into account is the memory bandwidth consumed by the two cores. External memory bandwidth to DRAM consumes stupid amounts of power, and nothing hammers the crap out of memory quite like a GPU running 3D graphics.
I attended a tech seminar (come to think of it, I think it was an ARM one) where they talked about external DRAM accesses being 10x the cost of internal computation in some cases. While I'm not sure I buy 10x, even 2x would be a significant effect, and reducing the bandwidth used by a GPU would make a significant difference to overall power consumption.
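To put rough numbers on that, here is a back-of-envelope sketch; every figure in it (the energy-per-byte values and the 500 MB/s traffic figure) is an assumption for illustration, not a measurement of any particular SoC:

```python
# Back-of-envelope only: the picojoule figures below are assumptions for
# illustration, not measured numbers for any particular SoC or process.
DRAM_PJ_PER_BYTE = 150.0   # external DRAM access energy, assumed
SRAM_PJ_PER_BYTE = 15.0    # on-chip buffer access energy, assumed (the "10x")

def bandwidth_power_mw(bytes_per_sec: float, pj_per_byte: float) -> float:
    """Average power in mW needed to move this much traffic."""
    return bytes_per_sec * pj_per_byte * 1e-12 * 1e3  # pJ/s -> mW

gpu_bw = 500e6  # assume 500 MB/s of texture + framebuffer traffic
print(bandwidth_power_mw(gpu_bw, DRAM_PJ_PER_BYTE))  # cost if it all hits DRAM
print(bandwidth_power_mw(gpu_bw, SRAM_PJ_PER_BYTE))  # same traffic kept on chip
```

Even with far less dramatic ratios than 10x, the DRAM term ends up a sizeable slice of a mobile GPU power budget, which is why bandwidth-reduction claims matter.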
I've heard ARM make some pretty bold claims about Mali's BW reduction techniques. Whilst I don't have any first-hand experience to confirm or deny those claims, I am told by trusted sources that they are on the level and that they do have an advantage compared to SGX with real-world workloads. Enough to offset the size difference? I can't say, but it's interesting to note.
So what's my point? The above are just a few obvious things I've observed which tell me it's impossible to make an apples-to-apples comparison of the two based on publicly available data. We are only getting a tiny glimpse of the whole picture.
Going on experience, I'd say PowerVR will over-promise and under-deliver on the SGX, but they'll sell a bundle of them anyway and so we'll suffer more mediocre graphics experiences on handsets for another generation. ARM are winning designs away from PowerVR, however, so there must be something in this that's making sense to some big names.
As for the Mali400 MP, I think it's a very poorly thought-out product. If you are going to introduce a multi-core scalable product, why the hell not scale both fragment and vertex shader cores? This smacks of something nailed together in a hurry to meet some spurious customer request, if you ask me (wonder if that's anything to do with them losing one of their key strategic technical people earlier in the year...).
Anyway, let's hope they get more of a clue with the next one and give PowerVR a real run for their money.
Hold on a minute, how many embedded systems do you know that actually have enough room to store a 1920x1080x32 texture (8 MB for the top MIP level), let alone the need to zoom into it by nearly 16x?
Well I suppose, viewing JPG stills, maybe with some zoom, but then you can do a partial decode on them to limit the source texture size, so it's not a problem.
Or post-processing of HD video, but that's not likely to need to be zoomed by 16x...
I'd like to see your use case...
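For reference, the storage arithmetic behind the 8 MB figure quoted above, with the full mip chain included (standard mip-sizing math, nothing vendor-specific):

```python
# Storage arithmetic for a 1920x1080 RGBA8888 (32bpp) texture.
width, height, bytes_per_texel = 1920, 1080, 4
top_level = width * height * bytes_per_texel
print(top_level)  # 8294400 bytes -- roughly the 8 MB quoted

# A full mip chain adds about a third on top of the base level:
total, w, h = 0, width, height
while True:
    total += w * h * bytes_per_texel
    if w == 1 and h == 1:
        break
    w, h = max(1, w // 2), max(1, h // 2)
print(total)  # base level plus all mip levels
```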
Lies, lies, damn lies and GPU marketing material and all that.
And of course ARM's marketing team only dish out absolutely truthful and factual information <rolls eyes>
TheArchitect said: "It's well known in the industry that ARM has a track record of conservatively estimating their core sizes; the PowerVR guys can be a little more, errr, 'creative' shall we say."
As JohnH implied, both are pre-layout. By looking at some of the META cores, I've come to the conclusion that sometimes PowerVR will indicate a clock target and a die size, but those clocks are for speed-optimized designs and the size is for area-optimized designs. It's not a lie per se (clocks are "up to"), of course, just overly aggressive marketing. On the other hand, ARM (at least for CPU designs) has a tendency to more clearly associate a die size with a specific frequency target. It's not usually a massive difference, and this might not systematically be true (or it might be outdated), but it does seem noteworthy to me. Actually, ARM seems to be doing the same with the Cortex-A9...

JohnH said: "IMG figures are actual synthesis figures in the same way ARM's are claimed to be."
I agree with John here: power gating only makes sense during long inactivity periods. If your VS is 5x faster than required during part of the processing, it'll still need to be active 20% of the time, and it's not viable to have an absurdly massive FIFO to let it idle for sufficiently long stretches. The fact that MaliGP can be power gated individually makes sense and obviously can't hurt, but a unified architecture is still likely to benefit more from power gating in general. I haven't thought enough about deferred rendering in this context to be sure whether it has an impact of its own (good or bad), however.

JohnH said: "By being unified you expose maximum compute power to the problem at hand irrespective of being vertex or pixel bound; this increases the opportunities for idle power gating the entire core, which is the granularity that most power gating schemes work at at this time."
As JohnH implied, you couldn't be any more wrong here: http://www.eetimes.com/news/design/...cleID=210003530&cid=RSSfeed_eetimes_designRSS

TheArchitect said: "I don't know the ins and outs of the implementation of SGX (I haven't seen any die shots I can analyse), but to keep a hyper-threaded unified shader architecture fed they probably have a big-ass cache RAM to context switch in and out of to keep the thing ticking over. That's gonna cost big on the power consumption front. As long as the core is active that RAM needs to stay powered up."
Uh oh, note to self: do not reply to TBDR Bandwidth Holy Wars. Ever!

JohnH said: "Perhaps you should allude to where you think this difference comes from, because I'm pretty certain that an equivalent SGX consumes less BW than Mali in every instance."
Now THAT's being opinionated!

TheArchitect said: "Going on experience I'd say PowerVR will over-promise and under-deliver on the SGX, but they'll sell a bundle of them anyway and so we'll suffer more mediocre graphics experiences on handsets for another generation."
Why must we expect every licensee to be rational, and why must we expect performance, die size, and power consumption to be the only factors? I'm not saying this to diminish either ARM or PowerVR; my point is that the only thing this tells us is that the difference isn't so massive that the choice is always clear-cut for potential licensees in the real world. You would expect that to be the case anyway for the surviving players in an open market...

TheArchitect said: "ARM are winning designs away from PowerVR, however, so there must be something in this that's making sense to some big names."
I'd be interested in hearing of any other benchmark with published figures.

"You could have picked a better benchmark to illustrate your point. GLBenchmark is soooooo bad."
First of all, welcome to the forum TheArchitect, enjoy your stay!
http://i.cmpnet.com/eet/news/08/07/1538UTH_1.gif
SGX is the core in the top right. It's very clear that it has incredibly little SRAM; it's nearly pure logic. At the right, based on an SRAM cell size of ~0.5, there seems to be 64-80 KiB of SRAM. On the left, at the bottom, and maybe in the center, there's also a very small amount of extra SRAM. That's more than enough for texture caches, FIFOs, and register files.
Given that SGX 530 presumably has two "shader cores", you could assume that the top-right and bottom-right SRAM is the shader-pipe-specific stuff (incl. RF) and the texture caches, while the center-right is for the FIFOs. The rest are misc. buffers, for example to communicate with the outside world.
Compared to a non-deferred renderer, they can also save quite a bit of SRAM by not needing on-chip HierZ and stuff like that; and obviously the memory controller is off-block.
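As a sanity check on that 64-80 KiB estimate, the implied array area is tiny. The 0.5 cell size (in µm² per bit) is the figure read off the die analysis above; the array overhead factor for decoders, sense amps, and spacing is my own assumption for illustration:

```python
# Rough SRAM area estimate. The 0.5 um^2/bit cell size is taken from the
# die analysis above; the 1.7x array overhead (decoders, sense amps,
# spacing) is an assumed figure for illustration only.
CELL_UM2_PER_BIT = 0.5
ARRAY_OVERHEAD = 1.7

def sram_area_mm2(kib: float) -> float:
    bits = kib * 1024 * 8
    return bits * CELL_UM2_PER_BIT * ARRAY_OVERHEAD / 1e6

print(sram_area_mm2(64), sram_area_mm2(80))
# Either way well under 1 mm^2 -- consistent with "nearly pure logic".
```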
Well, actually I was suggesting that *all* GPU marketing stretches the truth, but hey ho.
Make that "*all* marketing stretches the truth" and we'll have an immediate agreement.
I've been hearing the same story since the birth of 3D in the mobile/PDA market. In fact, we should have seen some fierce competition after MBX: from ATI (now AMD), which abandoned the market with flying colours; the Bitboys (absorbed by ATI before AMD bought the latter, and the result lies in the former sentence...); NVIDIA (which seems to be doing a lot better with APX2500 than with the initial GoForce); and Falanx (absorbed by ARM); etc.
If it's really about who operates with "smoke and mirrors", then I'd like to hear which of them all is innocent for one, which of them cannibalize prices in order to gain even one deal, or which of them give their IP away for free just to claim a deal after all.
Before anyone throws any stones at IMG, I'd like to hear the entire rotten story that stages itself behind the curtains, including all the FUD like "tiling stinks" (which obviously doesn't come from ARM).
Arrrrh! Now there be some tales to tell... but perhaps for another thread?
Oooo yeah, that's a good one, and it goes waaaaaaaaay back to the mid-90's when you couldn't move in the industry without tripping over a 3D graphics chip company! Remember Rendition's Verite, the Cirrus Logic Laguna, the NV1? The scandal around the DX1 Tunnel test (I seem to remember that was the genesis of Tom's Hardware). Damn, that makes me feel old!
Like I said, perhaps another thread is required...
I figured that, no problem - I rather meant in the context of TBDR vs IMR vs [...] debates, which tend to have rather bold and highly contradictory arguments on both sides when it comes to things like bandwidth and the impact of newer APIs. Honestly, 99% of the arguments I've seen personally proved little but a lack of understanding of the other side of the aisle. That doesn't mean there isn't a real winner (I have no idea), but if there is, then the arguments probably aren't (only?) those made oh-so-often.

"BTW - there is no holy war here, I have no alignment to either company, just thought I'd make that clear."
You are correct; I obviously knew that for SGX but had a brainfart for Mali. Regarding your question, Arjan (who made a quick post earlier in the thread) is a Falanx/ARM engineer. I am sadly not aware of anyone from Vivante or Matrox, although who knows who's lurking out there! (*cough* you *cough*)

"I think you are assuming that the VS and FS cores are decoupled by a FIFO, correct? In actual fact this is not the case for either architecture (I think, and I'm sure JohnH will be very quick to correct me if I'm not... btw does anyone from ARM, Vivante or Matrox follow this forum???)"
Yes, that is perfectly correct, and so obviously in Mali's specific case you could use power gating for the geometry processing. In the case of NVIDIA, they don't do binning/tiling of any sort, so they obviously couldn't do that (at least not for the whole unit - the APX 2500's version is relatively high-end and can be scaled down 2x or possibly more, so maybe they could power gate half of it all the time if VS requirements aren't very high). I tend to confuse Tegra's 3D core and Mali on a few things, heh...

"...but the intermediate data between the VS and FS processing stages is actually stored to main memory (post-VS, post-binning). Therefore it would actually be possible and even reasonable to power off the MaliGP (this was the same with MBX equipped with VGP, but I'm not sure you could power gate it in the same way)."
It's not, but they say it's 5.5mm² while the total chip is 60mm²; based on that it is very easy to see which it is... (or at least which TechInsights thinks it is, but it makes perfect sense from a size POV at least, given PowerVR's claims)

"Hmmm, the legend at the bottom of the graphic is not clear about which components are which."
I don't care how they refer to them; they are cores. And when I say cores, I mean real cores, not marketing hyperbole like NVIDIA's. SGX's shader "pipes" are full-blown VLIW FP32 processors with 16 concurrent threads (and 4 being prepared pre-ALU at the same time, in the same stages; I will leave it as an exercise to the reader to figure out why this saves valuable register file die space and power). It's really not any different from PC GPUs, but very different from Larrabee, which however benefits more from its L1 cache...

"I think PowerVR refers to SGX530 as having two shader 'pipes' rather than cores, the premise being that you can share infrastructure between the two pipes and save area (common in programmable architectures) rather than stamping down two identical cores."
Of course it is! It was just funny because I was making the extra effort to be as objective as possible here and not voice clear opinions, while you come along and decide to say things like that - it's funny, that's all!

"Is that not allowed?"