ARM Midgard Architecture

metafor · Nov 18, 2010

JohnH said:
Nonsense, if the data set being pulled through the cache is larger than the cache then there is no replacement policy that isn't going to result in thrashing of that cache.

Of course there is. Lock a segment you want by MVA or by MVA block.

Note that typically these types of caches use random or psuedo LRU replacement policies.

Typical CPU cache does. If something is to be treated as a system cache, that's usually not what you want.

If that peice of data represents a significant part of the required BW and other things don't result in it being evicted from the cache (which they will) then I'd agree with you, but generally this isn't the case.

As a matter of interrest you'll notice that ARM show the GPU with it's own L2 cache, there are very good reasons why this is a seperate cache instead of sharing the L2 with the CPU.

Sure, if both caches are simply psuedo-LRU caches, then you avoid contention and thrashing (and likely read/write latency due to large array size).

But that doesn't mean that it doesn't have its disadvantages compared to a unified, shared L2. What happens when the GPU isn't working on anything and the CPU has to crunch numbers?

darkblu · Nov 19, 2010

Exophase said:
I've read before about GMA945 being scene capturing; never heard anything on it being tile based. I assume that scanline based is being considered a type of tiling.

It's been a long time since I've really dug into the GMA950 series (even though I still have them around - I think Earth deposit of GMA9xx's will last for a long time ahead), but from what I remember, it's scene-capturing for the sake of being tile-based (or sub-frame-based, if 'tile' is too-strongly-loaded a term here), and it's tile-based to avoid falling into the same pit the original XGPU did - horrendous contention between the GPU and CPU parts in the system.

So basically, most IGP/shared memory designs are sub-frame-painting scene-captureres too, as that's the one straight way to play nicely over UMA. Yet, I'm getting circumstantial evidences of NV returning to their x-bar UMA arbitrator again in the SoC/low-end-mobile space, perhaps they did some breakthrough in CPU/GPU bus arbitration.. Or perhaps I'm just seeing things ; )

JohnH · Nov 19, 2010

metafor said:
Of course there is. Lock a segment you want by MVA or by MVA block.

Typical CPU cache does. If something is to be treated as a system cache, that's usually not what you want.

Sure, if both caches are simply psuedo-LRU caches, then you avoid contention and thrashing (and likely read/write latency due to large array size).

But that doesn't mean that it doesn't have its disadvantages compared to a unified, shared L2. What happens when the GPU isn't working on anything and the CPU has to crunch numbers?

We're just going round in circles here!

rpg.314 · Nov 19, 2010

metafor said:
Of course there is. Lock a segment you want by MVA or by MVA block.

I am not sure that in any way addresses John's point. What are you going to do when the data set you wish to lock exceeds cache size? Lock all of it and your cpu starves to death.

JohnH · Nov 19, 2010

Exophase said:
Order of fragment operations and order of the EFFECT of fragment operations are two different things. Depth-test is a fragment operation, and therefore Mali and SGX perform fragment operations in separate order.

Besides the point that both Mali and SGX actually do the depth test at the same point/order in the pipeline, this is not what defines somethings immediate modeness, it is what defines something as deferred texturing/shading.

I obviously know what IMR stands for.. I don't think there's as much of a consensus as you think, as indicated by Ailuros's previous post.

As I say, anyone fully familier with the history would understand the terms.

I think any misconceptions here have come about due to IMRs being so prevalent between 2001 and 2008 which has lead to the terms used to differentiate the technologies falling into miss use, something that is only now changing with the rise of handheld/mobile and the importance of efficient non brute force architectures.

I've read before about GMA945 being scene capturing; never heard anything on it being tile based. I assume that scanline based is being considered a type of tiling.

GMA 900 was a tile based renderer, it wasn't scan order rendering.

mboeller · Nov 21, 2010

something I found on the net about Mali:

http://www.design-reuse.com/article...ntages-of-the-mali-graphics-architecture.html

It is rather old but maybe still of value.

JohnH · Nov 22, 2010

mboeller said:
something I found on the net about Mali:

http://www.design-reuse.com/article...ntages-of-the-mali-graphics-architecture.html

It is rather old but maybe still of value.

Incorrectly shows PowerVR reading all parameters twice, when in fact we read Position only first and then only attributes for visible polygons second. Contrary ARM's claim this means PowerVR significantly reduces BW relative to Mali which appears to read everything up front irrespective of visibility. This was still true on Mali400, wonder if they've improved it on T60x.

mboeller · Nov 22, 2010

JohnH said:
Incorrectly shows PowerVR reading all parameters twice, when in fact we read Position only first and then only attributes for visible polygons second. Contrary ARM's claim this means PowerVR significantly reduces BW relative to Mali which appears to read everything up front irrespective of visibility. This was still true on Mali400, wonder if they've improved it on T60x.

Do you have any informations about the bandwidth demand of PowerVR, especially SGX in the public domain? I think a little bit more information about your architecture would help a lot to understand your advantages and disadvantages better than now, and would dispel a lot of (wrong?) rumors/informations floating around.

JohnH · Nov 25, 2010

mboeller said:
Do you have any informations about the bandwidth demand of PowerVR, especially SGX in the public domain? I think a little bit more information about your architecture would help a lot to understand your advantages and disadvantages better than now, and would dispel a lot of (wrong?) rumors/informations floating around.

Hmm, we don't seem to have anything, or at least anythign readily accessible in the public domain, which is odd...

John.

Arun · Nov 25, 2010

JohnH said:
Hmm, we don't seem to have anything, or at least anythign readily accessible in the public domain, which is odd...

I remember you had precise bandwidth numbers for VXD370 publicly available a long time ago, but evidently your marketing guys have decided bandwidth information in general isn't worth talking about much anymore, which is odd. I know it can be distorted by competitors, but if properly qualified that could just as well turn back on them.

If Rys had already been at IMG then, I'd have assumed the marketing guy's head would have exploded after being explained why it's oh-so-important to have everything in gibibits instead of gigabits, but that can't be it

(I kid, I kid!)

AlexV · Nov 25, 2010

Arun said:
If Rys had already been at IMG then, I'd have assumed the marketing guy's head would have exploded after being explained why it's oh-so-important to have everything in gibibits instead of gigabits, but that can't be it (I kid, I kid!)

Rys is into the whole gibi posh-ness?Oh my...

Arun · Nov 25, 2010

AlexV said:
Rys is into the whole gibi posh-ness?Oh my...

I seem to remember he kinda was at one point, but don't think he cares anymore - still not a bad way to tease him though if my memory serves me well

Also, we should stop this right here, for fear of having to ban each others for going OT! Not that this thread has done a very good job of staying on topic mind you...

Ailuros · Nov 25, 2010

Arun said:
Also, we should stop this right here, for fear of having to ban each others for going OT! Not that this thread has done a very good job of staying on topic mind you...

With as sparse information available for T604 it's kind of inevitable. I'm sure that most of us will keep the thread updated as more information about the new Mali architecture will appear.

I just remembered...Mali was the first graphics processor I'm aware of that sported single cycle 4xMSAA. They also have 16xMSAA available but it obviously comes at a very high fill-rate cost; however for not too complicated OpenVG stuff the latter could be worth gold.

Here I'm wondering why they haven't moved to something (yes I know how much of an exaggeration it may sound) like single cycle 8xMSAA. The T604 information doesn't lead me to believe that they changed much in their already existing algorithms.

tangey · Nov 25, 2010

I see Arm a few days ago launched Mali-300.

http://www.arm.com/about/newsroom/a...nced-graphics-a-reality-for-all-consumers.php

Now given that a lot of the graphics sessions at their recent conference, and the launch of the T-604 in particular, was heavy on how important openCL was going to be to future Soc solutions, its strange is it not that they launch a brand new IP core for licensing that is *NOT* OpenCL compliant (or if it is, someone decided to leave out that fact).

A look at the stated performance of it shows that it exactly matches the stated performance of a single core Mali-400.

So is this little more than a renamed single core Mali-400 (albeit space optimised on the basis that its not a single core variant of a multi-core archeitecture), that has been around for some time now and was never licenced ?

Simon F · Nov 26, 2010

Ailuros said:
I just remembered...Mali was the first graphics processor I'm aware of that sported single cycle 4xMSAA..

That's just another way of saying that it had 4xN processing units but couldn't use those to render 4x as a many pixels per clock.

Ailuros · Nov 29, 2010

tangey said:
I see Arm a few days ago launched Mali-300.

http://www.arm.com/about/newsroom/a...nced-graphics-a-reality-for-all-consumers.php

Now given that a lot of the graphics sessions at their recent conference, and the launch of the T-604 in particular, was heavy on how important openCL was going to be to future Soc solutions, its strange is it not that they launch a brand new IP core for licensing that is *NOT* OpenCL compliant (or if it is, someone decided to leave out that fact).

A look at the stated performance of it shows that it exactly matches the stated performance of a single core Mali-400.

So is this little more than a renamed single core Mali-400 (albeit space optimised on the basis that its not a single core variant of a multi-core archeitecture), that has been around for some time now and was never licenced ?

Or you can call it a Mali200 + L2 data cache.

Simon F said:
That's just another way of saying that it had 4xN processing units but couldn't use those to render 4x as a many pixels per clock.

I haven't been on a patent hunt for a long time; wonder if that 8xMSAA patent for VGX150 has popped up in the meantime.

arjan de lumens · Dec 8, 2010

More material on the Mali-T604:
"ARM Mali-T604 GPU Ready for Post-32-bit World and Real Computing"

T604 supports 64-bit pointers, in addition to the already-known FP64; also, atomic memory operations are implemented 'extremely efficiently'.

Lazy8s · Dec 8, 2010

Sounds quite forward looking. I wonder how much the new generation of graphics architectures will vary from one another in feature support this time around.

As the APIs have become less rigid, their implemented support within a GPU design has become more creative.

Exophase · Dec 10, 2010

I've been reading the various Mali blog posts and the ARM guys just seem delusional. But the 64-bit pointer one kinda took the cake.. here are some of the things that really stood out to me:

- 64-bit support in N64 didn't do the console a bit of good and was 100% marketing schtick that games all but ignored entirely, in fact several consoles since then have not been 64-bit and the ones that do tend to put it to better use by at least throwing SIMD on it.
- Lauds real 64-bit support in a GPU meant to accompany a CPU that has hacked on PAE for addressing 40-bit
- FP16 is all you need for fragment shading.. really? Is that why only current Mali has such a limitation?

I'm starting to find the whole FP64 push a little ludicrous as well, as if Mali T604s are going to be used in HPC all of a sudden or something..

This one is pretty good too:

http://blogs.arm.com/8-multimedia/entry-353-of-philosophy-and-when-is-a-pixel-not-a-pixel/

Whining about overdraw factors in IMG's marketing (and making the outrageous claim that overdraw ranges between 1.0 and 2.0 as if it isn't frequently well above that) and not saying a word about how it's because one has deferred shading and the other does not..

Really you would think something has got to be good about Mali to convince Samsung to trade away from an SGX540 for it but I'm starting to wonder.

Lazy8s · Dec 10, 2010

Even when Samsung's previous graphics platforms have been well received by licensees, like in the case of Apple and the first iPhone chip, Samsung always goes out of their way to source graphics IP from multiple suppliers, including from their in-house designers.

I remember seeing one of their roadmaps for application processors not long after the reveal of the S3C2460, their first MBX Lite implementation, and I was surprised to see it show, within the same document promoting the capabilities of PowerVR, that the next generation was switching to an entirely unrelated mystery graphics IP with "5 million polygons per second performance" (where MBX Lite was promoted at 1M). Later, they did launch a basically one-off app processor with a likely in-house GPU design and then switch right back to PowerVR, right before juggling licenses for Mali.

I think they're keeping an open mind; I expect they'll take a license out for a newer PowerVR and, at the same time, make an implementation of another different design.

ARM Midgard Architecture

metafor

darkblu

JohnH

rpg.314

JohnH

mboeller

JohnH

mboeller

JohnH

Arun

Unknown.

AlexV

Heteroscedasticitate

Arun

Unknown.

Ailuros

Epsilon plus three

tangey

Simon F

Tea maker

Ailuros

Epsilon plus three

arjan de lumens

Lazy8s

Exophase

Lazy8s

Similar threads