ARM Mali-400 MP

Indeed. That's why they aren't used (directly) as textures!

Yes, you are right, you can't do random access into the fully compressed stream. You'd have to do the initial stages (Huffman decode, etc.) to get the block information and then only do the full IDCT, etc. on the block you're interested in, but that should be pretty quick, I would have thought.

I think I'm right in saying that there has been provision in the spec for the JPEG file format from the beginning for you to insert "markers" or "bookmarks" to subset the image (kind of a region-of-interest or print optimisation thing - a hangover from when it took an age to decode JPEGs). It would make it less efficient on the compression side, but much easier to do this sort of trick. Not sure it was ever that popular though.

Wonder what you'd do with JPEG 2000? Theoretically you can partially decode a wavelet-based image to a lower resolution by leaving out different frequency components... if the image is moving fast you'd be less likely to see low (or is that high?) frequency components anyway.

I suppose you could always use the progressive image techniques they used in the early days of the web to reference a lower effective resolution image when the image is moving as well (is that part of the JPEG spec, or something proprietary?).

Probably overthinking this a bit - most of these things seem to be achieved through brute force, more's the pity. <Sigh> Where's the ingenuity you used to see in the old days...
 


The DC coefficients in a block are predicted from the previous block's DC coefficient, so you will need to zip through the previous blocks in the frame to figure out your DC coefficient.

Yes, JPEG has restart markers which allow you to start decoding from a point part way through the stream. These are primarily for error resilience though, rather than region of interest.

Progressive JPEG is also part of the standard; basically the order in which coefficients are transmitted is different, so that you get the DC and low-frequency coefficients first, then you 'transmit' the higher-frequency coefficients, which gives you your detail back.
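
(Purely as an illustration of those two points - not anyone's actual decoder - here's a sketch in C. It assumes a baseline JPEG scan with a DRI segment; the function names and structure are made up for the example.)

```c
/* Illustrative sketch only -- not a full JPEG decoder.  DC coefficients are
 * differentially predicted, so you can't just jump into the entropy stream,
 * but restart markers (0xFFD0..0xFFD7) reset the DC predictors, so if the
 * encoder emitted them you can index the scan and start Huffman-decoding
 * from the nearest marker before the MCU you actually want. */
#include <stddef.h>
#include <stdint.h>

/* Find byte offsets just after each restart marker in the scan data.
 * (Inside entropy-coded data a literal 0xFF is stuffed as 0xFF 0x00,
 * so 0xFF followed by 0xD0..0xD7 can only be a real RSTn marker.) */
static size_t index_restart_markers(const uint8_t *scan, size_t len,
                                    size_t *offsets, size_t max_marks)
{
    size_t n = 0;
    for (size_t i = 0; i + 1 < len && n < max_marks; i++) {
        if (scan[i] == 0xFF && scan[i + 1] >= 0xD0 && scan[i + 1] <= 0xD7)
            offsets[n++] = i + 2;   /* decoding restarts after the marker */
    }
    return n;
}

/* Given the restart interval (MCUs per interval, from the DRI segment) and
 * the MCU covering the region of interest, pick where to resume decoding.
 * DC predictors are zeroed at each restart, so only the few MCUs between
 * that marker and the target need to be Huffman-decoded and skipped. */
static size_t resume_offset(const size_t *offsets, size_t n_marks,
                            unsigned restart_interval, unsigned target_mcu,
                            unsigned *mcus_to_skip)
{
    if (restart_interval == 0 || n_marks == 0) {
        /* No markers: decode from the start of the scan, carrying the
         * running DC prediction through every earlier block. */
        *mcus_to_skip = target_mcu;
        return 0;
    }
    size_t interval = target_mcu / restart_interval;
    if (interval == 0) { *mcus_to_skip = target_mcu; return 0; }
    size_t mark = interval - 1;          /* marker that ends interval-1 */
    if (mark >= n_marks) mark = n_marks - 1;
    *mcus_to_skip = target_mcu - (unsigned)((mark + 1) * restart_interval);
    return offsets[mark];
}
```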


I am not sure that the modern way of doing things is necessarily 'brute force'; it is really making better use of available resources. When you have a big chunk of memory, why not use it to store the decoded image rather than re-decode it every time?

CC
 

Computation running from locally cached storage has a lower power cost than accessing a large area of memory in external DRAM.

Accessing a compressed image and partially decoding it would theoretically require fewer trips to DRAM and therefore cost less power (depends on your decoder, I guess).
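
(Some purely illustrative, assumed numbers to show the shape of that argument - the image size, compression ratio and window size below are guesses, not measurements:)

```c
/* Back-of-envelope only; all figures are assumptions for illustration. */
#include <stdio.h>

int main(void)
{
    const double MiB = 1024.0 * 1024.0;

    /* Fully decoded texture kept resident in DRAM. */
    double full_decoded = 2048.0 * 2048.0 * 4.0;             /* RGBA8 */

    /* Compressed stream (assume ~10:1 JPEG) plus decoding only the 8x8
     * blocks that cover a 480x320 visible window. */
    double compressed    = full_decoded / 10.0;
    double window_blocks = (480.0 / 8.0) * (320.0 / 8.0);    /* MCUs touched */
    double partial_out   = window_blocks * 8.0 * 8.0 * 4.0;  /* decoded pixels */

    printf("full decode resident : %.1f MiB\n", full_decoded / MiB);
    printf("compressed + window  : %.1f MiB\n",
           (compressed + partial_out) / MiB);
    return 0;
}
```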

But hey, you might be right, it might not be worth the effort.
 

The partial decode solution results in multiple decodes and multiple data copies, and a large number of these transactions are unlikely to be to local storage. Basically I think you get into swings-and-roundabouts territory when it comes to BW, and the best approach would depend on the specific application.

It's also worth pointing out that the image captured from the local camera is highly likely to be in the form of a high-res texture.

With regard to screen size, or more specifically the resolution of panels in handheld devices, I recommend a bit of research into where things are heading.

And, of course, these cores are also being used in STBs, where 1920x1080p is the target resolution, so you won't be viewing a heavily zoomed-out image.

Whichever way you look at it, FP24 is borderline going forward, IMO.

John.
 
...

But you could answer a quick OT question to a layman: given the nature of the hardware did they have an alternative choice?

I think the simple answer is that there are plenty of ways to skin a cat! You're probably right that LRB isn't really relevant, as you would end up with a completely new architecture if you came at it from the low end; in my opinion you'd probably end up with something that looks like SGX.

When I first heard of the original Mali I said back then that if a small IP company can integrate single-cycle 4xMSAA, then it's high time we saw it in standalone GPUs too, and we did in 2006, albeit many before that considered that it didn't make "sense".

Two things about the above, apart from the long list of obvious misconceptions: when it takes 4 cycles for 16xMSAA, someone might as well enable 4x supersampling instead. Before anyone says that 16xMSAA delivers far better polygon edge/intersection AA than 4xSSAA (and of course be right), I'll say that with the latter you usually also get a -1.0 LOD offset, which is near 2xAF quality.

The second thing is something that might be neglected by most if not all IHVs in this market: albeit I understand the importance of having 2x, 4x or more multisampling on devices with small screens, it does sound to me that none of them has the power to add at least some portion of anisotropic filtering.

In my head multisampling compared to supersampling is a sort of "performance optimization", yet the former should also come with at least 2xAF to be comparable. And yes, I understand that an advanced analytical AF algorithm along with the required TMU strength is a tough nut to crack for now, when die area is so limited.

To me though it takes a bit more than good polygon edge/intersection antialiasing (which covers what, nowadays, 5-10% of the total screen space?) and blurry bilinear filtering for the majority of the screen.

The way I look at it is that both MSAA and AF are optimisations of FSAA: MSAA localises the cost of FSAA to the edges of geometry only, and AF localises the cost of supersampling textures to only those surfaces that need it. Combining the two gives very good results at lower cost than FSAA.
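
(A rough sketch of combining the two on an ES 2.0 part - assuming the EGL implementation actually offers multisampled configs and that GL_EXT_texture_filter_anisotropic is exposed, which is far from a given on current mobile GPUs:)

```c
/* Sketch: request a 4x multisampled surface via EGL, then enable anisotropic
 * filtering per texture if the extension is present. */
#include <EGL/egl.h>
#include <GLES2/gl2.h>
#include <GLES2/gl2ext.h>
#include <string.h>

EGLConfig pick_msaa_config(EGLDisplay dpy)
{
    const EGLint attribs[] = {
        EGL_RENDERABLE_TYPE, EGL_OPENGL_ES2_BIT,
        EGL_SAMPLE_BUFFERS,  1,      /* multisampled surface */
        EGL_SAMPLES,         4,      /* 4x MSAA */
        EGL_NONE
    };
    EGLConfig cfg = 0;
    EGLint n = 0;
    eglChooseConfig(dpy, attribs, &cfg, 1, &n);
    return n ? cfg : 0;              /* 0 => no MSAA config available */
}

void enable_af_if_available(GLuint tex)
{
    const char *ext = (const char *)glGetString(GL_EXTENSIONS);
    if (ext && strstr(ext, "GL_EXT_texture_filter_anisotropic")) {
        GLfloat max_aniso = 1.0f;
        glGetFloatv(GL_MAX_TEXTURE_MAX_ANISOTROPY_EXT, &max_aniso);
        glBindTexture(GL_TEXTURE_2D, tex);
        /* 2x AF is the "at least comparable" level argued for above. */
        glTexParameterf(GL_TEXTURE_2D, GL_TEXTURE_MAX_ANISOTROPY_EXT,
                        max_aniso < 2.0f ? max_aniso : 2.0f);
    }
}
```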

John.
 
I'm not sure how you'd do it on a PNG file, but probably by taking a texture sub-image, which again would limit the section you're accessing when zoomed to a "manageable precision" (I know Apple desktop solution providers have to supply a number of required extensions; perhaps an extension attaching a sub-region of a texture to a texture is one of them).
i'm not sure i'm following with the extension - glTexSubImage has been a part of the gl core spec for generations now.

other than that, if you play with an iphone long enough you'll notice two things - the average size of kinetically pannable surfaces is impressive, particularly for such a class of device, and the ultra-responsive scroll-ability, stretchability (pinch-zoom) and rotatability aspects put an extra stress on the dimensions and response times for working with surfaces. for me, personally, iphone's GUI is the most impressive mobile GPU showcase anybody has been able to come up with till now. i'm nothing short of amazed at the things the little mbx (lite) is able to pull, and had apple exposed the VGP's shading extension it'd have been my dream handheld.

why i'm bringing up all that - because a whole generation of mobile graphics devs will expect nothing less from the next generation of iphone (and handhelds, in general) - so those better have all the niceties that lowly mbx has: 2k x 2k textures, 2k x 1k viewports, 24bit depth buffers.
 
I think the simple answer is that there are plenty of ways to skin a cat! You're probably right that LRB isn't really relevant, as you would end up with a completely new architecture if you came at it from the low end; in my opinion you'd probably end up with something that looks like SGX.

I meant that if Intel left LRB as it is on the hardware side of things and the driver instead went in the IMR direction, the overall logic budget would have been a lot higher than with a SW TBDR. In my mind (and I could be wrong of course), the way they've designed their hardware, their current drivers were the only sensible option.


The way I look at it is that both MSAA and AF are optimisations of FSAA: MSAA localises the cost of FSAA to the edges of geometry only, and AF localises the cost of supersampling textures to only those surfaces that need it. Combining the two gives very good results at lower cost than FSAA.

Agreed. I'd still love to see really usable AF even on small form factor devices. If you gave me the weird dilemma of having either MSAA or AF, I'd immediately go for the latter.
 
Yes, you are right, you can't do random access into the fully compressed stream. You'd have to do the initial stages (Huffman decode, etc.) to get the block information and then only do the full IDCT, etc. on the block you're interested in, but that should be pretty quick, I would have thought.
The comments Capt. CP made make it rather unpleasant. The closest you'll get in the literature to JPEG-like textures, AFAIK, was Microsoft's TREC, which was in the Talisman system. The problem is that it needs indirection, which can result in GBH if you mention it to a hardware engineer. Please see the first two sections of this fool's paper.
 
i'm not sure i'm following with the extension - glTexSubImage has been a part of the gl core spec for generations now.

You are quite right, it has; the form addressing 3D textures is an extension in ES 2.0. I'm getting old and my brain's going. :cry:
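
(For illustration, a minimal sketch of the kind of sub-region update being discussed - assuming ES 2.0, a large texture allocated up front, and a tightly packed RGBA tile in client memory, since ES 2.0 has no GL_UNPACK_ROW_LENGTH:)

```c
/* Sketch only: keep one large texture resident and refresh just the tile
 * that has scrolled into view, rather than re-uploading the whole image.
 * Assumes `pixels` holds w*h*4 bytes of RGBA data, packed contiguously. */
#include <GLES2/gl2.h>

void update_visible_tile(GLuint tex,
                         GLint x, GLint y,          /* tile origin in texels */
                         GLsizei w, GLsizei h,      /* tile size */
                         const void *pixels)
{
    glBindTexture(GL_TEXTURE_2D, tex);
    glPixelStorei(GL_UNPACK_ALIGNMENT, 1);
    /* Only the sub-rectangle is transferred; the rest of the (e.g. 2k x 2k)
     * texture is left untouched in memory. */
    glTexSubImage2D(GL_TEXTURE_2D, 0, x, y, w, h,
                    GL_RGBA, GL_UNSIGNED_BYTE, pixels);
}
```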

other than that, if you play with an iphone long enough you'll notice two things - the average size of kinetically pannable surfaces is impressive, particularly for such a class of device, and the ultra-responsive scroll-ability, stretchability (pinch-zoom) and rotatability aspects put an extra stress on the dimensions and response times for working with surfaces. for me, personally, iphone's GUI is the most impressive mobile GPU showcase anybody has been able to come up with till now.

No arguments from me about how nice the iPhone UI is; glorious piece of work. To anyone selling Gfx IP into this space the iPhone must have been a godsend; it was all looking a bit ropey before that, as no-one had really pushed gaming or even the UI experience in commercial devices.

i'm nothing short of amazed at the things the little mbx (lite) is able to pull, and had apple exposed the VGP's shading extension it'd have been my dream handheld.

Two possible reasons for that VGP extension being omitted are Apple not wanting reliance on single-vendor extensions (paranoia about lock-in), or - I think I'm right in saying they did their own driver - that type of extension is quite a big chunk of work (implementation and verification, etc.), so it may have been left out to reduce the workload.

However the discussion point was, "is FP24 enough representation to work with HD/2Kx2K textures or not?".

For the use cases described it's sufficient (I don't think there was any disagreement, was there?). For some cases it might be a bit of a stretch, but for screen-aligned content (that is, anything planar to the screen) it's perfectly workable (probably a bit more of an issue if you start projecting the image into Z).
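
(A back-of-envelope sketch of where FP24 starts to pinch - it assumes the common 1-sign/7-exponent/16-mantissa-bit FP24 layout, which may not match any particular core:)

```c
/* Back-of-envelope: with a 16-bit mantissa, a normalised texture coordinate
 * just below 1.0 has a spacing of 2^-17 between representable values.  Over
 * a 2048-texel texture that is 2048 * 2^-17 = 1/64 of a texel -- about 6 bits
 * of sub-texel position left for bilinear weights at the far edge, which is
 * roughly where "borderline" comes from.  FP32's 23-bit mantissa leaves ~13. */
#include <math.h>
#include <stdio.h>

int main(void)
{
    const int tex_size = 2048;
    const int mantissa_bits[] = { 16, 23 };          /* FP24 (assumed), FP32 */

    for (int i = 0; i < 2; i++) {
        /* Worst case is u in [0.5, 1.0): exponent -1, ulp = 2^(-1-m). */
        double ulp_texcoord = ldexp(1.0, -1 - mantissa_bits[i]);
        double ulp_texels   = ulp_texcoord * tex_size;
        printf("%2d-bit mantissa: step = %.6f texels (~%.1f sub-texel bits)\n",
               mantissa_bits[i], ulp_texels, -log2(ulp_texels));
    }
    return 0;
}
```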

John's point about mobile and HD is valid as well; more and more handset vendors are talking about HD out from handsets (that is, via HDMI, wireless USB/BT connections and, in the future, built-in micro projectors), and MIDs increase the possibility of larger built-in screen resolutions, although there are limits to the usefulness of going too mad with res within the physical constraints of the devices (the inability of your eyeball to differentiate beyond a certain DPI at close range would render the investment in more pixels a bit pointless - Nokia have an R&D white paper on that someplace).

why i'm bringing up all that - because a whole generation of mobile graphics devs will expect nothing less from the next generation of iphone (and handhelds, in general) - so those better have all the niceties that lowly mbx has: 2k x 2k textures, 2k x 1k viewports, 24bit depth buffers.

I think I'm right in saying most of the current crop of IP already meets or exceeds this specification (the dark horses would be AMD/ATI and Nvidia). The problem is that going further with precision etc. all comes at a cost in gate area, storage, power, etc.

Whilst the iPhone is a good example of the use of a GPU in a device like that, and probably the first time graphics has played a significant role in the purchasing decision of the consumer, it's still quite a way from being the primary factor for most people. Packing in an ever more powerful GPU with more cost (in power, area and $$$ terms) has to meet some ROI criteria for the handset implementers before they'll deploy it. Although a lot of handsets are shipping with acceleration, it's a long way from being ubiquitous at present, and many may be happy to stick with what they have for the mainstay of the market.
 
...
However the discussion point was, "is FP24 enough representation to work with HD/2Kx2K textures or not?".

For the use cases described it's sufficient (I don't think there was any disagreement, was there?). For some cases it might be a bit of a stretch, but for screen-aligned content (that is, anything planar to the screen) it's perfectly workable (probably a bit more of an issue if you start projecting the image into Z).
It becomes marginal as soon as you start doing more interesting stuff, and it simply isn't future-proof, where "future" probably lies within the lifespan of the current generation, particularly when you throw OpenCL into the mix.

...
I think I'm right in saying most of the current crop of IP already meets or exceeds this specification (the dark horses would be AMD/ATI and Nvidia). The problem is that going further with precision etc. all comes at a cost in gate area, storage, power, etc.
I believe some members of the current crop are still limited to 16-bit Z.

When you amortise the cost of FP32 across vertex and pixel processing it isn't as significant a cost as you would think, but then that's another benefit of a unified shading engine.

Whilst the iPhone is a good example of the use of a GPU in a device like that, and probably the first time graphics has played a significant role in the purchasing decision of the consumer, it's still quite a way from being the primary factor for most people. Packing in an ever more powerful GPU with more cost (in power, area and $$$ terms) has to meet some ROI criteria for the handset implementers before they'll deploy it. Although a lot of handsets are shipping with acceleration, it's a long way from being ubiquitous at present, and many may be happy to stick with what they have for the mainstay of the market.

I wouldn't underestimate how big a seller eye candy is. Apple have already demonstrated this and have set the benchmark for user experience, and I suspect that they're far from finished; those that don't follow may struggle to survive, in my opinion.

John.
 
It becomes marginal as soon as you start doing more interesting stuff, and it simply isn't future-proof, where "future" probably lies within the lifespan of the current generation, particularly when you throw OpenCL into the mix.

I guess we'll have to wait and see on the CL front; not enough detail around to make that call at present, but that's probably true.

I believe some members of the current crop are still limited to 16-bit Z.

Really? Now that is interesting; probably not ES 2.0 though, right?

When you amortise the cost of FP32 across vertex and pixel processing it isn't as significant a cost as you would think, but then that's another benefit of a unified shading engine.

It's still a cost on something that is already pretty big in terms of area versus the perceived value from a business point of view. That's what I was getting at.

We haven't got to the point where Phil Taylor can say - "you don't need a big CPU in mobile, just a big GPU" ;)

I wouldn't underestimate how big a seller eye candy is. Apple have already demonstrated this and have set the benchmark for user experience, and I suspect that they're far from finished; those that don't follow may struggle to survive, in my opinion.

The eye candy has been made a factor by Apple, that's for sure; until the iPhone everything was a little bland. However it's still not the primary focus for most purchasers. Remember the iPhone, N95 and friends (the high-end HTCs, LG Prada, etc.) are the tip of the iceberg in terms of total handset sales and revenue. Most people are still more wrapped up in "can I get it free on my contract?" or "yeah, I use crackberry(tm) 'cos it's standard issue at my firm".

Outside the US, the iPhone (and similar devices) is also hampered by not having an "all you can eat" data plan (don't get me started on how much of a rip-off EU data provision is!)

I think the iPhone is still sold more on the aspirational qualities of the device than technical merit (although what it does it does very, very well, but then Apple have always been very adept at making the complex very simple).

To get back to the point I was originally going to make in the last post (I forgot what I was going to write, then remembered after I sent it!): what is likely to drive the acceptance of more GPU grunt (more share of the PPA budget) is the heterogeneous compute model. This will grow the ROI for the GPU by enabling other "Visual Computing" applications to run successfully across the system (CPU, GPU, DSP, VLIW engines, etc.). To work well though, this will have to be a close partnership between GPU, CPU and DSP, which will be hard to manage as the provider of a single piece of the puzzle (there is a whole ton of system crap you need to make that run well too).
 
I think the iPhone is still sold more on the aspirational qualities of the device than technical merit (although what it does it does very, very well, but then Apple have always been very adept at making the complex very simple).
Excuse the OT but I was actually almost sold on getting an iPhone. The reason I pulled back is that it doesn't support my native language (Greek) up to now, and you can't send any MMS (yet?) either. Relevant updates are supposed to come soon, but I'll get the device only if the essential functionalities that I need are included in the package.

I hope Apple hasn't made the same mistake in huge markets like the Chinese market; granted, Greece is only a minuscule market compared to that, but when you advertise the iPhone as much as it has been here (in fact I've never seen up to now such an aggressive marketing campaign) you can't come along with such serious drawbacks. As it stands right now, if I got it, it would be halfway useless as a mobile phone.
 
Excuse the OT but I was actually almost sold on getting an iPhone. The reason I pulled back is that it doesn't support my native language (Greek) up to now, and you can't send any MMS (yet?) either. Relevant updates are supposed to come soon, but I'll get the device only if the essential functionalities that I need are included in the package.

Bit of an odd one that (the Greek language omission), you'd have thought they'd have had most of the essential dialogues translated already.

No MMS support is a very poor show, I have to agree; it's a bankable feature on pretty much every phone these days.

I hope Apple hasn't made the same mistake in huge markets like the Chinese market

Ah but then they get all those fun clones over there to fill the gaps - I love all of that stuff on Engadget :LOL:

granted, Greece is only a minuscule market compared to that, but when you advertise the iPhone as much as it has been here (in fact I've never seen up to now such an aggressive marketing campaign) you can't come along with such serious drawbacks. As it stands right now, if I got it, it would be halfway useless as a mobile phone.

The lack of aggressive marketing from others I think shows that the mainstay of the market is still focused on the "free with contract" handset.

Musing on this, Apple might be playing a clever game and using early adopters and the closed shop of service providers as a kind of large-scale beta test to hone the product before they make the push for the mainstream (iPhone nano, anyone?)

Still, though, in the grand cell phone scheme of things these devices represent a limited portion of the market at present, and penetration into the wider market has a price sensitivity. Originally the hope was that operator average revenue per user (known as ARPU) would be driven by selling additional services and this would help sub the costs of the handset. Increasingly, though, we've seen the service providers marginalised, partly through their own failure to execute, partly through "unrealistic revenue expectations" (a polite way of saying greed), and the big players (Nokia, Google, Apple, etc.) supplying value-added services direct to customers. If operator services continue to be commoditised (voice/SMS is already there, and with the aggregation of broadband and mobile data, that's probably not far behind either) and they don't get a value-add play, then they can't finance the subs and growth will be limited outside of those who can afford it. That's not an attractive thought in the current climate...

Okay now I'm depressed!
 
I guess we'll have to wait and see on the CL front; not enough detail around to make that call at present, but that's probably true.

The OpenCL spec is pretty much there, so there's plenty of detail available now.

Really? Now that is interesting; probably not ES 2.0 though, right?
I believe so.

It's still a cost on something that is already pretty big in terms of area versus the perceived value from a business point of view. That's what I was getting at.

We haven't got to the point where Phil Taylor can say - "you don't need a big CPU in mobile, just a big GPU" ;)

A (very) small delta that future-proofs your core sounds like a good deal to me.

The eye candy has been made a factor by Apple, that's for sure; until the iPhone everything was a little bland. However it's still not the primary focus for most purchasers. Remember the iPhone, N95 and friends (the high-end HTCs, LG Prada, etc.) are the tip of the iceberg in terms of total handset sales and revenue. Most people are still more wrapped up in "can I get it free on my contract?" or "yeah, I use crackberry(tm) 'cos it's standard issue at my firm".
It is true that the bulk of units are what come for free, but we will see ripple-down, and that aside, it's innovative products like the iPhone that will push people to want and expect more.

Outside the US, the iPhone (and similar devices) is also hampered by not having an "all you can eat" data plan (don't get me started on how much of a rip-off EU data provision is!)

I think the iPhone is still sold more on the aspirational qualities of the device than technical merit (although what it does it does very, very well, but then Apple have always been very adept at making the complex very simple).
Hmm, I know a lot of people who own iPhones because of their fluidity of execution as opposed to the aspirational factor, but hey, I haven't seen any marketing data on who's been buying the thing.
To get back to the point I was originally going to make in the last post (I forgot what I was going to write, then remembered after I sent it!): what is likely to drive the acceptance of more GPU grunt (more share of the PPA budget) is the heterogeneous compute model. This will grow the ROI for the GPU by enabling other "Visual Computing" applications to run successfully across the system (CPU, GPU, DSP, VLIW engines, etc.). To work well though, this will have to be a close partnership between GPU, CPU and DSP, which will be hard to manage as the provider of a single piece of the puzzle (there is a whole ton of system crap you need to make that run well too).

Lol, glad to see you're still toeing the ARM company line even though you don't work for them now. Notwithstanding the fact that IMG already works very closely with its key partners, initiatives like OpenCL will address the base issue of portability of key algorithms. Further, there has already been some interesting work by 3rd parties with NV's CUDA to do load balancing across CPU and GPU. If anything, I think this is a time when the likes of ARM probably need to be looking over their shoulder.

John.
 
A (very) small delta that future-proofs your core sounds like a good deal to me.

On the FP32 thing? Maybe, but I wasn't just talking about that; the statement I was making is more about a general market issue than that specific feature. The mobile graphics community seems to be on a "go large" kick (to a certain extent survival depends on it); I'm just expressing a view that it may not be a sustainable business model to keep going bigger and bigger. At some point OEMs may call a halt in an attempt to consolidate on wringing out the power from the tech they have now.

It is true that the bulk of units are what come for free, but we will see ripple-down, and that aside, it's innovative products like the iPhone that will push people to want and expect more.

I used to buy that, but I'm not so sure now; it's taking much longer for HW graphics to filter down and I think part of that is the associated cost. The 2420 has been in the market for an age for a mobile chip; you would have expected the price to start rolling off and the trickle-down to have begun already, but no signs yet.

Hmm, I know a lot of people who own iPhones because of their fluidity of execution as opposed to the aspirational factor, but hey, I haven't seen any marketing data on who's been buying the thing.

We probably move in different circles to the mass market; most of the people we know are more "informed" consumers or have been influenced by one, so it's probably not an entirely representative sample. As I pointed out in the last post though, no argument that the iPhone is a well-executed user experience, and other handset manufacturers should take note.

I was actually looking for some market data myself last night and came across an article claiming that distributors had been gagged by Apple with regards to sales figures etc. Not sure how reliable that is, but I bet that information would be gold.

Lol, glad to see you're still toeing the ARM company line even though you don't work for them now.

Is this ARM's line? From what I've seen I don't think they've worked out that it might be an idea to have a more integrated story on this stuff! Yes, they own all the right bits, but that's only half the story. Still very much a CPU-centric mindset there. You don't see the levels of activity you see from AMD, IBM and, more recently, Intel in blurring the line and moving to the heterogeneous model.

I'm of the mind that if I'm presented with a system full of programmable units and they are not fully occupied, then I should be allowed to use them to speed up something else. I don't give a crap if it's CPU, GPU, DSP or whatever, as long as I get to use it to innovate and differentiate my product (and as long as I don't have to learn yet another instruction set, programming model or toolkit).

Notwithstanding the fact that IMG already works very closely with its key partners, initiatives like OpenCL will address the base issue of portability of key algorithms.

Couldn't agree more; OpenCL addressing the portability issue is a huge leap forward. I've heard some concern about the ability of non-CPU compute units (GPUs, desktop and embedded, DSPs, etc.) to handle larger and more complex code sections, rather than just performance hot spots, which to be honest is fine; this is V1.0 and it's designed to fit today's hardware, but that needs to be looked at going forward. Then there is also the issue of system-level data management/marshalling and movement.
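
(A tiny sketch of what that portability buys you - just enumerating whatever compute devices the platform happens to expose; real code would check return values and pick devices by capability, not just type:)

```c
/* Sketch: list whatever OpenCL devices the platform exposes -- CPU, GPU or
 * anything else -- so work can be pushed to whichever unit is free.
 * Error handling is omitted for brevity. */
#include <stdio.h>
#include <CL/cl.h>

int main(void)
{
    cl_platform_id platform;
    cl_uint n_platforms = 0;
    clGetPlatformIDs(1, &platform, &n_platforms);
    if (n_platforms == 0) return 1;

    cl_device_id devices[8];
    cl_uint n_devices = 0;
    clGetDeviceIDs(platform, CL_DEVICE_TYPE_ALL, 8, devices, &n_devices);

    for (cl_uint i = 0; i < n_devices; i++) {
        char name[128] = {0};
        cl_device_type type;
        clGetDeviceInfo(devices[i], CL_DEVICE_NAME, sizeof(name), name, NULL);
        clGetDeviceInfo(devices[i], CL_DEVICE_TYPE, sizeof(type), &type, NULL);
        printf("device %u: %s (%s)\n", i, name,
               (type & CL_DEVICE_TYPE_GPU) ? "GPU" :
               (type & CL_DEVICE_TYPE_CPU) ? "CPU" : "other");
    }
    return 0;
}
```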

Poor implementation of that side of things will kill performance and hamper the uptake (or at least reduce the scope of usefulness); this is where owning more of the system level should be a strength. The infrastructure management is always the critical bit in stitching together a system with IP from multiple vendors (particularly when they may be competing for more of the solution) and is often the most painful bit to get right. There is little compulsion, from an IP provider's point of view, to sink lots of man-hours into this stuff, as it has a low overall return for the business (versus, say, putting those guys on the embedded compiler for the devices), unless they own more of the overall IP or key system-level enabling components.

Further, there has already been some interesting work by 3rd parties with NV's CUDA to do load balancing across CPU and GPU.

I've seen a lot of demos where they record big numbers, but nothing in a real running system deployment as yet. A lot of the claims about running physics and AI code on the GPU through CUDA turn out to be marketing fluff when you dig into it with the ISVs. I'm not saying it's a bad idea, just that your application hotspot needs to fit the current system-level limitations to see a net return. Has Khronos made any statements about OpenCL and auto or directed load balancing yet?

(I did like the Myth Busters paintball demo at Nvision - wonder how much that cost them? Yes, I know that wasn't CUDA, but it was just a nice bit of marketing, which they do very well).

If anything, I think this is a time when the likes of ARM probably need to be looking over their shoulder.

They are on the defensive on several fronts, not least of which is against Intel (it's like watching the Croc Hunter when he's poking a stick up an anaconda's bum; you sit there thinking "any minute now and... wham!"). They still haven't completely killed off MIPS, PowerPC, Tensilica or ARC either. Maybe one of those will turn out to be a stingray with an attitude?

Having said that though, the snake did bite itself in the arse at their last IDF (I'd ask for a refund on that guy's PR training - struth!).

That got me thinking actually (not the stick up a big snake's bum bit): if you've got your largest partners off doing "roll your own" versions of ARM cores, doesn't the value of ARM diminish to them? Does it get reduced to the point where the only thing people are buying is the ISA and the tools?

Anyway, it always pays to keep looking over your shoulder no matter who you are! Only the paranoid survive and all that... (or is it "just because you're paranoid doesn't mean they aren't out to get you"?).
 
I'm of the mind that if I'm presented with a system full of programmable units and they are not fully occupied, then I should be allowed to use them to speed up something else. I don't give a crap if it's CPU, GPU, DSP or whatever, as long as I get to use it to innovate and differentiate my product (and as long as I don't have to learn yet another instruction set, programming model or toolkit).

Can't disagree here; the only other thing I'd like to add is that, for now, any change in that direction sounds to me quite a bit easier on a SystemOnChip than on a PC as we know it today. At least in the first case the CPU isn't sitting on the wrong side of the bus.

Of course the immediate answer to the latter will be "ideas" like Fusion and the like, and albeit it's a couple of years down the road, I severely doubt that they'll be able to capture anything above the budget-to-mainstream PC market after all.

Besides, in the less foreseeable future I'd rather suspect - as seen several times in the past - that when programmability advances to a certain point the circle closes and there's usually some drift back to fixed-function hardware, under different terms each time. In my mind it's a perpetuum mobile, as demands constantly rise, especially for graphics. There's no such thing as "we've reached good enough IQ and can now easily concentrate on everything else". In 1998 most played at 800*600-1024*768 with 16bpp and bilinear. Now, only 10 years later, you'd expect a high-end system to give you equivalent performance with at least 4xMSAA, 16xAF and fp HDR at 1680*1050-1920*1200 in 32bpp.
 
I'm of the mind that if I'm presented with a system full of programmable units and they are not fully occupied, then I should be allowed to use them to speed up something else. I don't give a crap if it's CPU, GPU, DSP or whatever, as long as I get to use it to innovate and differentiate my product (and as long as I don't have to learn yet another instruction set, programming model or toolkit).

That's why every company I've worked for tries extremely hard to keep the innards of the different blocks in their chips obfuscated from the customer. Not because exposing them would be too educational for competitors - by the time they'd learn about it, it'd be too late anyway - but to protect the customer from their own dumb ideas, and because the support issues would be a nightmare, if only because all complex chips on the market have a ton of bugs, often with very peculiar SW workarounds.

Pretty much all complex chips, no matter the application, have hidden ARMs, MIPSen, SPARCs, Tensilicas or other C-programmable CPUs that are carefully locked down before being shipped to customers. In theory they could be used to actually calculate something useful, but it's just not worth exposing them.
 
Can't disagree here; the only other thing I'd like to add is that, for now, any change in that direction sounds to me quite a bit easier on a SystemOnChip than on a PC as we know it today. At least in the first case the CPU isn't sitting on the wrong side of the bus.

I'd agree. Achieving this in a SoC is relatively easy when compared to a modern PC, but internal blocks in a SoC still have issues with competition for the minuscule bandwidth you get in embedded systems, which is often compounded by poorly implemented access schemes yielding low utilisation of the bus.

Besides, in the less foreseeable future I'd rather suspect - as seen several times in the past - that when programmability advances to a certain point the circle closes and there's usually some drift back to fixed-function hardware, under different terms each time. In my mind it's a perpetuum mobile, as demands constantly rise, especially for graphics. There's no such thing as "we've reached good enough IQ and can now easily concentrate on everything else". In 1998 most played at 800*600-1024*768 with 16bpp and bilinear. Now, only 10 years later, you'd expect a high-end system to give you equivalent performance with at least 4xMSAA, 16xAF and fp HDR at 1680*1050-1920*1200 in 32bpp.

I'm with you for the desktop, but the physical constraints of a pocketable device (even when using the "docked" model - i.e. a mobile device "plugged in" to a home dock with mains power, a hardline to Ethernet and HDMI connections) make chasing these things considerably more problematic, and it cannot be solved by just 'roiding up each of the subsystems generation on generation.

The high-end system of today has a 1 kW PSU, massive amounts of active cooling and scant regard for its carbon footprint, other than to prevent its power rails going molten under load. You can't pull the same trick in a mobile in the near future, and this is why I'm suggesting it's better to get more from what's there already.
 
I'd agree. Achieving this in a SoC is relatively easy when compared to a modern PC, but internal blocks in a SoC still have issues with competition for the minuscule bandwidth you get in embedded systems, which is often compounded by poorly implemented access schemes yielding low utilisation of the bus.

If it makes sense to fuse CPU and GPU capabilities into single chips in the desktop space, I don't see why it wouldn't make sense for the smaller markets either. I'd think that such solutions could capture up to the mainstream segment of the mobile market, and while I don't think it would solve all possible problems, at least for central processing and graphics you shouldn't have two units fighting for bandwidth, since I'd expect the device itself to balance out/control the whole process.


I'm with you for the desktop, but the physical constraints of a pocketable device (even when using the "docked" model - i.e. a mobile device "plugged in" to a home dock with mains power, a hardline to Ethernet and HDMI connections) make chasing these things considerably more problematic, and it cannot be solved by just 'roiding up each of the subsystems generation on generation.

The high-end system of today has a 1 kW PSU, massive amounts of active cooling and scant regard for its carbon footprint, other than to prevent its power rails going molten under load. You can't pull the same trick in a mobile in the near future, and this is why I'm suggesting it's better to get more from what's there already.

I'd say that the biggest problem mobile GPUs have at the moment is with advanced texture filtering. Other than that, they're scaling in IQ-improving features faster than the desktop GPUs ever did. There's 4x RGMS available both on Mali and on SGX, and both can utilize up to 16-sample AA for OpenVG (where performance isn't as much of an issue and that number of samples is actually necessary) if needed.

Anisotropic algorithms nowadays have the huge advantage of being highly adaptive; they filter only the surfaces that actually need X amount of samples, and while antialiasing mostly needs bandwidth (a non-issue on TBDRs, and the reason why NVIDIA has implemented coverage sampling on its APX2500), anisotropic on the other hand mostly needs fillrate. Today's high-end GPUs might have massive fillrates, yet none of them even bothers to do full trilinear with AF by default; GeForces wouldn't have such a big problem with it due to their massive bilerp rates, but Radeons might have it slightly tougher there if they didn't use that amalgam between bi- and trilinear often referred to as "brilinear". With that they virtually get an approximation of trilinear for free.

A current GPU for mobile/PDA devices today has mostly 1 TMU and, in some rare exceptions, 2 TMUs. While I understand that adding filtering capabilities will add die area, I don't think it would be as big a problem for future generations. In the desktop space you have dozens of TMUs (up to 96 for the GT200 currently) and there the "bill" is proportionally a lot higher, whereas of course handheld resolutions are still minuscule compared to the desktop. How large can you get a screen on a mobile phone anyway, without the device ending up the size of a shoe?

Besides, I have the feeling that fixed-function HW for texture mapping/filtering will shrink further in the future, and render outputs might get even more absorbed by other units like the ALUs and/or memory controllers.

Finally, while I understand that competing IHVs struggle, amongst other things, to have the highest possible feature set per mW for each generation, I personally feel that a line could be drawn and the desktop parts not followed as closely in that regard. SGX already mentions procedural geometry in its whitepapers; I have the feeling that none of it is actually necessary for up to 10.1, and I severely doubt that any mobile developer would deal with any of it within the lifetime of that generation. The question here is whether the transistor budget for feature X could have been a better investment elsewhere.
 