Atomics are very nice, but they don't mention local memory (aka shared memory in CUDA) at all. This is not a good sign... Here's a direct quote from the OpenCL 1.1 specification:
"Local Memory: A memory region local to a work-group. This memory region can be used to allocate variables that are shared by all work-items in that work-group. It may be implemented as dedicated regions of memory on the OpenCL device. Alternatively, the local memory region may be mapped onto sections of the global memory."
So is there any on-chip local memory (i.e. SRAM, whether dedicated or somehow repurposing cache) - and if not, how can they claim any significant advantage from supporting Full Profile when one of the most frequent idioms would come at a massive performance and especially power cost? Hopefully I'm just reading too much into the lack of an explicit mention...
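To make the "frequent idiom" point a bit more concrete, here's a rough sketch of the classic work-group sum reduction that normally lives in local memory. It's plain Python emulating a single work-group, not OpenCL code; `workgroup_sum`, the `scratch` list and the access counting are my own illustration, and the power-of-two group size is an assumption.

```python
def workgroup_sum(values):
    """Emulate a tree reduction over one work-group.

    Returns (total, scratch_accesses). `scratch` stands in for a __local
    array; this is an illustrative model, not real device behaviour.
    """
    n = len(values)
    # Assumption: the work-group size is a power of two.
    assert n > 0 and n & (n - 1) == 0

    scratch = list(values)  # each work-item writes its value once
    accesses = n

    stride = n // 2
    while stride > 0:
        # Only work-items with local id < stride are active in this step.
        for lid in range(stride):
            scratch[lid] += scratch[lid + stride]  # 2 reads + 1 write
            accesses += 3
        # On a real device a barrier(CLK_LOCAL_MEM_FENCE) would sit here.
        stride //= 2

    return scratch[0], accesses


total, accesses = workgroup_sum(list(range(256)))
print(total, accesses)
```

For a 256-item work-group that's on the order of a thousand scratch accesses to produce one value. If local memory were simply mapped onto global memory and the cache didn't catch it, each of those would be external memory traffic, which is the cost I'm worried about.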
Exophase: I agree 64-bit pointers are a pure marketing gimmick on that specific iteration of the architecture, but as it scales up in performance, I could see them being very useful, so it makes sense to invest R&D into it now (although unfortunately it doesn't make as much sense to invest die area into it...)
I agree the overdraw post is ludicrous. Maybe it's simply phrased very badly and what they meant is that games with a rough front-to-back ordering never shade their pixels more than twice on average. I don't have any hard numbers personally, but somewhere between 1.0x and 2.0x, leaning slightly towards 1.0x, seems very realistic to me. Determining the performance penalty of a Z-only pass would be another good way of evaluating the real cost of not being a TBDR.
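As a toy illustration of why draw order matters so much here, this sketch counts how many fragments get shaded at a single pixel for a given submission order. It assumes an idealised early-Z test on an immediate-mode renderer; `shaded_fragments` and the depth lists are made up for the example, not measurements from any game.

```python
def shaded_fragments(depths):
    """Count fragments shaded at one pixel for draws arriving in this order.

    Assumes an idealised early-Z test: a fragment is shaded only if it is
    strictly nearer (smaller depth) than everything drawn before it.
    """
    nearest = float("inf")
    shaded = 0
    for depth in depths:
        if depth < nearest:
            nearest = depth
            shaded += 1
    return shaded


orders = {
    "front-to-back": [1, 2, 3, 4],
    "rough front-to-back": [2, 1, 4, 3],
    "back-to-front": [4, 3, 2, 1],
}
for name, depths in orders.items():
    print(name, shaded_fragments(depths))
```

With four overlapping layers, perfect ordering shades once, a slightly shuffled order shades twice, and back-to-front shades all four. A Z-only pass would bring the shading count down to one in every case, at the price of submitting the geometry twice.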
Of course many games have the iPhone (i.e. SGX) as their lead development platform, so casual devs might not even bother with front-to-back ordering or Z-only passes. Then again as long as the performance is high enough, that's just a power penalty in something where the display takes the vast majority of the power anyway - so performance in more demanding games which will usually receive the necessary optimisations arguably matters more, and there the difference is much smaller (even if it'd be insane to pretend it's negligible).
Lazy8s: Well the Mali400 was justified no matter what by the fact the 543MP wouldn't have been ready in that timeframe and it's much faster (at 4MP) than a 540. I suppose T604 also has the big advantage that in terms of APIs, it's more comparable to Series6 than SGX. I do not personally believe that's of much practical significance given its performance level, but I suppose Samsung would have taken it into consideration in their choice.
BTW, T604 (as per those blog posts) is only a 1 TMU design. That means a 4-core T604 is comparable per MHz to a single-core 544MP. I'd certainly like to know how their die sizes compare!