Creative ZMS-20 and ZMS-40

Deleted member 13524

After launching the ZMS-08, with supposedly great multimedia capabilities but mediocre 3D performance, ZiiLabs is now launching two new SoCs, this time using a revamped "Stemcell" architecture and Cortex-A9 MPCore:

ZMS-20 Key Features

- Dual 1.5GHz ARM Cortex A9 cores with Neon
- ZiiLABS flexible Stemcell media processing capabilities
| Low-energy SIMD architecture for high performance media acceleration
| 48 x 32-bit floating point media processing cores for 26GFlops of compute
| High Profile H.264 video playback at 1080p@30fps
| Wide range of accelerated video codecs including H.264, VC1 and VP8
| High Definition, low latency video conferencing
| Optimised OpenGL ES 2.0 for robust 3D graphics acceleration and application compatibility
| Accelerated OpenCL 1.1 (desktop profile) integrated into Android NDK
| High Dynamic Range (HDR) Image Processing
| High-quality Text-to-Speech and Voice Recognition
| 150 MPixel/sec image processing
| 200MHz Pixel clock image processing
| Adobe Flash 10 acceleration
- Integrated HDMI 1.4 with 3D stereo support
- 1GB addressable memory
| DDR2/3 at 533 MHz for low-cost
| LPDDR2 for maximum memory bandwidth and low-power
| 64-bit wide memory bus
- Quad independent video controllers supporting 24-bit displays and cameras
- 3V3 and 1V8 I/Os reduce peripheral power consumption
- Dual USB 2.0 HS OTG (Host/Peripheral) controllers with PHY for low system cost
- Three independent SDIO/MMC controllers
- Extended battery life with robust Dynamic Power Management and Instantaneous On
- Xtreme Fidelity X-Fi audio effects
- Enhanced Security: 256-bit AES (Advanced Encryption Standard) and TrustZone

So there are two Cortex A9s @ 1.5GHz and 48 "Stemcell" cores @ 200MHz (I'm assuming the stemcell cores are the only ones doing the pixel processing).
Like its predecessors, it seems ZiiLabs is keeping away from any fixed-function units, be it for video encode/decode, audio or even 3D rendering blocks like TMUs and ROPs.
I wonder what kind of performance this can achieve (probably not great for typical 3D engines, if it's anything like its predecessor).
I assume those 150 MPixel/s are for raw image processing (decoding/encoding) and aren't related to the SoC's fillrate (it would be very bad if they were), which means it could do 1080p 3D stereo recording at 25fps, assuming there's enough bandwidth to mass storage to do it.
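A quick sanity check on that (a Python sketch, assuming the 150 MPixel/s applies to raw camera pixels with no per-pixel overhead):

```python
# Does 150 MPixel/s of image processing cover 1080p stereo capture at 25fps?
width, height = 1920, 1080
views = 2                                  # 3D stereo: two views per frame
fps = 25
required = width * height * views * fps    # pixels per second
print(required / 1e6)                      # ~103.7 MPixel/s
print(required <= 150e6)                   # True, fits inside the quoted figure
```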
Although it says the A9s have NEON, the block diagram seems to indicate that the FPU can be chosen instead of NEON (which wouldn't make much sense, since the Stemcell cores should be a lot faster for FP calculations).


The press release basically states that the ZMS-40 is 2x the ZMS-20, with a quad-core A9 and 96 processing cores. There's also an interesting sentence in it:
Quad-Core ZMS-40 Delivers Aggregated Cortex-A9 Clock Speed of up to 6GHz and Scales Up Total Performance to 100 Processing Cores

I guess the 100 processing cores are the 96 Stemcell + 4 Cortex A9 (odd arithmetic, though), but I didn't know the A9s could go up to 6GHz.
Wouldn't this raise the power consumption to a level where it would make more sense to go with A15s instead? Or could the A15's advantage become irrelevant if FP calculations are done through the Stemcell cores (with virtualization being the only missing feature)?


The ZMS-20 is sampling right now, so it's probably built on 45 or 40nm. The ZMS-40 may be reserved for 28nm only.
Both these SoCs are aimed at tablets, so it seems Creative has given up on the small-MID/smartphone segments, for the moment.

I really like Zii's architectures, given that they've done away with fixed-function units ever since the old dual-ARM9 ZMS-05. And although that's clearly "the future", I think the technology hasn't been ready for it yet, which is why the 3D performance and power efficiency haven't been up to par with other solutions and why they haven't had any major design wins so far.
 
The press release basically states that the ZMS-40 is 2x the ZMS-20, with a quad-core A9 and 96 processing cores. There's also an interesting sentence in it:
Quad-Core ZMS-40 Delivers Aggregated Cortex-A9 Clock Speed of up to 6GHz and Scales Up Total Performance to 100 Processing Cores

I guess the 100 processing cores are the 96 Stemcell + 4 Cortex A9 (odd arithmetic, though), but I didn't know the A9s could go up to 6GHz.
Wouldn't this raise the power consumption to a level where it would make more sense to go with A15s instead? Or could the A15's advantage become irrelevant if FP calculations are done through the Stemcell cores (with virtualization being the only missing feature)?
Honestly, I believe 6GHz means 4x 1.5GHz cores, no more, no less. PR talk ;)
 
Honestly, I believe 6GHz means 4x 1.5GHz cores, no more, no less. PR talk ;)

I just noticed the "Aggregated Cortex-A9 Clock Speed" in the sentence.
My god, you're right! That was so lame!

I hadn't seen that low-blow "let's just sum the gigahertz from each core" PR move since the lousiest computer stores stopped using it some 6 years ago!
 
I'm glad to see another high-clocked Cortex-A9 with NEON. Hopefully Tegra 2 remains the only high-end ARM SoC without NEON and the whole idea of omitting it gets phased out.

So there are two Cortex A9s @ 1.5GHz and 48 "Stemcell" cores @ 200MHz (I'm assuming the stemcell cores are the only ones doing the pixel processing).

Okay, so 200MHz * 48 stemcell cores = 26 GFLOPS. That's 2.7 FLOPs/cycle/core. I have absolutely no idea where that number comes from. Maybe they're combining the Cortex-A9 numbers. Those are 2 FLOPs/cycle per core (really 4 if you count FMLAs, but the latency on those is awful; I suspect most aren't counting in that direction), so a peak of 6 GFLOPS. That brings you down to 2.0833 FLOPs/cycle/core. If you go with 2 you'd get 25.2... well, I don't really know.
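Running those numbers (a Python sketch; the 200MHz clock is the assumption carried over from the first post):

```python
stemcell_cores = 48
stemcell_clock = 200e6                  # Hz, assumed from the "200MHz pixel clock" line
claimed_gflops = 26e9

# FLOPs per cycle per core if the 26 GFLOPS is stemcell-only
print(claimed_gflops / (stemcell_cores * stemcell_clock))                # ~2.71

# If the two Cortex-A9s at 1.5 GHz contribute 2 FLOPs/cycle each (6 GFLOPS),
# the stemcell share drops to ~2.08 FLOPs/cycle/core
a9_gflops = 2 * 1.5e9 * 2
print((claimed_gflops - a9_gflops) / (stemcell_cores * stemcell_clock))  # ~2.08

# At a flat 2 FLOPs/cycle/core the aggregate would be 25.2 GFLOPS, just short of 26
print((stemcell_cores * stemcell_clock * 2 + a9_gflops) / 1e9)           # 25.2
```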

Since it has 16-bit floats I presume it can pair them; no idea if that's being counted in the FLOP number. The SIMD arrangement is probably a lot like a GPU's ALU array. Mobile GPUs are starting to provide around the same level of computation, although they have an advantage in shipping a bunch of libraries that are using these units.

ZiiLABS have been so tight-lipped about their Stemcell architecture from the start, and I don't know of any way to program it directly or even run user code on it. They've boasted OpenCL (which would be an obvious asset for them), but the Plaszma documentation only mentions OpenGL ES 1.1, so no shaders at all.

ZiiLABS desperately needs to change this for this thing to be useful; the libraries and OS are cool but they're just not offering anything to differentiate themselves enough.

I assume those 150 MPixel/s are for raw image processing (decoding/encoding) and aren't related to the SoC's fillrate (it would be very bad if they were), which means it could do 1080p 3D stereo recording at 25fps, assuming there's enough bandwidth to mass storage to do it.

The 150 MPixel/s is definitely an estimate for 3D fillrate; they wouldn't use that metric for anything else. Texture fetch is an expensive operation, where you'll do most or all of the following just for the basics, with no filtering:

- LOD calculation (even w/o filtering you want to use mip-mapping to avoid killing the cache when you're zoomed out)
- MIP address calculation
- Address clamping/wrapping
- Address swizzling
- Load from a dynamic address
- Format conversion/decompression or cached conversion which hurts bandwidth

With bilinear filtering, add the following (see the sketch after this list):

- Four loads from dynamic addresses instead of one
- Bilinear interpolation

Trilinear is worse, and of course anisotropic is very expensive where it's needed (but just checking for it is probably not that cheap).
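To make the bilinear case concrete, here's a minimal software fetch (a Python sketch with names of my own choosing; mip selection, swizzling and decompression are left out), where every line is work a TMU would do in fixed hardware:

```python
import math

def bilinear_fetch(texture, w, h, u, v):
    """Minimal software bilinear sample from an RGBA texture stored as a flat
    list of (r, g, b, a) tuples, with repeat-wrap addressing. Mip selection,
    swizzling and format decompression are left out; a real software path
    pays for those as well."""
    # Map normalized coordinates to texel space, split into integer + fraction
    x = u * w - 0.5
    y = v * h - 0.5
    ix, iy = math.floor(x), math.floor(y)
    fx, fy = x - ix, y - iy

    # Address wrapping for the four neighbouring texels
    x0, y0 = ix % w, iy % h
    x1, y1 = (ix + 1) % w, (iy + 1) % h

    # Four loads from dynamic addresses instead of one (the expensive part)
    t00 = texture[y0 * w + x0]
    t10 = texture[y0 * w + x1]
    t01 = texture[y1 * w + x0]
    t11 = texture[y1 * w + x1]

    # Bilinear interpolation: three lerps per channel
    lerp = lambda a, b, t: a + (b - a) * t
    return tuple(lerp(lerp(t00[c], t10[c], fx),
                      lerp(t01[c], t11[c], fx), fy) for c in range(4))
```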

A lot of these operations are improved with parallel independent loads and LUTs. Unfortunately most vector FPUs don't optimize for this (so-called scatter-gather, although in this case just doing gather would be a big benefit). Larrabee was one of the few architectures that did have this feature, and they still ended up adding TMUs to it.

We don't actually know if the number includes filtering or not; I seem to remember ZMS-05 numbers at some point for bilinear filtering on and off (42 MPixel/s is the baseline they give), but I can't find anything like that.

Although it says the A9s have NEON, the block diagram seems to indicate that the FPU can be chosen instead of NEON (which wouldn't make much sense, since the Stemcell cores should be a lot faster for FP calculations).

VFPv3 support is included with NEON on the Cortex-A9. They share a register file and probably share at least some of their functional units, since NEON includes vector floating-point support. You wouldn't often use the scalar operations over NEON given the choice, but including them is cheap and gets you better compatibility.

Using NEON over the Stemcells has the advantage of actually being able to program for it. Even when OpenCL is available for the Stemcells (if it isn't already; apologies if I'm just missing this), there'll be much less of a software ecosystem for it and people will be much more comfortable with standard CPU SIMD. Beyond that, there are still going to be circumstances where two 1.5GHz 2-wide SIMD cores are better suited to a task than the Stemcells: even though they have much lower peak throughput, they'll probably have better latency (in raw time units) and probably deal with branchy code much better. It's the usual CPU vs GPU tradeoff.

I really like Zii's architectures, given that they've done away with fixed-function units ever since the old dual-ARM9 ZMS-05. And although that's clearly "the future", I think the technology hasn't been ready for it yet, which is why the 3D performance and power efficiency haven't been up to par with other solutions and why they haven't had any major design wins so far.

I'm not sure if it's really the future, not 100%... I expect more of a convergence in the middle. That's the direction things have been moving in anyway: GPUs have gotten a lot more programmable, but video encode/decode has moved more to fixed function, even in high-end CPUs like Sandy Bridge. There's an obvious perf/W advantage for some critical tasks. In fact, I imagine that Zii-powered devices are at a disadvantage when it comes to battery life and watching movies, although I'd have to see some benchmarks.
 
The 150 MPixel/s is definitely an estimate for 3D fillrate; they wouldn't use that metric for anything else.

I beg to differ.
They specifically state it's 150 MPixel/sec image processing.
That term has been used before for the ability to do photo/video handling.
Here's a paragraph from the BCM2727 datasheet:
Broadcom said:
The BCM2727 delivers a high-quality imaging solution exceeding consumer portable digital cameras and approaching DSLR cameras. Advanced on-chip ISP is capable of handling sensors up to 12 Mpixels with on-the-fly image processing at 180M pixels per second and JPEG compression at 10 full quality photos per second.

The term is being used to describe how fast the image signal processor can handle the pixels in a given frame.

150 MPixel/s peak fill-rate would be really bad, as it would be about half of what the lowest-end offerings from PowerVR, Vivante, etc. have to offer.


I'm not sure if it's really the future, not 100%... I expect more of a convergence in the middle. That's the direction things have been moving in anyway: GPUs have gotten a lot more programmable, but video encode/decode has moved more to fixed function, even in high-end CPUs like Sandy Bridge.

Just look at the decades-old history of computing. Every function that is now done through programmable ALUs was once done by fixed-function units.
I think it's only a matter of time before video coding/decoding is efficiently passed on to programmable units, simply because you can generally achieve higher peak performance if you only use the latter.

I think even 3D rendering will, at some point, migrate entirely to programmable-only units (well, maybe not with the current rasterization model as it's too dependent on TMU and ROP performance, but sometime in the long run).



There's an obvious perf/W advantage for some critical tasks. In fact, I imagine that Zii-powered devices are at a disadvantage when it comes to battery life and watching movies, although I'd have to see some benchmarks.
True. That's why Nintendo went with a (long-criticised) fixed-function pixel shading GPU for the 3DS. Although we're yet to see how many AAA game/engine developers will be pushed away by that decision, as already happened with Epic.
 
I beg to differ.
They specifically state it's 150 MPixel/sec image processing.
That term has been used before for the ability to do photo/video handling.
Here's a paragraph from the BCM2727 datasheet:

The term is being used to describe how fast the image signal processor can handle the pixels in a given frame.

150 MPixel/s peak fill-rate would be really bad, as it would be about half of what the lowest-end offerings from PowerVR, Vivante and such have to offer.

Okay, I'll give you that reference, but in this context I think it's clear that it's talking about 3D. It's in line with what you'd expect scaling the 42MPixel/s number from the previous generation, which in turn is in line with what GLBenchmark shows.

Without fixed function, this is what you get from this number of computational units, which probably aren't that specialized for parallel memory accesses; it's simply a fact of life. Like its predecessors, this is going to be bad at 3D.

Just look at the decades-old history of computing. Every function that is now done through programmable ALUs was once done by fixed-function units.

I'm not following you on computing moving from fixed function to general purpose. I only really see that being true in a limited sense for 3D accelerators. I think it's generally the reverse: something starts as software and gains hardware specialization. 3D rasterization itself naturally started as software, as did video encode/decode and functions such as encryption. Instruction sets have gotten more uniform/orthogonal/simpler compared to ye-olde CISCs, but this hasn't made them more programmable, and when instructions get added they're done so more in the area of specialization than flexibility. A lot of the additions to SSE, for instance, perform rather complex but limited-use functions.

It's easy to brush these things off since they're still instructions embedded inside control flow, but it really is a compromise towards less programmability. Floating point itself is a perfect example; it's a convergence on a hardware standard, but it's definitely less flexible than doing it in software (which is what people actually did back in the day). This is what I mean by market convergence: you could split the floating point operation into smaller components to give it more programmability, but there's no real demand for it.

I think it's only a matter of time before video coding/decoding is efficiently passed on to programmable units, simply because you can generally achieve higher peak performance if you only use the latter.

I think even 3D rendering will, at some point, migrate entirely to programmable-only units (well, maybe not with the current rasterization model as it's too dependent on TMU and ROP performance, but sometime in the long run).

The more fixed-function you get, the better perf/W and perf/area you get for performing a task, and those metrics will define peak performance. You'll only get better performance from programmable units doing the same task if the fixed-function ones stop developing. The move towards more programmability is to let you do different things, not to do the same things faster.

Programmable units for encode/decode give you more flexibility with formats and whatever, but nothing is exotic yet popular enough to really fall outside of the basic computational blocks the industry has chosen to optimize.

The death of fixed-function units happens when they no longer increase in capability as fast as the technology does. Sound cards are a good example of this: the cost of audio-related tasks stopped increasing as much and turned into a smaller and smaller percentage of required CPU time. I think you'll see this happen with video too, and in the pretty near future, because you get diminishing returns from increasing resolution and frame rate and hit a wall for encoding efficiency. Then, as CPUs continue to improve, encoding the same baseline will use less and less power. Fixed function will still use proportionately less power, but the numbers will fall into the noise.

However, technology scaling will eventually hit a wall as well when process shrinks slow down (as they already have for everyone who isn't Intel). We don't know when graphics will really be good enough that the push to improve them slows down; I suspect that, unlike audio, a huge super-linear increase in resources is needed to generate a linear subjective/aesthetic response to the improvement.

True. That's why Nintendo went with a (long-criticised) fixed-function pixel shading GPU for the 3DS. Although we're yet to see how many AAA game/engine developers will be pushed away by that decision, as already happened with Epic.

Who is criticizing PICA200, exactly? There are indeed some isolated cases of developers rejecting it, but Tim Sweeney is waaay on the programmable side of the debate so that's not really surprising (and I suspect he's an outlier in the industry).
 
darkblu's testing of the video playback capabilities of the previous generation Zii part showed it to be fairly efficient.

But yeah, that fill rate number is for graphics and reveals that graphics once again aren't a target market.
 
Do we know how much fixed-function hardware there really is here? Surely they aren't crazy enough to do even rasterisation and texture addressing without at least specialised instructions. I think Broadcom's VideoCore architecture has proven that a more software-centric architecture can be reasonably efficient for handheld 3D graphics if it has enough accelerators, but Zii's 3D performance is nowhere near as good.

BTW, I wrote an article in early 2009 called "Basic Primer on Computational Architecture Trade-Offs" - I think it's well worth the read in this context (especially but not only Page 2).
 
I did a little search and found out that the 42 textured MPixel/s isn't from the previous generation; it's from the ZMS-05, the first Zii chip, released over 2 years ago.

The previous generation, the ZMS-08 from last year that ended up in the ZiiO tablets, has a 1GPixel/s fillrate for 3D rendering.

And ZiiLabs are claiming 4x the performance of ZMS-08 in ZMS-20, so there's no way they'd cut back on the 3D fillrate.

I think a more realistic approach to the theoretical fill-rate would be to assume 1 pixel per core per clock. How many cores are left for pixel processing is a mystery, but they should be able to do more than 1 GPixel/s at least.


I'm now pretty sure that those 150 MPixel/s are for image processing, matching the terminology used for BCM2727's 180 MPixel/s and Tegra 2's 150 MPixel/s (check the "Imaging" specs).


Besides, we can reach that number rather easily:
48 cores x 200MHz = 9600M total passes/s. Divide that by the 32 bits of a 32-bit pixel and we have 300M passes/s. If a pixel operation needs two passes per core, then we reach the 150 MPixel/s.
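Spelling out that arithmetic (a Python sketch; the two-passes-per-pixel step is the guess above, not anything ZiiLabs has stated):

```python
cores = 48
clock = 200e6                              # Hz, the assumed Stemcell clock
passes_per_second = cores * clock          # 9.6e9 total passes/s
per_pixel = passes_per_second / 32         # spread over 32 bits -> 300M/s
print(per_pixel / 2 / 1e6)                 # two passes per pixel -> 150.0 MPixel/s
```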





So will the 3D performance in ZMS-20 be crappy? Maybe, probably yes, if their previous SoCs are any indication. But we don't know yet. What I do know is that the performance won't have anything to do with that 150 MPixel/s number, as it's not related to fillrate.
 
Surely they aren't crazy enough to do even rasterisation and texture addressing without at least specialised instructions.

They could be. Cell was allegedly intended for software rendering and it has little in the way of heavy-lifting instructions intended for it. Of course, if there really was a plan to use it without a GPU in the PS3, it was one that clearly fell through, probably when they realized how poorly it'd perform.

I'd actually be very interested to see what the design of the software renderer is like. TBDR is kind of a given in order to optimize data parallelism over contiguous-memory regions, but that seems to have its own new challenges, like dealing with the interpolation no longer being continuous (un-strength reducing the additions back into small multiplies). Maybe I just haven't thought about it hard enough.
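Here's one reading of that "un-strength-reducing" point, as a small Python sketch (the function names and setup are mine): a scanline renderer steps an attribute with one add per pixel, while a tile-based renderer that visits pixels in an arbitrary, data-parallel order has to evaluate the plane equation directly.

```python
# A linear attribute over a triangle is a plane: a(x, y) = a0 + dadx*x + dady*y

def interp_scanline(a_start, dadx, count):
    """Scanline order: strength-reduced to one add per pixel, but it only
    works when pixels are visited contiguously along a row."""
    out, a = [], a_start
    for _ in range(count):
        out.append(a)
        a += dadx
    return out

def interp_at(a0, dadx, dady, x, y):
    """Tile/arbitrary order: two multiply-adds per pixel, but any (x, y)
    can be evaluated independently, which suits a wide SIMD array working
    on a tile."""
    return a0 + dadx * x + dady * y
```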

The previous generation, the ZMS-08 from last year that ended up in the ZiiO tablets, has a 1GPixel/s fillrate for 3D rendering.

Doesn't look that way here:

http://www.glbenchmark.com/compare.jsp?benchmark=glpro11&showhide=true&certified_only=1&D1=Creative%20ZiiO%207"
http://glbenchmark.com/phonedetails.jsp?benchmark=glpro11&D=ZiiLABS+ZMS-08&testgroup=lowlevel

But we have to be cautious of how "fill rate" is defined. I could see it potentially doing a single untextured pixel per core over two cycles if the colors are perspective correct. It has to interpolate r/g/b and depth, then it has to multiply them by 1/w, then it has to update the framebuffer and depth buffer. Maybe one cycle if they're not perspective correct. Obviously no ROP functions like alpha blending, and big triangles that absorb the per-line costs.
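As a rough sketch of that per-pixel budget (a Python mock-up with made-up names, not anything ZiiLabs has described), here's the work a single untextured, perspective-correct pixel needs with no ROP hardware:

```python
def shade_pixel(fb, zb, idx, r_over_w, g_over_w, b_over_w, z, one_over_w):
    """One untextured pixel in a pure software path: the rasterizer hands us
    per-pixel interpolated r/w, g/w, b/w, z and 1/w; we perspective-correct,
    depth test, and write the framebuffer and depth buffer ourselves, since
    there are no ROPs to do it."""
    if z >= zb[idx]:                      # depth test (smaller z is nearer)
        return
    w = 1.0 / one_over_w                  # perspective correction
    fb[idx] = (int(r_over_w * w), int(g_over_w * w), int(b_over_w * w))
    zb[idx] = z                           # depth buffer update
```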

But texturing? Without specialized hardware it'd take several cycles per pixel. ZMS-08 had 64 SIMD units; I don't know what they were clocked at, probably under 200MHz. Maybe 1 GPixel/s isn't impossible, but I'm quite skeptical. I just don't think you could get that far without fixed-function hardware, which I guess we haven't proven it doesn't have. Moreover, if it's competent, why are they not saying anything?
 
Just look at the decades-old history of computing. Every function that is now done through programmable ALUs was once done by fixed-function units.
I think it's only a matter of time before video coding/decoding is efficiently passed on to programmable units, simply because you can generally achieve higher peak performance if you only use the latter.
This is exceptionally unlikely to happen in any power-sensitive application. Modern video is based around a few fixed standards which fixed-function HW can handle in a very small fraction of the power (1 to 2 orders of magnitude less) and area of a programmable core capable of the same level of performance. Further, the video standards include functions that simply do not fit well with massively parallel architectures. The argument that programmable solutions allow you to handle new standards when they arrive rarely proves to be true either, as those new standards invariably require considerably more compute than the programmable units have to offer, or include yet more difficult-to-parallelise functions.

John.
 
This is exceptionally unlikely to happen in any power-sensitive application.

And wouldn't you agree that the line separating low-power application processors from "desktop-class" processors grows thinner every day?
What if that line disappears in 10, 15, 20 years?




Modern video is based around a few fixed standards which fixed-function HW can handle in a very small fraction of the power (1 to 2 orders of magnitude less) and area of a programmable core capable of the same level of performance.

Hmm.. "modern video" isn't "future video"...
And how about adding more instructions to existing CPUs, and eventually being able to do something like per-MHz clock+voltage scaling depending on workload?

Exophase gave the good example of 3D sound processing. Starting around 2006, in high-end desktops, laptops and 7th-gen consoles, its processing requirements became so irrelevant to the CPUs that they eventually fell into the "noise". Why wouldn't that happen to video at some point?

For example, I don't see the 3D 1080p "baseline" going much further anytime soon, and this ZMS-20 seems to be doing it at ~1W using only programmable units. Isn't that a sign of things to change?






Further, the video standards include functions that simply do not fit well with massively parallel architectures.
The argument that programmable solutions allow you to handle new standards when they arrive rarely proves to be true either, as those new standards invariably require considerably more compute than the programmable units have to offer, or include yet more difficult-to-parallelise functions.

I'm going to push a bit further with the "what if"s (since we're talking about developments more than 8 years out, during which we really don't know what'll happen for sure).
What if the "future video" standards are developed thinking backwards?
"Ok so we have this hardware with these capabilities and parallelism. Let's make a video codec that takes full advantage of it and allows it to be more efficient and use less power than the current standards".
 
And wouldn't you agree that the line separating low-power application processors from "desktop-class" processors grows thinner every day?
What if that line disappears in 10, 15, 20 years?
The line isn't as thin as you'd think: the handheld TDP limit is around 2W (which is likely to be too hot to hold), while for desktops it's about 1000W, and these limits are unlikely to increase.
Hmm.. "modern video" isn't "future video"...
If you look at recent history, video has tended towards a few international standards, with the "interesting" proprietary formats tending to be dropped in favour of standardisation; there is no sign of this trend being reversed.
And how about adding more instructions to existing CPUs, and eventually being able to do something like per-MHz clock+voltage scaling depending on workload?
As has been discussed elsewhere, you can optimise the data path, but you still end up with a lot of active logic that isn't needed in the fixed-function equivalent.

Exophase gave the good example of 3D sound processing. Starting around 2006, in high-end desktops, laptops and 7th-gen consoles, its processing requirements became so irrelevant to the CPUs that they eventually fell into the "noise". Why wouldn't that happen to video at some point?
At some point? Maybe. But video is hugely more compute-intensive than audio and the video standards are still moving, so I think it's probably a lot further out than you'd think, if ever.
For example, I don't see the 3D 1080p "baseline" going much further anytime soon, and this ZMS-20 seems to be doing it at ~1W using only programmable units. Isn't that a sign of things to change?
I'm not sure where you're getting 1W from. That aside, the ZMS-20 only lists H.264 HP 1080p @ 30fps, so it probably isn't capable of 3D decode, and it also doesn't list what level it supports, i.e. L4.2 is required for full Blu-ray playback; I strongly suspect it doesn't support it, otherwise they'd have listed it. Anyway, to give you an indication of the magnitude of the difference, a fixed-function decoder will typically do HP L4.2 in around 10mW in similar process tech, i.e. about 1/100 of the power consumption you suggest above, although obviously if the ZMS-20 can't handle Blu-ray playback it's a moot comparison.

I'm going to push a bit further with the "what if"s (since we're talking about developments more than 8 years out, during which we really don't know what'll happen for sure).
What if the "future video" standards are developed thinking backwards?
"Ok so we have this hardware with these capabilities and parallelism. Let's make a video codec that takes full advantage of it and allows it to be more efficient and use less power than the current standards".
That would imply they only care about having X compute without consideration for power, which is never going to wash for standards targeting consumer-level devices. I also suspect that it would result in only marginal improvements in compression ratios, quality or throughput; you wouldn't see the sort of step function we have with, say, MPEG2 -> H.264.

John.
 
It's a bit surprising that ZMS-20 comes with 48 FP media processing cores where ZMS-08 had 64, though probably lower clocked and less capable.

26 GFlops sounds like 48 multiply-add at ~270 MHz.
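The arithmetic behind that guess (a Python sketch, counting a multiply-add as 2 FLOPs):

```python
cores = 48
flops_per_madd = 2                         # one multiply + one add per cycle
target_flops = 26e9
print(target_flops / (cores * flops_per_madd) / 1e6)   # ~270.8 MHz implied clock
```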
 
The line isn't as thin as you'd think: the handheld TDP limit is around 2W (which is likely to be too hot to hold), while for desktops it's about 1000W, and these limits are unlikely to increase.

I didn't mean max limits, I meant averages. The average power consumption for the average desktop (or a notebook, for that matter) is going down, while the average power consumption for a smartphone is going up. The line is getting thinner.


That would imply they only care about having X compute without consideration for power, which is never going to wash for standards targeting consumer-level devices. I also suspect that it would result in only marginal improvements in compression ratios, quality or throughput; you wouldn't see the sort of step function we have with, say, MPEG2 -> H.264.

I specifically said that power draw would be one of the main considerations...
Isn't quality dictated by bitrate and resolution? Bitrate aside, is H.264 capable of an image quality that MPEG2 isn't (honest question)? And why assume there would only be marginal improvements in compression ratios, given that the purpose is to increase the parallelism in the coding/decoding? How does the latter imply the former?



It's a bit surprising that ZMS-20 comes with 48 FP media processing cores where ZMS-08 had 64, though probably lower clocked and less capable.

26 GFlops sounds like 48 multiply-add at ~270 MHz.

Just as they claimed an "aggregated Cortex-A9 clock speed of 6GHz" and added the 4 ARM cores to the 96 Stemcell cores to get the "100 cores" in the press release, it's also possible they're adding the GFLOPS from the dual Cortex-A9 @ 1.5GHz to the GFLOPS from the 48 Stemcell cores in order to get the 26 GFLOPS.

Regarding the decreased number of Stemcell cores from the ZMS-08 to the ZMS-20, notice how they treated the 64 cores as "8 clusters of 8 cores" in the ZMS-08, while they're all now treated as single entities in the ZMS-20. This could mean that, while the ZMS-08 had some kind of scheduler for each group of 8 cores, the cores in the ZMS-20 are more "autonomous".
So the cores in the ZMS-20 could be much more efficient, resulting in greater performance despite the lower core count.


Oh, I guess 200MHz was a number ToTTenTranz made up, never mind my earlier calculations then >_>

Easy there, I explicitly stated in the first post where I took that number from and no one apparently seemed to object (go look for it if you want to). I didn't take the 200MHz out of my ass just for the lulz.
If you don't agree with how I assumed that number, it's not my fault that you apparently didn't bother to read my post.
 
I didn't mean max limits, I meant averages. The average power consumption for the average desktop (or a notebook, for that matter) is going down, while the average power consumption for a smartphone is going up. The line is getting thinner.
Err, it's still 100s of mW vs 100s of W; it doesn't matter whether it's average or peak, and if the line is getting thinner it isn't doing so to any meaningful degree.

I specifically said that power draw would be one of the main considerations...
Isn't quality dictated by bitrate and resolution? Bitrate aside, is H.264 capable of an image quality that MPEG2 isn't (honest question)? And why assume there would only be marginal improvements in compression ratios, given that the purpose is to increase the parallelism in the coding/decoding? How does the latter imply the former?

Quality is determined by bitrate, resolution, the target codec and the quality of the encoder used. H.264 can offer high quality at potentially lower bitrates than MPEG2; it does this through significantly more complex processing. Each major revision of the video codec standards has required substantially more MIPS to achieve a given frame rate, e.g. H.264 requires 3 to 5x the MIPS of MPEG2 to decode. If, when you're developing a new codec standard, you're constrained to the same number of MIPS as the previous standard, it's highly unlikely that you'll be able to achieve more than small incremental improvements over it. Further, not every function necessarily lends itself to massive parallelism.
 
Easy there, I explicitly stated in the first post where I took that number from and no one apparently seemed to object (go look for it if you want to). I didn't take the 200MHz out of my ass just for the lulz.
If you don't agree with how I assumed that number, it's not my fault that you apparently didn't bother to read my post.

That's a pretty personal response. You said "it's 200MHz" and I didn't realize that was a number you made up rather than one they provided, that's all. Your parenthetical there doesn't make it very clear that it's connected to that figure.
 
That's a pretty personal response. You said "it's 200MHz" and I didn't realize that was a number you made up rather than one they provided, that's all. Your parenthetical there doesn't make it very clear that it's connected to that figure.


Yet you're still wrong: I didn't make it up. I may have taken it from the wrong spot (within the specs on the website), but I most certainly didn't make it up.
Since it's right there in the first post, I won't explain any further.
 
Huh, I searched around for "200MHz" earlier and thought I only found your mention. Guess I really can't read.

Still, I do think the connection here is weak... I mean, I don't think "pixel clock" really refers to the Stemcell array clock; I can't see why they wouldn't have put it that way. Pixel clock can mean the RAMDAC clock; this gives just enough for 1080p past 60Hz.
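For reference (my own numbers, using the standard 1080p timing of 2200 x 1125 clocks per frame including blanking):

```python
h_total, v_total = 2200, 1125              # 1080p timing, active + blanking
print(h_total * v_total * 60 / 1e6)        # 148.5 MHz needed for 1080p60
print(200e6 / (h_total * v_total))         # ~80.8 Hz possible at a 200 MHz pixel clock
```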
 