Huawei Ascend D / HiSilicon K3V2 - Quad-core with 16-core graphics

Since NVIDIA counts ALU lanes as "cores", there's nothing stopping someone else from using that kind of marketing crap.
 
Where did I read an article stating that there were lots of people quitting Qualcomm's Adreno division to form a graphics IP startup?

Is it too soon for that?
 
Some benchmarks from the Ascend are showing up, it seems:

http://phandroid.com/2012/03/16/doe...-xl-have-the-fastest-processor-on-the-market/

[Attached benchmark screenshots: taijiHuawei.png, huaweinenamark.png]


http://www.brightsideofnews.com/new...benchmarked2c-leaves-competition-in-dust.aspx

Using Basemark ES 2.0 benchmark from Finnish wizards at Rightware, Huawei K3V2 processor managed to beat NVIDIA Tegra 3, Qualcomm Snapdragon MSM8660, Samsung Exynos, TI OMAP4 right off the bat. At 1280x720 resolution (HD Ready), the Huawei K3V2 and its Hisilicon GPU managed to achieve 25 frames per second, with the closest follower being Qualcomm Snapdragon and its Adreno 220 GPU with 18 fps. In the same benchmark, NVIDIA Tegra 3 with its ULP GeForce GPU only managed to reach 12 frames per second.

We'll continue investigating the performance of Huawei Ascend D Quad XL, but it looks like there is a very good reason why Microsoft placed Huawei on the "Windows 8 OEM Approved Vendors List" immediately after the company showed its silicon.
Impressive results. Let me guess, dual channel memory controller?

Update:

Take into account that these benchmarks seem to be from the 1.2GHz quad-core Ascend D Quad XL (the smartphone version). There is also a 1.5GHz version.

They are still advertising a 64-bit memory interface on the website. I'm assuming it's really just dual-channel 2 x 32-bit. A bit of confusing marketing never hurt anybody (Apple, anybody?).
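As a quick sanity check of that reading: a single 64-bit interface and a dual-channel 2 x 32-bit setup have the same peak bandwidth, only the channel split differs. A minimal sketch, assuming an LPDDR2-1066 data rate (1066 MT/s), which is thread speculation rather than a confirmed spec:

```python
# Back-of-the-envelope peak bandwidth: a single 64-bit channel vs. 2 x 32-bit
# channels comes out identical; only the channel split differs.
# The LPDDR2-1066 data rate (1066 MT/s) is an assumption, not a confirmed spec.

def peak_bandwidth_gbs(bus_width_bits, transfers_per_s, channels=1):
    """Peak theoretical bandwidth in GB/s."""
    return channels * (bus_width_bits / 8) * transfers_per_s / 1e9

single_64 = peak_bandwidth_gbs(64, 1066e6)               # one 64-bit channel
dual_32   = peak_bandwidth_gbs(32, 1066e6, channels=2)   # 2 x 32-bit channels

print(f"64-bit single channel:   {single_64:.2f} GB/s")
print(f"2 x 32-bit dual channel: {dual_32:.2f} GB/s")
# Both print ~8.53 GB/s, so "64-bit" on the spec page and "dual-channel
# 2 x 32-bit" can describe exactly the same peak figure.
```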

http://www.huaweidevice.com/worldwi...nfoId=3265&directoryId=6001&treeId=3745&tab=0
 
Holy shit, that IS impressive stuff. What the hell have they got under the sheets?? A Mali T604?? That's ridiculous...

Basemark ES 2.0: roughly 105% faster than the Tegra 3.

NenaMark2:

NenaMark2 seems to show a 26% improvement over the Tegra 3, based on those screenshots.

http://nena.se/nenamark/view?version=2&device_id=1563

There are about 10 test runs showing the same results, with about 7 seeming to be from different testers (submitters)?

Now, those Tegra 3 results are indeed strange:

K3V2: 45.90 (min) / 62.75 (avg) / 67.30 (max) fps, 177 results, 1280x720 display
Tegra 3: 33.70 (min) / 49.38 (avg) / 58.70 (max) fps, 535 results, 1280x752 display
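For reference, a quick check of where those percentage figures come from, using the Basemark fps numbers quoted above (25 vs 18 vs 12) and the NenaMark averages just listed, plus the pixel-count difference between the two display modes:

```python
# Quick sanity check of the percentage figures quoted in this thread, using
# the Basemark article's fps numbers and the NenaMark averages listed above.

def pct_faster(a, b):
    """How much faster a is than b, in percent."""
    return (a / b - 1) * 100

print(f"Basemark, K3V2 vs Tegra 3:    {pct_faster(25, 12):.0f}% faster")        # ~108%
print(f"Basemark, K3V2 vs Adreno 220: {pct_faster(25, 18):.0f}% faster")        # ~39%
print(f"NenaMark2, K3V2 vs Tegra 3:   {pct_faster(62.75, 49.38):.0f}% faster")  # ~27%

# The Tegra 3 devices also render slightly more pixels per frame:
k3v2_pixels  = 1280 * 720
tegra_pixels = 1280 * 752
print(f"Extra pixels for the Tegra 3: {pct_faster(tegra_pixels, k3v2_pixels):.1f}%")  # ~4.4%
```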

Looking at the submitters, we see that 7/10 of those results seem to be from 1.6GHz Tegra 3s, but they are actually slower than the 1.4GHz versions running on the older Android version. Only the 1.6GHz Tegra 3s are running Android 4.0.3.

Take into account that the Tegra has a few more pixels to render (1280x752 vs 1280x720).

http://nena.se/nenamark/view?version=2&device_id=1098

Also interesting, the TF700K (Transformer Prime with the MSM8960):

http://nena.se/nenamark/view?version=2&device_id=1341
http://nena.se/nenamark/view?version=2&device_id=1569

38 fps for the Transformer Prime 700 with its 1920x1128 display, Adreno 225 and MSM8960, which is surprising. Yes, the screen resolution is much higher, but still, I expected more from the "beefed up" CPU that everybody was raving about. Looks like it really needs an Adreno 320. Then again, the MSM8960 is cheating with 28nm, while all the others are on 40nm. A Tegra 3 / K3V2 on 28nm could probably also be pushed further to close the gap.
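A rough way to put those two NenaMark numbers on the same footing is to multiply each average by its on-screen pixel count. This is only indicative, since NenaMark runs on-screen and can be capped by vsync or the CPU, but it shows how much the resolution difference matters:

```python
# Normalizing the NenaMark averages by on-screen pixel count, using the
# figures quoted above (62.75 fps @ 1280x720 for the K3V2, 38 fps @ 1920x1128
# for the MSM8960/Adreno 225 Transformer). On-screen results can be capped by
# vsync or the CPU, so treat this only as a rough indication.

def pixel_throughput_mpix(fps, width, height):
    """Approximate pixels drawn per second, in Mpixels/s (one pass per pixel)."""
    return fps * width * height / 1e6

k3v2 = pixel_throughput_mpix(62.75, 1280, 720)
msm  = pixel_throughput_mpix(38.0, 1920, 1128)

print(f"K3V2:           {k3v2:.0f} Mpixels/s")  # ~58
print(f"MSM8960 / A225: {msm:.0f} Mpixels/s")   # ~82
# Normalized for resolution, the MSM8960 is actually pushing more pixels
# per second in this test.
```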

28nm?

You do notice a problem with 28nm, though: the Apple A5X is 45nm, the Tegra 3 is 40nm, the K3V2 is 40nm; only the MSM8960 is at 28nm. A bit strange...

Well, until you read something like this:
TSMC suddenly halts 28nm production

http://semiaccurate.com/2012/03/07/tsmc-suddenly-halts-28nm-production/

Things get much more complex for Qualcomm. 28nm Kraits are not shipping to end users, and as far as we have heard, nothing has been given a definitive ship date. If there is a 2-3 week delay, the OEMs and carriers will be peeved, but the end user will have no clue about any problem. It is internal, logistical, and annoying, but not a PR disaster.

Might explain why everybody is staying at 45/40nm...

Conclusion:

Well, for their own design, Huawei does seem to have a good performer on their hands. It will probably be surpassed later on by the Adreno 320/Krait, but if they can price it well (i.e. not too expensive), I think they can have a good system seller on their hands.
 
I'm pretty sure it's Vivante based on a video where HiSilicon bragged about the SGI heritage of the GPU IP's team (which is one of Vivante's marketing points). I'm not sure what core and what frequency but I suspect a good part of the unusually high performance (for a Vivante core) might be related to the very high level of memory bandwidth as that is clearly Vivante's weak point.

And while having that memory bandwidth available does help performance, using so much of it is obviously going to waste a lot of power compared to more bandwidth efficient architectures... And it might also reduce the performance benefit of higher performance cores as you'd just be constrained by bandwidth so much of the time.

Still, if it is Vivante, congrats to them for finally being part of a high-end smartphone and I'm looking forward to analysing the next-gen Freescale stuff once that's available! :)
 
Yeah, I get where you're coming from, the results are a little strange... you would think the faster Tegras would at least have parity...

Well, what no one has mentioned: maybe a Mali-400MP4 @ 400MHz?? That could hit that mark, could it not? That, and 4 A9s @ 1.5GHz with, say, 2MB of L2 cache... dual-channel controller... LPDDR2-1066... after all, the chipset is also smoking all the others in the other benchmarks, including AnTuTu... so it's got to be more than just a powerful GPU...

Does anyone have GLBenchmark results?

I have to say, I'm a bit bemused by Krait performance as well. If you take Anand's tests, which on an MDP are always going to be optimistic... they aren't that impressive outside of Linpack, and Medfield actually beats it in SunSpider (if you take the results at face value). If you level the clocks, the A9s wouldn't be that far off either...
Anand says it was consuming 750mW @ 1.5GHz PER CORE... that seems a lot considering 28nm, a new uarch and the performance comparison. Could that 28nm process be a dodgy one?

Everywhere is advertising that the Adreno 320 is now 4x the Adreno 225, instead of only the 190% that was floating around before on the other thread... that seems more like it, and that would destroy the A5X given some good bandwidth (which, as an IMR instead of a TBDR, it is going to need).
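Just to keep those ratios straight: "4x" means 300% faster, while "190% faster" would be 2.9x and "190% of the Adreno 225" only 1.9x. A tiny sketch of the distinction; the multipliers are just the rumoured claims being discussed, not measurements:

```python
# Distinguishing "N x as fast" from "N % faster" for the rumoured Adreno 320
# vs Adreno 225 figures. The multipliers are the claims quoted above, not data.

def multiplier_to_pct_faster(multiplier):
    """Convert an 'N x as fast' multiplier into a '% faster' figure."""
    return (multiplier - 1) * 100

print(f"4x as fast   -> {multiplier_to_pct_faster(4.0):.0f}% faster")  # 300%
print(f"1.9x as fast -> {multiplier_to_pct_faster(1.9):.0f}% faster")  # 90%
# So a "190%" figure only lines up with the "4x" claim if it meant
# "190% faster" (2.9x), not "190% of the Adreno 225".
```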

EDIT: I have those AnTuTu results: http://antutulabs.com/Ranking
There is definitely something memory-wise going on as well.
 
I'm pretty sure it's Vivante based on a video where HiSilicon bragged about the SGI heritage of the GPU IP's team (which is one of Vivante's marketing points).
That's a strange thing to market considering multiple competitors could say the same thing.
 
I have seen the Mali-400 showing up in different Chinese "clone" tablets. It seems that when they advertise their SoC as having a strong GPU, it's almost always the Mali-400 (single core). So it is a possibility.

But I think Arun may be right about Vivante. This information is mostly not picked up by western sites, but when you look around on the Asian ones, references to a Vivante GC4000 GPU show up plenty of times (this seems to be something like an SGX543MP4)?

With their "16-core" GPU, they are probably talking about 4 cores, with each core having 4 shaders or something? Who knows... But I'm fairly sure it's not really 16 "cores", just marketing.

Those AnTuTu results are strange. Based on this, the K3V2 is doing 11789 (4th place). Now, excluding those overclocked Tegra 3s (ranks one and two), you have a TF700T, which seems to be the Krait version (1.5GHz), ranking third?

Something overlooked by people?

Tegra 3 @ 40nm: 80 mm² (reported die size)
K3V2 @ 40nm: 12 mm x 12 mm = 144 mm² (package size?)
Apple A5 @ 45nm: 10.01 mm x 11.92 mm = 119.32 mm²
Apple A5X @ 45nm: 12.82 mm x 12.71 mm = 162.94 mm²

Also take into account that the K3V2 does NOT have the 5th energy-saving CPU core. According to Huawei they designed it "properly" to save energy, which means an LP process for the entire SoC, whereas the Tegra 3 is a hybrid with only the energy-saving 5th core on LP. That fits their claim that it can offer 30% more power saving.

Now, my point is: the 5th core is missing, so why is the reported package size so much bigger than the Tegra 3's? It has a 2 x 32-bit memory interface, which takes more area than the Tegra 3's single-channel interface, but then look at Apple's A5X: the biggest part of that SoC is the SGX543MP4. In other words, it again increases the chance of the K3V2 being a multi-"core" GPU design (dual or quad).
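For reference, the area ratios implied by the figures listed above, with the caveat that the K3V2's 12 x 12 mm number is flagged as a package size, so the comparison against the Tegra 3's reported die size is only suggestive:

```python
# Ratio check of the area figures listed above. The K3V2's 12 x 12 mm number
# is flagged as a *package* size, so comparing it directly against the
# Tegra 3's ~80 mm^2 die is only suggestive.

areas_mm2 = {
    "Tegra 3 (die, reported)": 80.0,
    "K3V2 (package?)":         12.0  * 12.0,   # 144.0
    "Apple A5 (die)":          10.01 * 11.92,  # ~119.3
    "Apple A5X (die)":         12.82 * 12.71,  # ~162.9
}

tegra3 = areas_mm2["Tegra 3 (die, reported)"]
for name, area in areas_mm2.items():
    print(f"{name:24s} {area:6.1f} mm^2  ({area / tegra3:.2f}x the Tegra 3)")
```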

The graphics block handles 2-D and 3-D work and helps a handset deliver 35 frames/second video compared to 13 fps for Tegra 3 (source)

Another bold claim. We have seen the GPU performing much better than the Tegra 3's (not that hard to do), but this claims almost 2.7x the speed (35 vs 13 fps). That puts it in the range of the theoretical advantage the A5X claims over the Tegra 3.

To be honest, the K3V2 is starting to look like what the Apple A5X needed to have been.
 
Well, firstly, there has to be some funky marketing going on, as they are surely referring to ALUs/shaders, not full GPU cores with TMUs/rasterizers etc.
Even if it is a proper quad GPU like the A5X's, or even a 'compromised' quad like the T604, the results, whilst great, aren't that great... unless they have used a low-clocked variant to keep power consumption down? :???:

How do we know for sure that the A5X is 45nm? I thought it was assumed to be on Sammy's 32nm process?
Ditto with the K3V2: how do we know the manufacturing tech?

The '16 core' claim is puzzling... the results on AnTuTu are also very high, bested only by super-overclocked Tegra 3 tablets... could all that die space in fact be some L2 cache with some AMBA 4-like cache coherency?? :???:

EDIT: Well, my thinking is that if you are going to have 4 Cortex-A9s @ 1.5GHz... on 40nm... with no companion core... and powerful graphics... that is going to blast through some battery. If not, it lends more credence to my 'multi-core' view...

EDIT 2: Well, I have found a mention of a Vivante GC4000 on NenaMark getting 60.1 fps... from an Android reference handset.
http://androidtabletupdate.com/tag/vivante-gc4000/

That's certainly within range to be a contender...

EDIT 3: Found some sketchy info about the Vivante GC4000: http://www.globenewswire.com/newsroom/news.html?d=209307
 
Regarding the K3V2's 12 x 12 mm figure: package size isn't die size. The former can be much, much larger than the latter. See Tegra 2 and Tegra 3:

http://www.nvidia.com/object/tegra-superchip.html

Which are available on either 12x12mm or 23x23mm (yes, that's right) packages. Larger packages only have a minor impact on production cost vs larger die sizes, and can reduce assembly costs. I'm sure it pays in general to use standard dimensions, even if the pin count and layout probably isn't going to be standard.

A lot of the performance claims made here seem suspicious. The NenaMark scores look like the usual vsync vs non-vsync marketing trick. GLBenchmark offscreen really needs to be used for this, and I'm sure they know that and chose not to present those scores.
 
The workloads of some of these benchmarks and the variability of some of the comparative scores prevent me from buying some of the performance claims here.

HiSilicon says the texel fill rate is 1.3 billion per second, and it's presumably an IMR (especially with all of the SGI name-dropping), so this SoC wouldn't have made for a better A5X to drive the new iPad display.
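To put that 1.3 Gtexels/s figure in the context of the new iPad's 2048x1536 panel, here is a rough per-pixel fill budget. The 30/60 fps targets are illustrative assumptions, and an IMR would additionally spend fill on overdraw that a TBDR rejects:

```python
# Rough texel-fill budget for the quoted 1.3 Gtexels/s against the new iPad's
# 2048 x 1536 panel. The 30/60 fps targets are illustrative assumptions, and
# an IMR also burns fill on overdraw that a TBDR would reject.

TEXEL_FILL   = 1.3e9          # texels/s, as quoted by HiSilicon
PANEL_PIXELS = 2048 * 1536    # new iPad (3rd gen) resolution

for fps in (30, 60):
    pixels_per_s = PANEL_PIXELS * fps
    budget = TEXEL_FILL / pixels_per_s
    print(f"@{fps} fps: {budget:.1f} texel fetches per pixel per frame")
# ~13.8 at 30 fps, ~6.9 at 60 fps, before any overdraw is accounted for.
```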

Some of the other theoretical performance figures sound nice, though.
 