SGX comments (pruned from AMD Z430 thread)
This thread was originally part of http://forum.beyond3d.com/showthread.php?t=56866 but SGX-specific comments have been moved here
The Z430 has positioned the LG GT-540 phone in 2nd place in the GLBenchmark list, just above the 3GS.
The SGX540-powered Samsung Galaxy S has just gone to the top of the list, even though it has more than twice the number of screen pixels of the LG phone.
Be interesting to see where the iphone4 slots in, when someone gets round to jailbreaking it.
http://www.glbenchmark.com/result.jsp
No jailbreak needed, Kishonti can run GLBenchmark on the device as they see fit.
In which case they mustn't have got their hands on an iPhone 4 yet, as previously they were quite quick with the iPad testing.
After I posted yesterday, Nokia N8 got tested and now lies in 2nd place.
At one time IMG-ed devices occupied pretty much the top 20 in there; now 9 of the top 20 are AMD-ed graphics, and one (N8) has graphics via an ST chip (is it Broadcom graphics?).
Can we conclude that the competition has caught up in terms of usable 3D graphics (for ES 1.1)?
I'm not speaking for ImgTec here whatsoever, but just the fact the results don't take into account screen resolution means it's hard to interpret the results, thus making it hard to position devices and come to that kind of conclusion.
Well, the 3GS is getting beaten by one with AMD Z430 using same resolution, and by one with Broadcom (Nokia N8) using higher resolution
The 3GS is vsync locked. Yet more variables to throw into the mix ;)
doug1820
29-Jun-2010, 18:01
The iPhone 4 most likely uses the PowerVR SGX 535, as in the iPad and 3GS. I don't think Apple would use a higher-clocked GPU, since the A4 needs to be extra power-efficient: the battery isn't a brick like the iPad's, and that power-hungry Retina screen is a factor too.
I had noticed z430 leadership in some of the triangle tests at GLBenchmark a couple days ago, demonstrating some of the core's promise, but today's additions to the list are a real shake up. Broadcom's VideoCore III actually makes a very strong debut, and a phone I just bought myself a couple weeks ago, the myTouch Slide, places quite highly.
I'm not too surprised to see the Slide make the list; it impressed me with its smooth web browser despite being based on a last generation Qualcomm apps processor, a testament to the influence of drivers/software over hardware. The lack of FP support, however, ironically prevents this Android device from getting a port of Google Earth, to my annoyance, and it's still not as smooth overall as my first generation iPhone. As soon as T-Mobile US gets the iPhone 4, either officially or through jailbreak unlock, I'll be switching back to iOS and away from the mishmash of user interfaces and features that is Android.
The 3GS is vsync locked. Yet more variables to throw into the mix ;)
The last words from the test author indicated that Vsync limit was not a significant contributing factor in the iphones scores.
http://forum.beyond3d.com/showthread.php?p=1421240#post1421240
That's not quite true. To quote my reply from that thread:
Vsync is always enabled on iPad/iPhone. Even though the devices are not reaching 60 fps (which would be equal to 9.81 Mtri/s) at which point the tests would be entirely vsync limited, vsync already has an effect at lower framerates. Many of the geometry tests are running at either 30 fps (4.9 Mtri/s) or 40 fps (6.54 Mtri/s) on iPad/iPhone.
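The figures in that quote all follow from a fixed triangle count per frame. A minimal Python sketch, assuming (as the numbers imply, but nobody in the thread states outright) that the geometry test draws the same number of triangles every frame; the function names are only illustrative:

```python
# Triangle throughput implied by a vsync-quantised frame rate.
# Assumption: the test renders a constant number of triangles per frame,
# derived here from the quoted "60 fps == 9.81 Mtri/s" figure.

MTRI_AT_60_FPS = 9.81  # Mtri/s at 60 fps, as quoted in the post above


def triangles_per_frame():
    """Triangles drawn per frame, in millions."""
    return MTRI_AT_60_FPS / 60.0


def mtri_per_second(fps):
    """Throughput in Mtri/s when the test runs at `fps` frames per second."""
    return triangles_per_frame() * fps


for fps in (30, 40, 60):
    print(f"{fps} fps -> {mtri_per_second(fps):.3f} Mtri/s")
```

So a result sitting at almost exactly 4.9 or 6.54 Mtri/s is a hint that the frame rate has snapped to a vsync step rather than to what the GPU could do unthrottled.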
That's not quite true. To quote my reply from that thread:
Your "that's not quite true" might suggest what I said was wrong; the author DEFINITELY said it was not significant.
"Low level tests are not vsynced on iPad/iPhone. Only the high-level "HD" test (pretty old) is vsynced"
Now whether or not THAT'S true is another matter.
I have a vague recollection of seeing some video on the net of Quake running on the iphone @ >60FPS, but it could have been another IMG-ed device.
Now whether or not THAT'S true is another matter.
Sorry, I thought it was clear I was referring to what Laszlo said, not what you said.
I have a vague recollection of seeing some video on the net of Quake running on the iphone @ >60FPS, but it could have been another IMG-ed device.
Maybe Aava/Moorestown? There is no documented method to disable vsync on the iPhone (though there might be a private API).
Maybe Aava/Moorestown?
Ah yes Indeed, that was the platform that I recall seeing demoed.
darkblu
30-Jun-2010, 15:11
Maybe Aava/Moorestown? There is no documented method to disable vsync on the iPhone (though there might be a private API).
Seconded. I'd love to know what GLBenchmark uses to unlock vsync.
As coincidence would have it, the iPhone 4 has just hit the GLBenchmark results list, in 2nd place behind the Galaxy S (which runs the SGX540 and a lower-resolution screen).
Some anomalies in the results.
The iPhone 4 int score looks at odds with just about everything else, at 8032; the iPad gets 3968, and the 1 GHz Galaxy S gets 4264.
Float is 21810, against the iPad's 26455 and the 3GS at 9751. If we use this to guess the clock, one might guess the iPhone 4 is running about 20% slower than the iPad, say 800 MHz?
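For what it's worth, the clock guess from the float scores can be written out. This is only a rough sketch: it assumes the float score scales linearly with CPU clock and that the iPad's A4 runs at 1000 MHz, neither of which comes from the benchmark itself, and `estimate_clock_mhz` is just an illustrative name:

```python
# Rough CPU clock guess from the GLBenchmark float scores quoted above.
# Assumptions (not from the benchmark): the score scales linearly with
# clock, and the iPad's A4 runs at 1000 MHz.

IPAD_FLOAT = 26455
IPHONE4_FLOAT = 21810
IPAD_CLOCK_MHZ = 1000  # assumed reference clock


def estimate_clock_mhz(score, ref_score, ref_clock_mhz):
    """Scale the reference clock by the ratio of the two scores."""
    return ref_clock_mhz * score / ref_score


ratio = IPHONE4_FLOAT / IPAD_FLOAT
estimate = estimate_clock_mhz(IPHONE4_FLOAT, IPAD_FLOAT, IPAD_CLOCK_MHZ)
print(f"iPhone 4 / iPad float ratio: {ratio:.3f}")
print(f"estimated iPhone 4 clock: {estimate:.0f} MHz")
```

The ratio comes out a little over 0.82, so "about 20% slower, say 800 MHz" is in the right area under those assumptions. The int scores (8032 against 3968) clearly don't fit the same scaling, which is the anomaly.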
Should this part of the discussion be moved back to the iPhone 4 thread?
Kudos to Kishonti for getting good and timely reads on all of the iOS devices.
The iPhone 4 ranked predictably in GLBenchmark behind the brute force of the four pipeline SGX540 yet ahead of the rest of the pack due to what I believe is its combination of a relatively well performance-tuned driver/software environment and an aggressive clock speed for the core.
I've seen a lot of claims in articles on the web that the 535 has double the triangle rate of the 530, but I suspect the only performance doubling between the two is in the texel fill. Can anyone confirm? GMA500 docs I've researched seem to support my belief.
frogblast
30-Jun-2010, 17:57
Seconded. I'd love to know what GLBenchmark uses to unlock vsync.
There isn't a way to disable vsync on iOS. If you want to get unencumbered performance results, we usually suggest replacing calls to -presentRenderbuffer with glFlush. For example, for every 4 frames, swap once and flush three times. This gets past vsync, and still lets you see what the app is doing.
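The "swap once, flush three times" pattern, sketched as a frame loop. This is Python pseudostructure only: `present` and `flush` are hypothetical callbacks standing in for -presentRenderbuffer: and glFlush(), not real bindings, and `run_frames` is a made-up name:

```python
# Sketch of the "swap once, flush three times" pattern described above.
# present() and flush() are hypothetical stand-ins for
# -presentRenderbuffer: and glFlush(); they are not real Python bindings.

def run_frames(frame_count, present, flush, swap_interval=4):
    """Render `frame_count` frames, presenting only every `swap_interval`-th."""
    for frame in range(frame_count):
        # ... issue this frame's draw calls here ...
        if frame % swap_interval == swap_interval - 1:
            present()  # real buffer swap, subject to vsync
        else:
            flush()    # submit the work without waiting on the display


calls = []
run_frames(8, lambda: calls.append("present"), lambda: calls.append("flush"))
print(calls)
```

With a swap interval of 4, only one frame in four can stall on the display, so the measured frame time is mostly GPU work while something still shows up on screen.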
Exophase
30-Jun-2010, 17:58
I've seen a lot of claims in articles on the web that the 535 has double the triangle rate of the 530, but I suspect the only performance doubling between the two is in the texel fill. Can anyone confirm? GMA500 docs I've researched seem to support my belief.
You won't find published numbers for either, or at least I haven't. The numbers in circulation (i.e., on Wikipedia, with no citation given) are probably interpolated from some ranges IMG gave which went from SGX520 all the way up to multi-core SGX543. All we really got from this is that SGX520 starts at 7M (at 200MHz), SGX545 is 40M, and SGX543MP4 is 140M. Ailuros believes that SGX543 is being pre-scaled at MP1 to make the progression look linear, such that it's 40M like 545 and not 35M.
This is the progression he suggests (and currently I agree with):
SGX520: 7M (1 USSE with some other performance reduction in system)
SGX530, SGX535: 20M (2 USSE)
SGX540, SGX545: 40M (4 USSE)
SGX543MP1: 40M (4 USSE2)
The rate might be limited by triangle setup regulated per USSE, and not ALU (or else USSE2 would increase it)
On the other hand, IMG claims "enhanced triangle setup delivering up to 50% higher throughput" for SGX543. Where this actually fits in is anyone's guess. IMG also claims that these numbers are "real world" and not "synthetic", leading us to speculate that Samsung's crazy 90M numbers for SGX540 are synthetic.
It'd probably be good if we got some real raw triangle throughput tests, but driver quality would probably distort the story...
Ailuros
01-Jul-2010, 11:41
It doesn't necessarily have to be those exact numbers (albeit IMG has stated in the 545 announcement 40M Tris and in an older newsletter 31M for 540, I guess it comes down as to what the marketing department decides to rate each core at for any given time heh...) since they usually have a footnote for conditionals (<50% shader load f.e.).
It's the relative performance between cores that probably interests most and I have no reason to doubt that if I have X rate for 53x I won't have 2*X for 54x at least for USSE1 cores always at the same frequency.
In general triangle ratings are a bit of a mess especially if you look it up at wikipedia and yes it's probably some folks adding data ignoring frequency differences between different implementations of core A or B.
***edit: albeit completely unrelated Exophase: http://www.highperformancegraphics.org/media/Hot3D/HPG2010_Hot3D_NVIDIA.pdf ....food for thought when you have something that has multiple raster units.
******edit Nr.2:
Kudos to Kishonti for getting good and timely reads on all of the iOS devices.
The iPhone 4 ranked predictably in GLBenchmark behind the brute force of the four pipeline SGX540 yet ahead of the rest of the pack due to what I believe is its combination of a relatively well performance-tuned driver/software environment and an aggressive clock speed for the core.
I've seen a lot of claims in articles on the web that the 535 has double the triangle rate of the 530, but I suspect the only performance doubling between the two is in the texel fill. Can anyone confirm? GMA500 docs I've researched seem to support my belief.
Given the resolution by the way (320*480) I've compared the iPhone3GS vs. LG GT540 Optimus. On another comparison set I compared the Galaxy (800*480) with the iPhone4 (640*960) and the iPad (768*1024) and the very first gut feeling I get from the results is that the latter two must have quite a bit higher frequency on the GPU side than the 540. You might have twice the ALUs in the latter but still the very same amount of TMUs compared to 535, so what really meows on a hot tin roof? ;)
Assume I'm right on track here, I've been saying for a long time that Apple concentrated mostly on fill-rate. A 540 might guarantee you twice the pipelines but still the very same amount of TMUs and frequency is going to be inevitably lower (think higher die area <-> power consumption). I wouldn't be in the least surprised if the 540 frequency in the Galaxy is still at best around iPhone3GS margins.
Considering Samsung's boast of GPU performance, I actually expect that they were fairly aggressive with the clock rate for the GPU in the Galaxy S app processor.
This particular implementation of Galaxy S isn't having great compatibility with GLBenchmark, but Qualcomm's Neocore allowed the 540 to flex a bit more.
http://androidandme.com/wp-content/uploads/2010/06/android-neocore.png
55.7 fps actually means it's vsync/refresh rate limited (Galaxy S screen refresh is ~56 Hz).
Nice.
I'll be buying myself a Vibrant this month, so my situation is looking great from a hardware standpoint then.
The Vibrant's release will actually represent the first PowerVR phone officially for T-Mobile USA. Even though PowerVR is so common across the flagship devices of all of the phone manufacturers, the flagship phones of the non-US manufacturers have almost no representation at the US carriers, and each of the other three carriers already got their PowerVR offering through an exclusive deal with one of three big US phone makers: Motorola Droid with Verizon, Palm/HP Pre with Sprint, and Apple iPhone with AT&T.
I'm grateful for Samsung's wide-ranging Galaxy S introduction; I've actually been really annoyed that my hardware couldn't support Google Earth on my new myTouch Slide (no CPU FP support, thanks Qualcomm), an app which I'd been enjoying on my iPhone for years until I was kinda forced to change from AT&T to T-Mobile and switch phones.
Gizmodo just posted up a side-by-side video of a Galaxy S (1Ghz hummingbird +SGX540) against a HTC desire (snapdragon + adreno) running Quake2. The difference is astounding, the Galaxy appears to be running at least 3-4 times quicker.
There is also a video on that same page of Quake III running smoothly on the galaxy.
http://gizmodo.com/5580123/quake-2-test-htc-desire-vs-samsung-galaxy-s
Wishmaster
06-Jul-2010, 15:08
Gizmodo just posted up a side-by-side video of a Galaxy S (1Ghz hummingbird +SGX540) against a HTC desire (snapdragon + adreno) running Quake2. The difference is astounding, the Galaxy appears to be running at least 3-4 times quicker.
There is also a video on that same page of Quake III running smoothly on the galaxy.
http://gizmodo.com/5580123/quake-2-test-htc-desire-vs-samsung-galaxy-s
Considering that SGX540 was always faster and better (on paper and now IRL) than Adreno 200 (Z430) on Qualcomm, it's not surprising. I wonder how Adreno 220 will fare against SGX540.
Exophase
06-Jul-2010, 15:19
55.7 fps actually means it's vsync/refresh rate limited (Galaxy S screen refresh is ~56 Hz).
Hum.. do you know if this refresh rate is a limitation of their SuperAMOLED technology? I'd feel a lot more comfortable with a value much closer to 60.
Considering that SGX540 was always faster and better (on paper and now IRL) than Adreno 200 (Z430) on Qualcomm, it's not surprising. I wonder how Adreno 220 will fare against SGX540.
Indeed, 535 is the one we'd need to get compared to 200, not 540
Hum.. do you know if this refresh rate is a limitation of their SuperAMOLED technology? I'd feel a lot more comfortable with a value much closer to 60.
I don't know why they picked 56 Hz, but I doubt it's a limitation inherent in SuperAMOLED technology.
At the end of the Quake 2 video you can see the fps counter peaking at 56 fps, too.
With the inconsistent and sometimes conflicting benchmarks currently available for mobile graphics, actual developer feedback and testing has been invaluable in confirming the indications and trends of performance suggested by the benchmarks.
Game developers have found the Motorola Droid to be the most capable among the last batch of Android phones, so the 535-equipped 3GS handily outperforming the Nexus One was little surprise when Distinctive Developments compared the two with a simulated game scenario.
As has been indicated, each SGX variant is the result of customized performance and sometimes features, not just the scaling of ALUs and TMUs. In the context of 200 MHz and < 50% shader load:
IMG was originally very specific about the performance of the 530, indicating a figure down to the half-million, 13.5M tri/sec, which logically implies that they weren't being too rough about their rounding. They later charted the 520 at 7M tri/sec, the 530 at 14M tri/sec, and the 540 at 28M tri/sec. Some of the pixel fill numbers from the old SGX tech docs did imply, as was mentioned, that the 520 was somehow even less than half of the 530 (the cancelled 510 was even far less, using a USSE-lite pipeline.)
All indications were that the 53x family shared similar geometry performance characteristics. Indeed, GMA 500 docs state a performance of 1 triangle per 15 cycles, which equals 13.3M tri/sec at 200 MHz, and Intel even rates it explicitly at 13M tri/sec where they commonly set the clock at 200 MHz. NEC claimed 15M tri/sec for the 535 in their NaviEngine SoC, so the 13M tri/sec range seems very probable for the 53x line of variants.
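The 13.3M figure is just clock divided by cycles per triangle. A tiny sketch to make the arithmetic explicit (the function name is mine, and it ignores everything except the quoted 15 cycles/triangle and the 200 MHz clock):

```python
# Peak triangle rate from a cycles-per-triangle figure.

def tri_rate_mtri(clock_mhz, cycles_per_triangle):
    """Peak rate in millions of triangles per second."""
    return clock_mhz / cycles_per_triangle


print(f"{tri_rate_mtri(200, 15):.1f} Mtri/s")  # prints "13.3 Mtri/s"
```

The same function shows how sensitive the headline number is to the assumed clock, which is probably where a lot of the conflicting ratings come from.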
I've been inclined to believe IMG has customized each successive variant for incrementally higher geometry performance as provided in their original, respective press releases: 35M tri/sec for the 543, ~40M tri/sec for the 545, 100M tri/sec for the cancelled 555, and 95% of 35M tri/sec multiplied by the number of MP cores.
Exophase
13-Jul-2010, 16:55
Interesting, so those Wikipedia figures were official. Except 535 was the one listed at 28 instead of 540, with ensuing confusion.
Lazy8s, do you have a link for the GMA500 documentation?
Page 95, http://download.intel.com/design/chipsets/embedded/datashts/319537.pdf
Other implementations, like the CE3100, also list the same information in their product briefs.
Actually, the CE3100 product brief is the one which specifically provides the rating of 13M polygons per second.
http://download.intel.com/design/celect/downloads/ce3100-product-brief.pdf
Exophase
13-Jul-2010, 18:27
Thanks.
So, the 15 cycles/triangle figure is "transform only", ie should be shading throughput, presumably performing the operations described in 9.1.3 - so I guess vertex multiplication by projection/modelview matrix and the entire OGL lighting model calculations. I'm assuming perspective divide and screen space conversion are performed in fixed function rather than by shaders, but maybe someone else has a different answer for this.
I wonder if 15 cycles means chip cycles or per-USSE. If it's the former it'd be 30 USSE cycles that are needed for transformation of one vertex, which is probably pretty reasonable for a platform that can presumably do one 32-bit FMAC or one SIMD operation over colors per cycle.
"Transform only" also suggests that the bottleneck after shading could be worse - either during culling, binning, or rasterization. This is also assuming no additional incurred overhead such as multiple triangle setup due to splitting a triangle over tile edges.
On the other hand, the shading could be the bottleneck, possibly what the 0.5 peak vertex to triangle ratio means. In which case, vertexes with no shading (no lighting and maybe even no geometry transformation) might yield a higher peak triangle rate.
Simon F
14-Jul-2010, 07:29
On the other hand, the shading could be the bottleneck, possibly what the 0.5 peak vertex to triangle ratio means.
That just means that indexed triangles are used which gives you a typical peak of 2 triangles for every vertex, compared to triangle strips which give, at best, 1 triangle per vertex.
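One way to see the 2:1 versus 1:1 difference is to count vertices and triangles on a regular grid. A small sketch, assuming an n x n grid of quads split into two triangles each, perfect reuse of shared vertices in the indexed case, and one separate strip per row in the strip case (the function names are only illustrative):

```python
# Vertices per triangle for an n x n grid of quads (two triangles each),
# drawn as an indexed triangle list versus as one triangle strip per row.
# Assumption: in the indexed case every shared vertex is transformed once.

def indexed_grid(n):
    vertices = (n + 1) ** 2   # grid points, each shared by its neighbours
    triangles = 2 * n * n
    return vertices, triangles


def strip_rows(n):
    # Each row strip submits 2 * (n + 1) vertices and yields 2 * n triangles.
    vertices = n * 2 * (n + 1)
    triangles = 2 * n * n
    return vertices, triangles


for n in (1, 10, 100):
    iv, it = indexed_grid(n)
    sv, st = strip_rows(n)
    print(f"n={n}: indexed {iv / it:.3f} verts/tri, strips {sv / st:.3f} verts/tri")
```

As the grid grows, the indexed ratio heads towards 0.5 vertices per triangle while the strips stay just above 1.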
Exophase
14-Jul-2010, 08:33
That just means that indexed triangles are used which gives you a typical peak of 2 triangles for every vertex, compared to triangle strips which give, at best, 1 triangle per vertex.
I figured that's what it meant, but I guess somehow this didn't click for me. Maybe I need to look at a diagram, because in my mind it "feels" like one vertex/triangle is more what you get. I guess I should start by considering all the points a triangle strip would start sharing if you say, zig-zagged it left to right, top to bottom? Well, you can see I'm pretty disconnected from real 3D ;P
Simon F
14-Jul-2010, 10:56
I figured that's what it meant, but I guess somehow this didn't click for me. Maybe I need to look at a diagram, because in my mind it "feels" like one vertex/triangle is more what you get. I guess I should start by considering all the points a triangle strip would start sharing if you say, zig-zagged it left to right, top to bottom? Well, you can see I'm pretty disconnected from real 3D ;P
Well, you can concoct special cases, but for surfaces, it just follows from Euler's law (http://www.angelfire.com/mt/marksomers/77.html) that says
V + F = E + 2
where V = number of verts
F = number of faces
E = number of edges
Since we are dealing with triangular faces, we have 3 edges per face but each edge is shared between 2 faces, so E = F * 3/2
We thus get:
V + F = 1.5 F + 2
=> V = 0.5 F + 2
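A quick sanity check of that result on a few closed triangle meshes. The vertex and face counts are those of the regular tetrahedron, octahedron, and icosahedron; the helper name is made up for this sketch:

```python
# Check V + F = E + 2 and V = 0.5 * F + 2 on closed triangle meshes.

SOLIDS = {
    "tetrahedron": (4, 4),    # (vertices, triangular faces)
    "octahedron": (6, 8),
    "icosahedron": (12, 20),
}


def euler_holds(v, f):
    """True if V + F = E + 2 with E = 3F/2 (every edge shared by 2 faces)."""
    e = 3 * f // 2
    return v + f == e + 2


for name, (v, f) in SOLIDS.items():
    print(name, euler_holds(v, f), v == f // 2 + 2)
```

All three satisfy both relations, and for large meshes the constant 2 becomes negligible, which leaves roughly half a vertex per triangle.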
Ike Turner
14-Jul-2010, 11:58
Maybe we should stay on topic (AMD Z430) and not turn this into another PowerVR SGX thread?
Simon F
14-Jul-2010, 13:08
Just tried to split the thread but there's no direct tool to do it. Might have to copy and then prune. Sigh
Ailuros
14-Jul-2010, 13:37
Isn't the HTC Desire using a Z430? If yes then Qualcomm needs to look into its driver, because if I judge from that first video, performance is anything but acceptable, and that in a relic like Q2: http://phandroid.com/2010/07/06/this-is-why-the-hummingbird-processor-in-the-samsung-galaxy-s-is-awesome/
Some of the US Galaxy S phones are now showing up on GLBenchmark, and T-Mobile did another good job of not messing up the driver/software environment of the Vibrant, keeping it closest to the reference Galaxy S.
The original Galaxy S and some other phones were also retested, and the benchmark now much more accurately gauges the performance advantage of the SGX540.
And finally, a 2.0 OpenGL ES test of the new generation of phones:
http://www.youtube.com/watch?v=cl4p5JI0-gQ&sns=em
http://img408.imageshack.us/img408/627/androidgpubenchmarks20.png (http://img408.imageshack.us/i/androidgpubenchmarks20.png/)
http://androidandme.com/2010/07/news/galaxy-s-lineup-leads-the-pack-in-android-gpu-benchmarks/
Correct me if I'm wrong.
It seems that the execution units of handheld GPUs are very small and need more cycles per instruction than a desktop GPU: instead of several FP32 units working in parallel on the same task, or a SIMD configuration, they have only one FP32 unit, so rendering an entire frame takes more cycles.
Simon F
18-Jul-2010, 11:06
It entirely depends on the device.
rpg.314
18-Jul-2010, 13:12
Correct me if I'm wrong.
It seems that the execution units of handheld GPUs are very small and need more cycles per instruction than a desktop GPU: instead of several FP32 units working in parallel on the same task, or a SIMD configuration, they have only one FP32 unit, so rendering an entire frame takes more cycles.
As Simon has already pointed out, there is quite a bit of variation in the design of embedded GPUs.
Ailuros
19-Jul-2010, 12:05
And finally, a 2.0 OpenGL ES test of the new generation of phones:
http://www.youtube.com/watch?v=cl4p5JI0-gQ&sns=em
http://img408.imageshack.us/img408/627/androidgpubenchmarks20.png (http://img408.imageshack.us/i/androidgpubenchmarks20.png/)
http://androidandme.com/2010/07/news/galaxy-s-lineup-leads-the-pack-in-android-gpu-benchmarks/
Hmmm does anyone know how many AA samples GLBenchmark is using for the FSAA tests? The performance difference between noAA and AA looks fairly reasonable on all tested devices. Could it be it's only 2x samples?
Hmmm does anyone know how many AA samples GLBenchmark is using for the FSAA tests? The performance difference between noAA and AA looks fairly reasonable on all tested devices. Could it be it's only 2x samples?
It's 4 samples.
metafor
20-Jul-2010, 00:01
It's 4 samples.
Interesting. If it's not a multi-sample implementation it could be an indication that pixel fill-rate isn't the primary limitation. Guess it makes sense for an 850x480 display.
Interesting. If it's not a multi-sample implementation it could be an indication that pixel fill-rate isn't the primary limitation.
The devices in the comparison all use multisampling.
ltcommander.data
22-Jul-2010, 13:27
What is the difference between the GLBenchmark HD and Pro results? Is one native resolution dependent while the other device resolution independent? I'm just curious why the iPhone 4 (SGX535) can be comparable to the Galaxy S/Vibrant (SGX540) in the HD benchmark but be 33% slower in CPU skinning and 50% slower in the Pro benchmark for GPU skinning.
The low-level tests also have the iPhone 4 and Galaxy S/Vibrant being very close for most tests except fill-rate and some of the lighting tests. Should the SGX540 be consistently faster than the SGX535 or maybe this is the result of the iPhone 4's GPU being clocked higher or driver stack being more mature?
What is the difference between the GLBenchmark HD and Pro results? Is one native resolution dependent while the other device resolution independent?
They're completely different benchmark scenes. See the screenshots at the bottom of this page:
http://www.glbenchmark.com/tools.jsp?benchmark=glpro11
I'm just curious why the iPhone 4 (SGX535) can be comparable to the Galaxy S/Vibrant (SGX540) in the HD benchmark but be 33% slower in CPU skinning and 50% slower in the Pro benchmark for GPU skinning.
The HD tests are entirely vsync limited on the iPhone and Galaxy S. The tests run for 30s, so divide the number of frames rendered by this and you get 59.9 fps on the iPhone and 55.8 fps on the Galaxy S, which in both cases corresponds to the screen refresh rate.
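That check is easy to write down. A minimal sketch using only the frame rates and refresh rates mentioned in this thread (60 Hz for the iPhone, ~56 Hz for the Galaxy S); the 1% tolerance and the function names are my own assumptions:

```python
# Decide whether a benchmark result is pinned to the display refresh rate.
# Assumption: "within 1% of the refresh rate" counts as vsync limited.

def average_fps(frames_rendered, duration_s=30.0):
    """Average frame rate over a fixed-length run (the HD test runs 30 s)."""
    return frames_rendered / duration_s


def is_vsync_limited(measured_fps, refresh_hz, tolerance=0.01):
    """True if the measured rate sits within `tolerance` of the refresh rate."""
    return abs(measured_fps - refresh_hz) <= tolerance * refresh_hz


print(is_vsync_limited(59.9, 60.0))  # iPhone figure from the post above
print(is_vsync_limited(55.8, 56.0))  # Galaxy S figure from the post above
```

Both come out as vsync limited under that tolerance, which is why the HD scores say little about the relative speed of the 535 and the 540.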
ltcommander.data
23-Jul-2010, 03:42
They're completely different benchmark scenes. See the screenshots at the bottom of this page:
http://www.glbenchmark.com/tools.jsp?benchmark=glpro11
The HD tests are entirely vsync limited on the iPhone and Galaxy S. The tests run for 30s, so divide the number of frames rendered by this and you get 59.9 fps on the iPhone and 55.8 fps on the Galaxy S, which in both cases corresponds to the screen refresh rate.
Thanks for clarifying the benchmark issues.