GLBenchmark 2.0 Results

When it says multi-pass deferred shading I take that to mean a lot more than a Z pre-pass (sorry, I don't know what a G-buffer is), instead interpolating per-pixel lighting parameters (mainly normals) and then combining them in a later shader. It seems that if the render target is changed, that would defeat the SGX's deferred rendering and increase outgoing bandwidth a lot, all for no benefit.
The hardware is designed to handle changes in render target efficiently so I don't see that as an issue, although as you say it's not really a benefit for us (at least not how it's expressed in the standard API). G-buffer is one of the alternative terms used to describe the buffers containing the per-pixel attributes that you build up prior to applying your lighting passes.
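In OpenGL ES 2.0 terms (no MRT), the pattern being discussed looks roughly like the sketch below: an attribute pass that renders normals into a texture, then a lighting pass that samples it. The names are made up, the shaders and depth attachment are omitted; it's only meant to show where the render-target switch happens.

Code:
/* Rough sketch only: ES 2.0 deferred-ish rendering without MRT.
 * gbuf_tex/gbuf_fbo and the draw_* calls are illustrative names. */
#include <GLES2/gl2.h>

GLuint gbuf_tex, gbuf_fbo;

void create_gbuffer(int w, int h)
{
    glGenTextures(1, &gbuf_tex);
    glBindTexture(GL_TEXTURE_2D, gbuf_tex);
    glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA, w, h, 0,
                 GL_RGBA, GL_UNSIGNED_BYTE, NULL);   /* normals packed into 0..1 */
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_NEAREST);
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_NEAREST);

    glGenFramebuffers(1, &gbuf_fbo);
    glBindFramebuffer(GL_FRAMEBUFFER, gbuf_fbo);
    glFramebufferTexture2D(GL_FRAMEBUFFER, GL_COLOR_ATTACHMENT0,
                           GL_TEXTURE_2D, gbuf_tex, 0);
    /* a depth renderbuffer would normally be attached here too */
}

void draw_frame(void)
{
    /* Pass 1: write per-pixel attributes (normals) to the off-screen target. */
    glBindFramebuffer(GL_FRAMEBUFFER, gbuf_fbo);
    glClear(GL_COLOR_BUFFER_BIT | GL_DEPTH_BUFFER_BIT);
    /* draw_scene_with_attribute_shader(); */

    /* Pass 2: switch render targets and sample the attribute texture.
     * This switch is the point where a tile-based GPU has to resolve the
     * first target to memory before it can be read as a texture. */
    glBindFramebuffer(GL_FRAMEBUFFER, 0);
    glBindTexture(GL_TEXTURE_2D, gbuf_tex);
    /* draw_fullscreen_quad_with_lighting_shader(); */
}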
Yeah, I didn't think there would be, and obviously it'd increase image quality a lot. All I meant was that SGX's depth buffer being 32-bit float internally gives it an advantage over a 24-bit depth buffer.
The quality advantage isn't that big on SGX as we always render at 32 bit (or higher) irrespective of the external target bit depth and just do a single dither when we write out at the end of the tile.
Take this with some salt, but if this paste is correct:

http://pastie.org/1254872

Then it looks like Tegra 2 may in fact not support 24-bit or greater depth buffers. nVidia does have a non-linear depth extension to try to counter this, but it's still 16-bit.
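If anyone wants to check on their own device, depth size is just an EGL config attribute, so something like the sketch below (the attribute lists are only an example) would tell you whether a 24-bit request can be satisfied or whether you're stuck with the 16-bit fallback:

Code:
#include <EGL/egl.h>

/* Ask for a >= 24-bit depth buffer first and fall back to 16 bits. */
EGLConfig choose_depth_config(EGLDisplay dpy)
{
    static const EGLint want24[] = {
        EGL_RENDERABLE_TYPE, EGL_OPENGL_ES2_BIT,
        EGL_RED_SIZE, 8, EGL_GREEN_SIZE, 8, EGL_BLUE_SIZE, 8,
        EGL_DEPTH_SIZE, 24,
        EGL_NONE
    };
    static const EGLint want16[] = {
        EGL_RENDERABLE_TYPE, EGL_OPENGL_ES2_BIT,
        EGL_RED_SIZE, 5, EGL_GREEN_SIZE, 6, EGL_BLUE_SIZE, 5,
        EGL_DEPTH_SIZE, 16,
        EGL_NONE
    };
    EGLConfig cfg;
    EGLint n = 0;

    if (eglChooseConfig(dpy, want24, &cfg, 1, &n) && n > 0)
        return cfg;                  /* 24-bit (or deeper) depth available */
    eglChooseConfig(dpy, want16, &cfg, 1, &n);
    return cfg;                      /* otherwise settle for 16-bit depth */
}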

While we're on the topic of image quality capabilities, Tegra 2 does have anisotropic filtering which SGX does not.

True, although I'd argue that basic rendering quality (24 bit FB/Z) was more important than aniso ;) Aniso also isn't in the ES2.0 API for other reasons afaik...
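(For completeness: in ES 2.0, aniso only exists as the GL_EXT_texture_filter_anisotropic extension, so an app has to probe for it at runtime, roughly like this sketch, with the target texture already bound:)

Code:
#include <GLES2/gl2.h>
#include <GLES2/gl2ext.h>
#include <string.h>

/* Enable the maximum anisotropy on the currently bound texture, if the
 * extension is exposed at all. */
void enable_aniso_if_available(void)
{
    const char *ext = (const char *)glGetString(GL_EXTENSIONS);
    if (ext && strstr(ext, "GL_EXT_texture_filter_anisotropic")) {
        GLfloat max_aniso = 1.0f;
        glGetFloatv(GL_MAX_TEXTURE_MAX_ANISOTROPY_EXT, &max_aniso);
        glTexParameterf(GL_TEXTURE_2D, GL_TEXTURE_MAX_ANISOTROPY_EXT, max_aniso);
    }
}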
 
The hardware is designed to handle changes in render target efficiently so I don't see that as an issue, although as you say it's not really a benefit for us (at least not how it's expressed in the standard API). G-buffer is one of the alternative terms used to describe the buffers containing the per-pixel attributes that you build up prior to applying your lighting passes.

But can it support those render target changes while keeping the tile memory all on-chip? Or does it depend on how the IP is configured, same as with on-chip MRT support? And if it can, does it know not to have to export each render target when they're render-to-texture and only used as inputs for a later shader? Otherwise this approach will be more expensive. In fact, even if it can do all that, it'll still have to run two fragment shaders on each pixel, which I imagine will be less efficient than one shader, if only for passing parameters between the two.

The quality advantage isn't that big on SGX as we always render at 32 bit (or higher) irrespective of the external target bit depth and just do a single dither when we write out at the end of the tile.

Honestly I'd have to see some examples of true 24bpp vs dithered down to 16bpp, it has been a really long time since anyone was doing that. Although SGX would still have better image quality than the old ditherers did, since it keeps more internal state at this precision, i.e. between alpha blends and multi-pass rendering.

One question - since you mention > 32bit tile color buffers, is it really actually 32bit to begin with? Or is it lowp x4, which should actually be 40bit? Since that just gives you a range of +/- 2 instead of 0 to 1 the actual output would be clamped to 32bit anyway, but that gives you some room for not saturating in intermediate steps.

True, although I'd argue that basic rendering quality (24 bit FB/Z) was more important than aniso ;) Aniso also isn't in the ES2.0 API for other reasons afaik...

Well, they do have 24bpp framebuffer at least. It's not a Voodoo 3 ^^
 
But can it support those render target changes while keeping the tile memory all on-chip? Or does it depend on how the IP is configured, same as with on-chip MRT support? And if it can, does it know not to have to export each render target when they're render-to-texture and only used as inputs for a later shader? Otherwise this approach will be more expensive. In fact, even if it can do all that, it'll still have to run two fragment shaders on each pixel, which I imagine will be less efficient than one shader, if only for passing parameters between the two.
Not sure I understand what you're suggesting. Like all other architectures, if you reference a render target as a texture then you need to flush the render so that it can be used; if there's no reference we don't need to flush. In terms of MRTs there is no additional cost: they behave exactly as you'd expect on any other architecture, except that you get the usual tile-based benefit of no external memory bandwidth for reads/writes to them during rendering, other than the write at the end of the tile. There are other tricks that can be played but they're not for discussion here. Basically most techniques that work well on other architectures work well for us; they may not be necessary or may provide no additional benefit, but they certainly aren't worse on us.

John.
 
Honestly I'd have to see some examples of true 24bpp vs dithered down to 16bpp, it has been a really long time since anyone was doing that.
/me feels like I've fallen into that time warp/worm hole mfa mentioned.

There must be some historical discussion of this somewhere, e.g. comparing 3dfx and PCX1/2.
 
http://www.sharkyextreme.com/hardware/articles/kyro_in-depth/4.shtml

[Attached screenshots from the article: 4.jpg, 6.jpg, 5.jpg, 7.jpg]
 
That's an example of 16-bit intermediate storage with rendering artifacts vs 32-bit intermediate storage with 16-bit dithered output, rather than dithered vs full 32-bit output... but still pretty illustrative of the image quality JohnH describes. I would think there was more to the push to 32-bit in the late 90s than artifacts from multi-pass rendering and alpha blending, but maybe that was a big part of it.

Would also be interesting to see how much of a difference there is between 32-bit internal with and without dithered output.
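For reference, the kind of end-of-tile dither being talked about is roughly what the sketch below does in software: quantise an 8-bit-per-channel colour down to RGB565 through a small ordered-dither threshold. Purely illustrative, not a description of what SGX actually does in hardware.

Code:
#include <stdint.h>

static const int bayer4[4][4] = {
    {  0,  8,  2, 10 },
    { 12,  4, 14,  6 },
    {  3, 11,  1,  9 },
    { 15,  7, 13,  5 },
};

/* Quantise 8:8:8 to 5:6:5 with a per-pixel ordered-dither offset. */
static uint16_t dither_to_565(int x, int y, uint8_t r, uint8_t g, uint8_t b)
{
    int t5 = bayer4[y & 3][x & 3] / 2;   /* 0..7: spans one 5-bit step */
    int t6 = bayer4[y & 3][x & 3] / 4;   /* 0..3: spans one 6-bit step */

    int r5 = (r + t5) >> 3; if (r5 > 31) r5 = 31;
    int g6 = (g + t6) >> 2; if (g6 > 63) g6 = 63;
    int b5 = (b + t5) >> 3; if (b5 > 31) b5 = 31;

    return (uint16_t)((r5 << 11) | (g6 << 5) | b5);
}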
 
One question - since you mention > 32bit tile color buffers, is it really actually 32bit to begin with? Or is it lowp x4, which should actually be 40bit? Since that just gives you a range of +/- 2 instead of 0 to 1 the actual output would be clamped to 32bit anyway, but that gives you some room for not saturating in intermediate steps.
Clamping to [0, 1] is required by the OpenGL ES spec.
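To put that concretely, here's a minimal sketch of the situation being discussed (the sampler and varying names are made up): a lowp intermediate can legally go above 1.0, since the minimum lowp range in the ES Shading Language spec is about (-2, 2), but whatever ends up in gl_FragColor is clamped to [0, 1] when it's written to a fixed-point target.

Code:
/* Fragment shader source held as a C string, ES 2.0 style. */
static const char *frag_src =
    "precision lowp float;                                   \n"
    "uniform sampler2D tex0, tex1;                           \n"
    "varying vec2 uv;                                        \n"
    "void main() {                                           \n"
    "    lowp vec4 a = texture2D(tex0, uv);                  \n"
    "    lowp vec4 b = texture2D(tex1, uv);                  \n"
    "    lowp vec4 sum = a + b;      /* can reach ~2.0 */    \n"
    "    gl_FragColor = sum * 0.5;   /* clamped to [0,1] */  \n"
    "}                                                       \n";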

Finally, 16-bit Z this and that aside: does anyone know why none of the Tegra 2 devices I've seen so far (someone correct me if anything has changed in the meantime) are giving any MSAA results in GLBenchmark? I can see NV's own extension for coverage sampling but none for multisampling.
If you look at darkblu's list of EGL configs, you won't find any which support multisampling (EGL_SAMPLES/EGL_SAMPLE_BUFFERS).

That was not exposed until recently (on the iPad).
iOS 4 added a number of extensions (including multisampling [as iOS doesn't use EGL], 4:2:2 textures, depth textures and float textures IIRC).
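For the platforms that do use EGL, checking what darkblu's list showed is just a matter of walking the configs and reading EGL_SAMPLE_BUFFERS / EGL_SAMPLES, roughly like this sketch (error handling omitted):

Code:
#include <EGL/egl.h>
#include <stdio.h>

/* Print every EGL config that advertises multisampling. */
void list_msaa_configs(EGLDisplay dpy)
{
    EGLint count = 0;
    eglGetConfigs(dpy, NULL, 0, &count);     /* how many configs exist */

    EGLConfig configs[256];
    if (count > 256) count = 256;
    eglGetConfigs(dpy, configs, count, &count);

    for (EGLint i = 0; i < count; ++i) {
        EGLint buffers = 0, samples = 0;
        eglGetConfigAttrib(dpy, configs[i], EGL_SAMPLE_BUFFERS, &buffers);
        eglGetConfigAttrib(dpy, configs[i], EGL_SAMPLES, &samples);
        if (buffers > 0)
            printf("config %d: %dx MSAA\n", (int)i, (int)samples);
    }
}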
 
Honestly I'd have to see some examples of true 24bpp vs dithered down to 16bpp, it has been a really long time since anyone was doing that. Although SGX would still have better image quality than the old ditherers did ...

Ok, here it is.
I made such a comparison in November 2000 in my Vivid review; you can find it here: http://www.mitrax.de/?cont=artikel&aid=34&page=11 .
Both SeSa shots are there in 16 bit (left) and 32 bit (right). Unfortunately both are JPEG files.

Here are two new shots I have made today on my MSI X320. The game is SeSatSE. The whole shots are here: 32bit: http://www.mitrax.de/files/sesa2010_32.png and 16bit: http://www.mitrax.de/files/sesa2010_16.png.

Here is the bottom right corner:
[Attached crops: sesa_32.png (32 bit) and sesa_16.png (16 bit)]


If I remember correctly, I don't think the image quality is much better on SGX than it was on KYRO.
regards
 
Wow, thanks.

In that game at least the difference is extremely subtle. If things are bandwidth limited and could improve from going to 16-bit I'd definitely choose that.

I imagine something has to show the difference more, but for now I'd have to agree with JohnH's claim that it's really minor. Hardly something you'd expect..

I do wonder if it gets worse at really low resolutions, like 320x240 (hey, I guess some devices still use that)
 
With all this 16-bit rendering talk, one should always use 3dfx as the reference; with their post-filters the quality was easily unmatched.
 
But 3dfx didn't have multipass with internal 32-bit, so IMG has the advantage there. If you have a single pass w/o alpha blending it should be the same (it does at least have shading + multitexture blending in full precision)
 
Wow, thanks.

In that game at least the difference is extremely subtle. If things are bandwidth limited and could improve from going to 16-bit I'd definitely choose that.
I imagine something has to show the difference more, but for now I'd have to agree with JohnH's claim that it's really minor. Hardly something you'd expect..
I do wonder if it gets worse at really low resolutions, like 320x240 (hey, I guess some devices still use that)

No problem, I am very interested in PowerVR's graphics! :)

If I had a driver that works with more games, we could test more. But this "driver" from Intel is crap!
Ask the guys from imgtec, they have the driver to do this. ;-)

But I am sure you will see the same result on all games.
Here are two other shots. The game is now UT, yes the good old Unreal Tournament.
The shots are here: 32bit: http://www.mitrax.de/files/ut_32.png and 16bit: http://www.mitrax.de/files/ut_16.png.

The only game with lower resolutions is Q3A. But that is an OpenGL game and Intel doesn't support OpenGL. The MS wrapper is doing weird things, so I think the wrapper is always using 16 bit.
regards
 
The list for 2.0 GLBenchmark is starting to fill out.

No longer just a showpiece at a string of trade shows, Tegra 2 is now turning up in an army of not-so-name-brand devices. They're doing quite well in the triangle rate tests, actually.

Some newer devices and platforms are previewed, though some may be facing SGXMP products around the time they come to market.

A Mali 200 is paired with an ARM11 in Telechips's TCC8900 processor, powering the Blueberry NetCat M-01.

The scores for the individual performance tests are interesting. The iPhone 4 still holds close to the top for texel fill and leads in the fixed time benchmarks. Resolution aside, the top spot for most of the other tests was a battle between a device powered by the upgraded Hummingbird, the immodestly titled Herotab MID816, and a raft of Adreno 205ers usually led by the Acer Liquid Metal.
 
Samsung's trying to push the high end of smartphones again with their update to the Galaxy S line, releasing for AT&T as the Infuse 4G.

They've created a new evolution of their display tech, Super AMOLED Plus this time, with a much-needed increase in sub-pixel resolution and better readability and visibility, and they're upping the ante by applying it to a humongous 4.5" screen.

The Infuse is powered by the updated Hummingbird platform with 1.2 GHz A8 and "25% faster graphics performance".

The fastest platform in the majority of the individual shader tests for GLBenchmark 2.0 is a pretty even split between the new Hummingbird and the new Snapdragons with Adreno 205. Tegra 2 dominates the triangle rate tests and scores highly in the overall framerate performance.

http://www.glbenchmark.com/compare....ocity A7&D4=Marvell Armada SmartPhone 800x480
 
Samsung's trying to push the high end of smartphones again with their update to the Galaxy S line, releasing for AT&T as the Infuse 4G.

They've created a new evolution of their display tech, Super AMOLED Plus this time, with a much-needed increase in sub-pixel resolution and better readability and visibility, and they're upping the ante by applying it to a humongous 4.5" screen.

The Infuse is powered by the updated Hummingbird platform with 1.2 GHz A8 and "25% faster graphics performance".

The fastest platform in the majority of the individual shader tests for GLBenchmark 2.0 is a pretty even split between the new Hummingbird and the new Snapdragons with Adreno 205. Tegra 2 dominates the triangle rate tests and scores highly in the overall framerate performance.

http://www.glbenchmark.com/compare....ocity A7&D4=Marvell Armada SmartPhone 800x480
That Vivante Corporation part is definitely providing some really interesting performance results.
 
GLBenchmark on Windows?

Does anyone know if the GLBenchmark 2.0 tests are available on Windows?

I would like to run these on my PC now that ATI and NVIDIA have OpenGL ES 2.0 drivers to gauge the relative performance of the devices listed on GLBenchmark's site.

I would also like to compare the MSAA quality versus the desktop GPUs.
 
Does anyone know if the GLBenchmark 2.0 tests are available on Windows?

I would like to run these on my PC now that ATI and NVIDIA have OpenGL ES 2.0 drivers to gauge the relative performance of the devices listed on GLBenchmark's site.

I would also like to compare the MSAA quality versus the desktop GPUs.

Seems kinda pointless. The desktop brethren will likely be an order of magnitude or so faster.
 
Seems kinda pointless. The desktop brethren will likely be an order of magnitude or so faster.

Of course the performance should be an order of magnitude higher (as is the power consumption). However, performance differences aside, it may shed some light on what the tests are doing in a grown-up pipeline for those of us who do not have access to the source code.
 