Introduction to SGX

Yep, I should have been clearer on what I meant by "chip falls back to software vertex shading".
This is, however, independent of the firmware; the driver may or may not use software vertex shading regardless (and hardware vertex shading may not always be a win; well, it would be if the CPU it is paired with had no FPU...).
 
The article is incorrect in claiming that the Intel driver is not using TBDR at all.
I have not said that the Intel driver does not use TBDR; what I said was:
The SGX535, when using the Tungsten driver, does not behave like a deferred renderer; it is as if HSR were not being used. ...
It does look, however, as if the hardware is not being used optimally here.
Roughly translated, it should be read as "it seems that they are not using it correctly".
I know that there is a better driver today; whether it is only for Linux, I don't know. The newer driver is better; let us say it is at the level of the PowerVR driver, but really good it is not!
I am sure the SGX core can do more in these old games.
I suspect the reason ArchMark delivers those inflated scores with the PowerVR driver is that the detection of when it is really necessary to actually start rendering is a bit more clever, but it probably makes little difference in non-theoretical benchmarks.
Give me the new driver and we will see. :)
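As a rough illustration of the "deferred" behaviour being argued about above: a minimal C-style sketch, with hypothetical structure and function names (this is not PowerVR or Tungsten driver code), of how a tile-based deferred renderer only bins geometry at draw time and performs HSR plus shading when the scene is actually flushed, e.g. at buffer swap or when something forces an early resolve.

/* Illustrative sketch of a tile-based deferred rendering (TBDR) flow.
 * All structures and names are hypothetical; not actual driver code. */
#include <stddef.h>

typedef struct { float x, y, z; } Vertex;

typedef struct {
    size_t binned_triangles;  /* per-tile bins of transformed triangles */
    int    scene_pending;     /* geometry captured but not yet rendered */
} TileScene;

/* A draw call only transforms and bins geometry into tiles;
 * no pixel work happens here, HSR and shading are deferred. */
static void draw_triangles(TileScene *scene, const Vertex *verts, size_t count)
{
    (void)verts;
    scene->binned_triangles += count / 3;
    scene->scene_pending = 1;
}

/* Rendering really starts only when the scene has to be resolved:
 * per tile, hidden surfaces are removed first and only the visible
 * pixels are shaded. */
static void flush_scene(TileScene *scene)
{
    if (!scene->scene_pending)
        return;
    /* hsr_and_shade_all_tiles(scene);   hypothetical resolve step */
    scene->binned_triangles = 0;
    scene->scene_pending = 0;
}

/* A driver that flushes too eagerly (e.g. on every lock or state change)
 * throws the HSR benefit away; deferring until buffer swap keeps it. */
static void swap_buffers(TileScene *scene)
{
    flush_scene(scene);
}

int main(void)
{
    TileScene scene = {0, 0};
    Vertex tri[3] = {{0,0,0}, {1,0,0}, {0,1,0}};
    draw_triangles(&scene, tri, 3);  /* just bins the triangle */
    swap_buffers(&scene);            /* HSR + shading happen here */
    return 0;
}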
Also, the article states it's surprising to see hardware TnL being slower (or at least not faster) than software TnL. This is not surprising at all; why do you think Intel switches between SW and HW TnL on their i965-based chips (sure, the CPUs used there are faster, but so are the IGPs)... Also, HW TnL in D3D doesn't necessarily mean it's actually performed in hardware rather than just in the driver...
OK, it is clear that hardware T&L need not be faster.
The shader unit of this SGX core is rather small, so it is important to use it cleverly. The guys at ImgTec have a long tradition of designing T&L units and I am sure that they know what they are doing. AFAIK they say that the SGX530 is well balanced and that the SGX535's important difference is a second texture unit. So I don't think that the shader unit is capable of doing the T&L.
There seem to be some other technical inaccuracies in the article, but I can't comment on that :).
It's a pity that you can't help me here. I am a layman regarding computer graphics, but I try to learn where I can.

Let us hope that we will someday see a well-working driver for the GMA500.
 
OK, it is clear that hardware T&L need not be faster.
The shader unit of this SGX core is rather small, so it is important to use it cleverly. The guys at ImgTec have a long tradition of designing T&L units and I am sure that they know what they are doing. AFAIK they say that the SGX530 is well balanced and that the SGX535's important difference is a second texture unit. So I don't think that the shader unit is capable of doing the T&L.
Don't forget SGX may also be coupled with CPUs which have a very slow FPU (or none at all), so being able to do HW TnL on the GPU is a must for the chip. It should also use less power, even in cases where your CPU could potentially do it faster.

It's a pity that you can't help me here. I am a layman regarding computer graphics, but I try to learn where I can.
Oh, I wasn't referring to any bogus terms per se, just that some details don't really seem to fit the chip very well. But even the PowerVR employees here don't comment on such things...
 
Don't forget SGX may also be coupled with CPUs which have a very slow FPU (or none at all), so being able to do HW TnL on the GPU is a must for the chip. It should also use less power, even in cases where your CPU could potentially do it faster.
However, the SGX must do the T&L. If you use the PowerVR driver it can do it fast; see the results from Q3A, SeSa and OGLVillageMark. AFAIK there is no special T&L detection in the PowerVR OGL driver, and that driver runs significantly faster.
The D3D driver from Intel uses the same codebase as the D3D driver from PowerVR, and this codebase is not great. On the other side, Vista sends everything down to the SGX as full FP32 data, so the fillrate drops to one quarter!
Let me say it again: with a good driver this core should be outgunning the KYRO II in most cases.
I know that this is not possible in newer games or in the new 3DMark tests; the shader load is too big for this little core.

I am not too optimistic, but let us hope that Intel will give us a good working driver (it need not be great, but why not?), for D3D as well as OpenGL!
Oh, I wasn't referring to any bogus terms per se, just that some details don't really seem to fit the chip very well. But even the PowerVR employees here don't comment on such things...
I don't know what you are referring to. All details regarding the 3D core in my last article are from the Intel SCH manual.
 
This reminds me: SGX is capable of 1xFP32 or 2xFP16 or 4xINT8 per clock. This makes perfect sense for OpenGL ES 2.0 given its specs (which themselves were surely influenced by IMG), but does anyone know if that last one is actually enough for DX8 PS1.1? All DX8 GPUs ever released had at least 9-bit ALUs IIRC (that was NV; ATI/Matrox had more, and Rampage would have had more too; no idea about VIA/SiS). And I think PS1.4 requires more mantissa bits than FP16 has, at least (although IIRC that didn't stop NVIDIA in the NV3x era).
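A rough back-of-the-envelope sketch of what those per-pipe figures imply for throughput; the clock and pipe count are assumed values, purely for illustration. It also shows why forcing everything down as FP32 (as described above for the D3D path) costs up to a factor of four against the INT8 rate.

/* Relative ALU throughput per precision, using the per-pipe figures
 * quoted above: 1xFP32, 2xFP16 or 4xINT8 ops per clock.
 * Clock and pipe count are assumed values, for illustration only. */
#include <stdio.h>

int main(void)
{
    const double clock_mhz = 200.0;  /* assumed SGX clock */
    const int    pipes     = 1;      /* single pipeline */

    const int ops_fp32 = 1, ops_fp16 = 2, ops_int8 = 4;

    printf("FP32: %6.0f Mops/s\n", clock_mhz * pipes * ops_fp32);
    printf("FP16: %6.0f Mops/s\n", clock_mhz * pipes * ops_fp16);
    printf("INT8: %6.0f Mops/s\n", clock_mhz * pipes * ops_int8);

    /* Whatever the real clock is, forcing everything to FP32 keeps only
     * 1/4 of the INT8 rate and 1/2 of the FP16 rate. */
    return 0;
}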

Regarding using the CPU for T&L: remember Atom comes in different variants. What may not be a bottleneck in 1.6GHz netbooks might be one in 800MHz MIDs (iirc some of those at least have SGX clocked similarly)... So nothing is quite that simple.
 
Who cares anyway? Would Intel accept a fully working new D3D driver from IMG? I severely doubt it. It's too damn convenient to mark it as inefficient and at the same time overlook the inefficiencies in your own neck of the woods ;)
 
So nothing is quite that simple.
I have never said that it would be simple!
What they are doing with the current driver is simple: always take the biggest format and you can't go wrong!
You can do this if you have millions upon millions of transistors and it is not important how much power you need. But if you must count every transistor and efficiency is the only thing that helps, then you need not only an efficient core, you also need an efficient driver. :)
 
This reminds me: SGX is capable of 1xFP32 or 2xFP16 or 4xINT8 per clock. This makes perfect sense for OpenGL ES 2.0 given its specs (which themselves were surely influenced by IMG), but does anyone know if that last one is actually enough for DX8 PS1.1? All DX8 GPUs ever released had at least 9-bit ALUs IIRC (that was NV; ATI/Matrox had more, and Rampage would have had more too; no idea about VIA/SiS). And I think PS1.4 requires more mantissa bits than FP16 has, at least (although IIRC that didn't stop NVIDIA in the NV3x era).

SGX also supports 4x 10-bit fixed point as per OpenGL ES 2.0 LOWP, so if the OS chose to expose the original shaders to the driver it would have the opportunity to use it.

Note that the numbers you quote above are for a single pipeline, and FP32 is actually up to 2x ops per clock per pipe.
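As a small illustration of what exposing the original shaders would let the driver do, here is a hedged sketch (shader and variable names made up) of an OpenGL ES 2.0 fragment shader using the standard lowp/mediump precision qualifiers; whether the driver then maps lowp onto the 10-bit fixed-point path is of course up to the implementation.

/* Minimal OpenGL ES 2.0 fragment shader (as a C string) using precision
 * qualifiers. A driver is free to run lowp work on a narrow fixed-point
 * path and mediump on FP16 instead of promoting everything to FP32. */
static const char *fragment_shader_src =
    "precision mediump float;\n"
    "varying lowp vec4 v_color;        /* lowp: 10-bit fixed point suffices */\n"
    "varying mediump vec2 v_texcoord;  /* mediump: FP16 range is enough */\n"
    "uniform sampler2D u_texture;\n"
    "void main() {\n"
    "    lowp vec4 texel = texture2D(u_texture, v_texcoord);\n"
    "    gl_FragColor = texel * v_color;\n"
    "}\n";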

Regarding using the CPU for T&L: remember Atom comes in different variants. What may not be a bottleneck in 1.6GHz netbooks might be one in 800MHz MIDs (iirc some of those at least have SGX clocked similarly)... So nothing is quite that simple.

The reality is even more complicated than this: using the CPU for TnL duty results in its caches being blasted by geometry data every frame, so although it may appear faster in simplistic cases, it can fall down in real applications. Further to this, I think there is something else going on in the D3D case, as is clearly evidenced by the much better OpenGL performance (with IMG drivers).
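To put some (assumed) numbers on that cache point, a quick sketch; the vertex count, vertex size and cache size are illustrative only.

/* Back-of-the-envelope estimate of how much geometry data a CPU TnL path
 * streams through the caches each frame. All figures are assumed. */
#include <stdio.h>

int main(void)
{
    const long vertices_per_frame = 100000;      /* assumed scene size      */
    const long bytes_per_vertex   = 32;          /* position + normal + uv  */
    const long l2_cache_bytes     = 512 * 1024;  /* typical Atom L2 cache   */

    long geometry_bytes = vertices_per_frame * bytes_per_vertex;

    printf("geometry per frame: %ld KiB, L2 cache: %ld KiB\n",
           geometry_bytes / 1024, l2_cache_bytes / 1024);

    /* Roughly 3 MiB of vertex data per frame against a 512 KiB L2 means the
     * application's own working set is evicted over and over again. */
    return 0;
}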

Cheers,
John.
 