About Kyro2 SE (aka STG4800) EnT&L

Simon82 · Jun 25, 2008

Do you think that PowerVR lied when they thoke they could use a mixed HW and SW T&L to achieve higher frame rates than real HW T&L? Was it similar to the software T&L emulation you can use in Direct3D?

Bye

Scali · Jun 25, 2008

They could be telling the truth... but that depends on a lot of factors.
What do they mean by 'mixed'?
And what do they mean by 'real hardware T&L'?

For example, I have a laptop with an X3100, which is faster with software T&L in some applications and faster with hardware T&L in others. The driver comes with profiles for popular applications, which decide whether to use software T&L or hardware T&L.

You could call this 'mixed' T&L, and it is indeed faster than running everything in hardware T&L (that's pretty much a given, since they only use software when it's faster in the first place). However, it's not faster than full hardware T&L solutions of other companies.

I would say that with a sufficiently fast card, there is no way that software T&L is faster than hardware T&L (for regular D3D operations, that is; I'm not talking about trying to do things that regular D3D doesn't support, where you somehow have to hack around it by streaming new data into the vertex buffers all the time).

Back in the day of the Kyro2 I had an 1800+ CPU and a GeForce2 GTS. For most scenes, the GF2 was oodles faster than my CPU (which was one of the fastest on the market at that time). In fact, the GF2 was so fast that even when doing things that couldn't be done with hardware T&L alone, I'd still only offload the necessary operations to the CPU (e.g. skinning, or setting up dot3 bump mapping, etc.) and let the GPU perform transform and lighting, because that was much faster than letting the CPU light the vertices as well.
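A minimal sketch of the CPU half of that split, for illustration only: linear blend skinning, where each vertex position is blended from the bone matrices that influence it, so the already-skinned vertices can then be handed to the GPU's fixed-function transform and lighting. The function names, the row-major 4x4 matrix layout (column-vector convention) and the per-vertex list of (bone index, weight) pairs are assumptions, not any engine's or D3D's API. Only positions are skinned here; a real skinner would blend the normals too, so that the hardware lighting stays correct.

```python
from typing import List, Sequence, Tuple

Mat4 = List[List[float]]  # row-major 4x4 matrix, column-vector convention (assumed layout)
Vec3 = Tuple[float, float, float]


def transform_point(m: Mat4, p: Vec3) -> Vec3:
    """Apply an affine 4x4 matrix to a point (implicit w = 1)."""
    x, y, z = p
    return (
        m[0][0] * x + m[0][1] * y + m[0][2] * z + m[0][3],
        m[1][0] * x + m[1][1] * y + m[1][2] * z + m[1][3],
        m[2][0] * x + m[2][1] * y + m[2][2] * z + m[2][3],
    )


def skin_vertex(p: Vec3, bones: Sequence[Mat4],
                influences: Sequence[Tuple[int, float]]) -> Vec3:
    """Linear blend skinning: weighted sum of the point transformed by each bone.

    The weights of one vertex are assumed to sum to 1.
    """
    out = [0.0, 0.0, 0.0]
    for bone_index, weight in influences:
        tx, ty, tz = transform_point(bones[bone_index], p)
        out[0] += weight * tx
        out[1] += weight * ty
        out[2] += weight * tz
    return (out[0], out[1], out[2])


def skin_mesh(positions: Sequence[Vec3], bones: Sequence[Mat4],
              influences_per_vertex: Sequence[Sequence[Tuple[int, float]]]) -> List[Vec3]:
    """CPU pass: skin every vertex. The result would then be uploaded to a
    vertex buffer, and the GPU's T&L does world/view/projection and lighting."""
    return [skin_vertex(p, bones, inf)
            for p, inf in zip(positions, influences_per_vertex)]
```

For example, a vertex at (1, 0, 0) weighted 50/50 between an identity bone and a bone translated by +2 along x ends up at (2, 0, 0). Keeping only this blend on the CPU leaves the heavier per-vertex transform and lighting work on the GPU.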

So no, with a proper HW T&L implementation like the one the GF2 GTS had at the time, there's no way that offloading some operations to the CPU would be faster.
The only thing that *might* be faster is if you let the CPU and GPU work in parallel... however, that will be hard to pull off, both in terms of getting good concurrency and getting consistent results. I also doubt that you'd gain a lot from that, considering the GF2 GTS was really orders of magnitude faster than the CPU at most T&L tasks. Some scenes could go from tens of thousands of vertices to millions of vertices because of the power of hardware T&L. The gap has only widened since.

Simon82 · Jun 25, 2008

Scali said:
They could be telling the truth... but that depends on a lot of factors.

Thanks for the reply, I understand your point of view perfectly. The approach you described would probably be a smart one, but I think STM was only trying to relaunch something that was surely a good idea, yet lacked the technical innovation needed to compete with the far more powerful competing solutions.
I think Kyro technology showed other companies that sometimes doing things in a smart and accurate way is better than "simply" increasing standard parameters to get more power. At that time the fill-rate difference between NV20 and STG4500 was ridiculous, and maybe that's why so many bad and obscure rumours came out of Nvidia, which was probably afraid of a never-before-seen kind of solution.

Simon F · Jun 25, 2008

Simon82 said:
Do you think that PowerVR lied when they thoke they could use a mixed HW and SW T&L to achieve higher frame rates than real HW T&L?
Bye

Kyro had no T&L unit, so it was simply not possible to have "mixed" HW&SW T&L. What the driver could do (via a registry flag) was tell the application that there "was" a HW T&L unit, because some programmers/applications were pig-headed and insisted that their program required HW T&L, which was patently not the case.

Simon82 · Jun 25, 2008

Simon F said:
Kyro had no T&L unit, so it was simply not possible to have "mixed" HW&SW T&L. What the driver could do (via a registry flag) was tell the application that there "was" a HW T&L unit, because some programmers/applications were pig-headed and insisted that their program required HW T&L, which was patently not the case.

So there was no other secret involved in this "(non-)emulation"? Was it the same trick that other programs of that time (3danalyzer) used to let T&L-requiring games run on DirectX 6-based cards? I clearly remember PowerVR/STM people using the word "mixed" when they were interviewed...

Davros · Jun 25, 2008

But wasn't the thing that the Kyro2 didn't have any hardware T&L at all?
Maybe they had their own T&L routine in the drivers (rather than using D3D's routine), which looked like hardware T&L to the apps.
The problem was that not all apps recognised this as hardware T&L, so you had to use a tool like 3d analyse.
IIRC

PS: a few posts were added while I typed this, so it looks a bit like I'm stating the obvious.

Simon82 · Jun 25, 2008

Davros said:
But wasn't the thing that the Kyro2 didn't have any hardware T&L at all?
Maybe they had their own T&L routine in the drivers (rather than using D3D's routine), which looked like hardware T&L to the apps.
The problem was that not all apps recognised this as hardware T&L, so you had to use a tool like 3d analyse.
IIRC

Maybe they thoke that with this approach they could fight the power of the other cards by relying on the growing power of ever faster CPUs. But the programmable shaders of the upcoming NV20 immediately exposed the limitations of the old D3D7.x hardware and killed those hopes.

Davros · Jun 25, 2008

Actually, I don't think so. I think the Kyro 2 lacking a T&L unit was more due to price considerations; PowerVR never aimed at the high end. One of the selling points of the card (to OEMs at least) was that the efficiency of the chip meant you didn't need expensive, high-end memory to keep up with the competition, so the boards could retail for less.

PS: what's "thoke"?

mczak · Jun 25, 2008

For what it's worth, AMD IGPs (up to the ones derived from the unified R6xx, for obvious reasons) never had any sort of HW T&L / vertex shaders either, but used driver emulation (instead of relying on the D3D SW vertex pipeline). I don't think this hurt performance much (the IGPs weren't that fast anyway).
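To make "driver emulation" of T&L a bit more concrete, here is a minimal sketch of the per-vertex work a software T&L path does in the simplest fixed-function case: transform each position by a combined world-view-projection matrix, divide by w, and light the vertex with a single directional light (ambient plus Lambert diffuse, clamped to 1). It is an assumed, simplified model (no clipping, no specular, normals assumed non-zero and already in the same space as the light direction), not AMD's or PowerVR's actual driver code, and the function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Mat4 = List[List[float]]  # row-major 4x4 matrix, column-vector convention (assumed)
Vec3 = Tuple[float, float, float]


def mul_point(m: Mat4, p: Vec3) -> Tuple[float, ...]:
    """Multiply the homogeneous point (x, y, z, 1) by m, giving clip-space (x, y, z, w)."""
    x, y, z = p
    return tuple(m[r][0] * x + m[r][1] * y + m[r][2] * z + m[r][3] for r in range(4))


def normalize(v: Vec3) -> Vec3:
    """Scale v to unit length (v is assumed to be non-zero)."""
    length = math.sqrt(v[0] ** 2 + v[1] ** 2 + v[2] ** 2)
    return (v[0] / length, v[1] / length, v[2] / length)


def software_tnl(positions: Sequence[Vec3], normals: Sequence[Vec3], wvp: Mat4,
                 light_dir: Vec3, ambient: Vec3,
                 diffuse: Vec3) -> List[Tuple[Vec3, Vec3]]:
    """Transform and light every vertex on the CPU.

    Returns (position after perspective divide, RGB color) per vertex.
    light_dir is assumed to point the way the light travels, as for
    D3D directional lights, so it is negated to get the vector towards the light.
    """
    to_light = normalize((-light_dir[0], -light_dir[1], -light_dir[2]))
    out = []
    for p, n in zip(positions, normals):
        cx, cy, cz, cw = mul_point(wvp, p)  # transform to clip space
        ndc = (cx / cw, cy / cw, cz / cw)   # perspective divide; no clipping here
        n = normalize(n)
        # Lambert term: surfaces facing away from the light get no diffuse light.
        lambert = max(0.0, n[0] * to_light[0] + n[1] * to_light[1] + n[2] * to_light[2])
        color = tuple(min(1.0, a + d * lambert) for a, d in zip(ambient, diffuse))
        out.append((ndc, color))
    return out
```

All of this runs on the CPU whether the D3D runtime or the driver performs it; the difference mczak points to is only which of the two owns the loop.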

Scali · Jun 25, 2008

mczak said:
For what it's worth, AMD IGPs (up to the ones derived from the unified R6xx, for obvious reasons) never had any sort of HW T&L / vertex shaders either, but used driver emulation (instead of relying on the D3D SW vertex pipeline).

I hope Nick is reading this

Simon82 · Jun 25, 2008

Davros said:
Actually, I don't think so. I think the Kyro 2 lacking a T&L unit was more due to price considerations; PowerVR never aimed at the high end. One of the selling points of the card (to OEMs at least) was that the efficiency of the chip meant you didn't need expensive, high-end memory to keep up with the competition, so the boards could retail for less.

PS: what's "thoke"?

But surely they killed two birds with one stone. The Kyro2 was indeed as powerful as other cards in games. Of course it could not perform as well in titles requiring a powerful polygon engine, but it had a lot of exceptional features. I remember the true-color internal rendering, which surpassed every other card in quality for 16-bit-only games. Thief 1/2 was outstanding.
