About Kyro2 SE (aka STG4800) EnT&L

Simon82

Newcomer
Do you think PowerVR lied when they said they could use mixed hw and sw T&L to achieve more frames than real hw T&L? Was it similar to the software T&L emulation you can use in Direct3D?

Bye
 
They could be telling the truth... but that depends on a lot of factors.
What do they mean by 'mixed'?
And what do they mean by 'real hardware T&L'?

For example, I have a laptop with an X3100, which is faster with software T&L in some applications and faster with hardware T&L in others. The driver comes with profiles for popular applications, which decide whether to use software T&L or hardware T&L.

You could call this 'mixed' T&L, and it is indeed faster than running everything in hardware T&L (that's pretty much a given, since they only use software when it's faster in the first place). However, it's not faster than full hardware T&L solutions of other companies.
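Just to illustrate what 'mixed' can look like at the API level, here's a minimal D3D9-style sketch (the Kyro era was D3D7/8, so take it purely as an illustration; the d3d pointer, hWnd and the function names are assumed): a device created with D3DCREATE_MIXED_VERTEXPROCESSING can switch between software and hardware vertex processing per batch, which is essentially the per-workload choice those driver profiles make automatically.

Code:
#include <d3d9.h>

// Minimal sketch, assuming a valid IDirect3D9* 'd3d' and window handle 'hWnd'.
IDirect3DDevice9* CreateMixedDevice(IDirect3D9* d3d, HWND hWnd)
{
    D3DPRESENT_PARAMETERS pp = {};
    pp.Windowed   = TRUE;
    pp.SwapEffect = D3DSWAPEFFECT_DISCARD;

    IDirect3DDevice9* dev = nullptr;
    // MIXED allows toggling between CPU and GPU vertex processing at runtime.
    d3d->CreateDevice(D3DADAPTER_DEFAULT, D3DDEVTYPE_HAL, hWnd,
                      D3DCREATE_MIXED_VERTEXPROCESSING, &pp, &dev);
    return dev;
}

void DrawFrame(IDirect3DDevice9* dev)
{
    dev->SetSoftwareVertexProcessing(TRUE);   // these batches get T&L'd on the CPU
    // ... draw calls that profile faster in software ...
    dev->SetSoftwareVertexProcessing(FALSE);  // back to the hardware T&L unit
    // ... draw calls that profile faster in hardware ...
}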

I would say that with a sufficiently fast card, there is no way that software T&L is faster than hardware T&L (for regular D3D operations, that is; I'm not talking about trying to do things that regular D3D doesn't support, which you somehow have to hack around by streaming new data into the vertex buffers all the time).

Back in the day of the Kyro2 I had an 1800+ CPU and a GeForce2 GTS. For most scenes, the GF2 was oodles faster than my CPU (which was one of the fastest on the market at that time). In fact, the GF2 was so fast that even when doing things that couldn't be done with hardware T&L alone, I'd still only offload the necessary operations to the CPU (e.g. skinning, or setting up dot3 bump mapping), and let the GPU perform transform and lighting, because it'd be much faster than letting the CPU light the vertices as well.
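For concreteness, here's a hypothetical D3D9-style sketch of that split (SkinOnCpu, DrawSkinnedMesh and the dynamic vertex buffer are illustrative placeholders, not anything from an actual engine): the CPU only does what fixed-function T&L can't, and the GPU still performs the transform and lighting.

Code:
#include <d3d9.h>

struct Vertex { float x, y, z, nx, ny, nz; };   // matches D3DFVF_XYZ | D3DFVF_NORMAL

void SkinOnCpu(Vertex* dst, unsigned count);     // placeholder for the CPU-side work

// 'vb' is assumed to be a dynamic, write-only vertex buffer in D3DPOOL_DEFAULT.
void DrawSkinnedMesh(IDirect3DDevice9* dev, IDirect3DVertexBuffer9* vb, unsigned vertexCount)
{
    // Refill the buffer each frame with CPU-skinned positions and normals.
    void* p = nullptr;
    if (SUCCEEDED(vb->Lock(0, vertexCount * sizeof(Vertex), &p, D3DLOCK_DISCARD)))
    {
        SkinOnCpu(static_cast<Vertex*>(p), vertexCount);
        vb->Unlock();
    }

    // Transform and per-vertex lighting remain on the GPU's fixed-function T&L unit.
    dev->SetRenderState(D3DRS_LIGHTING, TRUE);
    dev->SetFVF(D3DFVF_XYZ | D3DFVF_NORMAL);
    dev->SetStreamSource(0, vb, 0, sizeof(Vertex));
    dev->DrawPrimitive(D3DPT_TRIANGLELIST, 0, vertexCount / 3);
}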

So no, with a proper hw T&L implementation like the GF2GTS had at the time, there's no way that offloading some operations to the CPU would be faster.
The only thing that *might* be faster is if you let the CPU and GPU work in parallel... however that will be hard to pull off, both in terms of getting good concurrency and getting consistent results. I also doubt that you'd gain a lot from that, considering the GF2GTS was really orders of magnitude faster than the CPU at most T&L tasks. Some scenes could go from tens of thousands of vertices to millions of vertices because of the power of hardware T&L. The gap has only widened since.
 
Thanks for this reply, I understand your point of view perfectly. The approach you describe would probably have been the smart one, but I think STM was just trying to relaunch something that had been a good idea in itself, without any technical innovation capable of competing with the other, far more powerful solutions.
I think Kyro's technology made the other companies understand that doing things in a smart and accurate way is sometimes better than "simply" increasing the standard parameters to get more power. At the time the difference in fill rate between the NV20 and the STG4500 was ridiculous, and maybe that's why so many bad and obscure rumours came out of Nvidia, which was probably afraid of a kind of solution it had never seen before.
 
Do you think PowerVR lied when they said they could use mixed hw and sw T&L to achieve more frames than real hw T&L?
Bye
Kyro had no T&L unit, so it was simply not possible to have "mixed" HW & SW T&L. What the driver could do (via a registry flag) was tell the application that there "was" a HW T&L unit, because some programmers/applications were pig-headed and insisted that their program required HW T&L, which was patently not the case.
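For reference, a minimal D3D9-style sketch of the kind of check such applications made (the function name is hypothetical; the era's games used the D3D7/8 equivalents of this cap). The registry flag simply made the driver report the cap bit, so this test passed even though the T&L was actually done in software:

Code:
#include <d3d9.h>

bool AppInsistsOnHwTnL(IDirect3D9* d3d)
{
    D3DCAPS9 caps = {};
    if (FAILED(d3d->GetDeviceCaps(D3DADAPTER_DEFAULT, D3DDEVTYPE_HAL, &caps)))
        return false;
    // Many titles refused to run unless this bit was set, even though they
    // would have worked fine with (fast) software T&L.
    return (caps.DevCaps & D3DDEVCAPS_HWTRANSFORMANDLIGHT) != 0;
}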
 
So there was no other secret involved in this "(non-)emulation"? The same trick that other programs of the time (3D Analyzer) used to let games that required T&L run on DirectX 6 class cards? I clearly remember PowerVR/STM people using the word "mixed" when they were interviewed...
 
But wasn't the thing that the Kyro2 didn't have any hardware T&L at all?
They maybe had their own T&L routine in the drivers (rather than using D3D's routine) which looked like hardware T&L to the apps.
The problem was that not all apps recognised this as hardware T&L, so you had to use a tool like 3D Analyze.
IIRC.

PS: a few posts were added while I typed this, so it looks a bit like I'm stating the obvious ;)
 

Maybe they thoke that with this kind of approach they could fight the power of the other cards by leaning on the ever-increasing power of the CPU. But the capabilities of the upcoming NV20's programmable shaders immediately killed those hopes and exposed the limitations of the old D3D7.x hardware.
 
Actually I don't think so. I think the Kyro 2 lacking a T&L unit was more due to price considerations; PowerVR never aimed at the high end. One of the selling points of the card (to OEMs at least) was that the efficiency of the chip meant you didn't need to use expensive, high-end memory to keep up with the competition, therefore the boards could retail for less.

PS: what's "thoke"?
 
For what it's worth, AMD IGPs (up to the ones derived from the unified r6xx, for obvious reasons) never had any sort of hw T&L / vertex shaders either, but used driver emulation (instead of relying on the D3D software vertex pipeline). I don't think this really hurt much in terms of performance (with the IGPs not being very fast anyway).
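The practical difference, sketched here in D3D9 terms (an assumption about how typical applications behave, not about any specific driver): driver-side emulation just reports a vertex shader version in the caps, so an unmodified application takes the "hardware" path without knowing, whereas the D3D software vertex pipeline requires the application itself to ask for software vertex processing.

Code:
#include <d3d9.h>

// Returns the device behaviour flag a typical application would pick.
DWORD ChooseVertexProcessing(IDirect3D9* d3d)
{
    D3DCAPS9 caps = {};
    d3d->GetDeviceCaps(D3DADAPTER_DEFAULT, D3DDEVTYPE_HAL, &caps);

    // A driver that emulates vertex shaders reports a version here, so the app
    // happily takes this branch even though the CPU ends up doing the work.
    if (caps.VertexShaderVersion >= D3DVS_VERSION(1, 1))
        return D3DCREATE_HARDWARE_VERTEXPROCESSING;

    // Otherwise the app must explicitly fall back to Direct3D's own software pipeline.
    return D3DCREATE_SOFTWARE_VERTEXPROCESSING;
}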
 
For what it's worth, AMD IGPs (up to the ones derived from the unified r6xx, for obvious reasons) never had any sort of hw T&L / vertex shaders either, but used driver emulation (instead of relying on the D3D software vertex pipeline).

I hope Nick is reading this ;)
 
Actually I don't think so. I think the Kyro 2 lacking a T&L unit was more due to price considerations; PowerVR never aimed at the high end. One of the selling points of the card (to OEMs at least) was that the efficiency of the chip meant you didn't need to use expensive, high-end memory to keep up with the competition, therefore the boards could retail for less.

PS: what's "thoke"?

But surely they killed two birds with one stone. The Kyro2 was indeed as powerful as other cards in games. Granted, it could not perform as well in titles requiring a powerful polygon engine, but it had a lot of exceptional features. I remember the true-color internal rendering, which surpassed every other card in quality for 16-bit-only games. Thief 1/2 looked outstanding.
 