They could be telling the truth... but that depends on a lot of factors.
What do they mean by 'mixed'?
And what do they mean by 'real hardware T&L'?
For example, I have a laptop with an X3100, which is faster with software T&L in some applications and faster with hardware T&L in others. The driver ships with profiles for popular applications, and each profile decides whether that application gets software T&L or hardware T&L.
You could call this 'mixed' T&L, and it is indeed faster than running everything in hardware T&L (that's pretty much a given, since they only use software when it's faster in the first place). However, it's not faster than the full hardware T&L solutions from other companies.
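To make the profile idea concrete, here is a minimal sketch in Python of a per-application lookup. It is not how the actual driver is implemented: the executable names, the table, and the choice of hardware T&L as the fallback are all made up for illustration.

```python
from enum import Enum


class VertexProcessing(Enum):
    SOFTWARE = "software"
    HARDWARE = "hardware"


# Hypothetical profile table: these executable names are invented,
# not taken from any real driver.
PROFILES = {
    "game_a.exe": VertexProcessing.SOFTWARE,
    "game_b.exe": VertexProcessing.HARDWARE,
}

# Assumption: applications without a profile fall back to hardware T&L.
DEFAULT_MODE = VertexProcessing.HARDWARE


def select_vertex_processing(executable_name, profiles=PROFILES, default=DEFAULT_MODE):
    """Return the T&L path for an application, matching its name case-insensitively."""
    return profiles.get(executable_name.lower(), default)
```

The point is only that the choice is made once per application, up front, based on which path was measured to be faster for it.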
I would say that with a sufficiently fast card, there is no way that software T&L is faster than hardware T&L. That is for regular D3D operations; I'm not talking about things that regular D3D doesn't support, where you somehow have to hack around it by streaming new data into the vertex buffers all the time.
Back in the day of the Kyro2 I had an 1800+ CPU and a GeForce2 GTS. For most scenes, the GF2 was oodles faster than my CPU (which was one of the fastest on the market at that time). In fact, the GF2 was so fast that even when doing things that couldn't be done with hardware T&L alone, I'd still offload only the necessary operations to the CPU (e.g. skinning, or setting up dot3 bump mapping), and let the GPU perform transform and lighting, because that was much faster than letting the CPU light the vertices as well.
So no, with a proper hw T&L implementation like the GF2GTS had at the time, there's no way that offloading some operations to the CPU would be faster.
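As a rough illustration of that kind of split, here is a minimal Python sketch of the CPU-side part only: blending each vertex between bone transforms (plain linear blend skinning, which I'm assuming here as the generic technique, not the exact code I used back then). The function names and the 3x4 matrix layout are made up for the example. The output stays in object space, so the world/view/projection transform and the lighting are still left to the hardware.

```python
def transform_point(matrix, point):
    """Apply a 3x4 affine matrix (three rows of [a, b, c, t]) to a 3D point."""
    x, y, z = point
    return tuple(row[0] * x + row[1] * y + row[2] * z + row[3] for row in matrix)


def skin_vertex(position, influences, bones):
    """Blend the bone-transformed positions of one vertex by their weights.

    influences is a list of (bone_index, weight) pairs; weights should sum to 1.
    """
    blended = [0.0, 0.0, 0.0]
    for bone_index, weight in influences:
        transformed = transform_point(bones[bone_index], position)
        for axis in range(3):
            blended[axis] += weight * transformed[axis]
    return tuple(blended)


def skin_mesh(positions, influences_per_vertex, bones):
    """Skin every vertex on the CPU; the result would then go into a vertex buffer."""
    return [
        skin_vertex(position, influences, bones)
        for position, influences in zip(positions, influences_per_vertex)
    ]
```

Normals are left out to keep it short; they would need the same blend, using only the rotation part of each bone matrix.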
The only thing that *might* be faster is letting the CPU and GPU work in parallel. However, that would be hard to pull off, both in terms of getting good concurrency and getting consistent results. I also doubt that you'd gain a lot from it, considering the GF2GTS was really orders of magnitude faster than the CPU at most T&L tasks. Some scenes could go from tens of thousands of vertices to millions of vertices because of the power of hardware T&L. The gap has only widened since.