3D Labs P10 Exposing TSMC's Problems

David G.

Newcomer
I've just read ExtremeTech's review of the 3D Labs VP870 card, based on the new VPU, the P10.
The thing is, I am amazed at how a fully programmable VPU gets a beating from a GPU like the Quadro 4 750.

IMHO it's impossible for a chip that is OpenGL 2.0 capable and DX9 ready to get a whooping from a GF4-class card that has "only" a 128-bit memory bus.

Sure, the drivers are immature, and in time things will get better for the VP870 and the card will probably be 10% faster overall than the 750, but the only way I can explain all of this is that TSMC is experiencing the same problems here as it is with Matrox's Parhelia.

I think that the VP870 has a similar 200-240 MHz clock speed to Parhelia, and that's why it's slower than the Quadro 4.
 
Generally, the more programmable something is, the slower it is. Not always, but that is generally the case with graphics hardware. Give them some time to optimize the drivers. That will help some.
 
David G. said:
I've just read ExtremeTech's review of the 3D Labs VP870 card, based on the new VPU, the P10.
The thing is, I am amazed at how a fully programmable VPU gets a beating from a GPU like the Quadro 4 750.

IMHO it's impossible for a chip that is OpenGL 2.0 capable and DX9 ready to get a whooping from a GF4-class card that has "only" a 128-bit memory bus.
How programmable something is has no direct relation to its speed. More important is how fast you can execute the required instructions.
 
OpenGL guy said:
How programmable something is has no direct relation to its speed. More important is how fast you can execute the required instructions.

Granted, it isn't directly related, but it can have an impact. If you have a highly flexible pipeline vs. a fixed, it is generally safe to say that the fixed will be faster (assuming your engineers are all about equal).
 
Dave said:
OpenGL guy said:
How programmable something is has no direct relation to its speed. More important is how fast you can execute the required instructions.

Granted, it isn't directly related, but it can have an impact. If you have a highly flexible pipeline vs. a fixed, it is generally safe to say that the fixed will be faster (assuming your engineers are all about equal).
Yeah, both are right :)
 
That ExtremeTech article was exceptionally poor. They didn't say what the settings were, and many of the testing methods were questionable, IIRC.
 
David G. said:
Sure, the drivers are immature, and in time things will get better for the VP870 and the card will probably be 10% faster overall than the 750, but the only way I can explain all of this is that TSMC is experiencing the same problems here as it is with Matrox's Parhelia.

Parhelia is manufactured at UMC. There was a press release about it after the launch.
 
Dave said:
Granted, it isn't directly related, but it can have an impact. If you have a highly flexible pipeline vs. a fixed, it is generally safe to say that the fixed will be faster (assuming your engineers are all about equal).

Actually, wouldn't you be better off saying that, with a constant (equal) transistor count, a programmable architecture is inherently slower than a fixed-function one? Otherwise, as OGLGuy alluded to, you can just use concurrency, and its linear increase, to increase the power of a "flexible pipeline".
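
A rough sketch of the constant-transistor-count argument, in Python. Every figure here (the budget, the per-unit transistor costs, the clock) is a made-up illustration value, not data for any real chip, and the linear scaling with unit count is simply the assumption stated in the post above.

```python
def parallel_units(transistor_budget: int, transistors_per_unit: int) -> int:
    """How many identical pipeline units fit into a fixed transistor budget."""
    return transistor_budget // transistors_per_unit


def throughput(units: int, ops_per_unit_per_clock: float, clock_mhz: float) -> float:
    """Millions of operations per second, assuming linear scaling with unit count."""
    return units * ops_per_unit_per_clock * clock_mhz


# Hypothetical numbers, chosen only for illustration.
BUDGET = 60_000_000                  # transistors available to both designs
FIXED_UNIT_COST = 10_000_000         # a fixed-function unit is assumed cheaper
PROGRAMMABLE_UNIT_COST = 15_000_000  # a programmable unit is assumed larger
CLOCK_MHZ = 200.0

fixed_units = parallel_units(BUDGET, FIXED_UNIT_COST)
prog_units = parallel_units(BUDGET, PROGRAMMABLE_UNIT_COST)

fixed_rate = throughput(fixed_units, 1.0, CLOCK_MHZ)
prog_rate = throughput(prog_units, 1.0, CLOCK_MHZ)

# With a bigger budget the programmable design can add units and catch up.
bigger_prog_units = parallel_units(90_000_000, PROGRAMMABLE_UNIT_COST)
bigger_prog_rate = throughput(bigger_prog_units, 1.0, CLOCK_MHZ)

print(fixed_units, prog_units, fixed_rate, prog_rate, bigger_prog_rate)
```

Under these assumed costs the fixed-function design fits more units into the same budget, so it comes out ahead at an equal transistor count; lift the budget and concurrency closes the gap.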
 
If I ever get to finish my vertex shader article I can give some indication of why programmable flexible hardware has many more pitfalls and issues than a largely fixed function pipeline...

The general concept of "the more flexible the slower it is" holds true, since greater flexibility creates more potential hazards and conflicts, not to mention more processing (decoding and fetching instructions, passing them to the actual processor, etc.). Driver optimisation is the key... although initial hardware design also has a big impact on how good it can get.

K-
 
How programmable something is has no direct relation to its speed. More important is how fast you can execute the required instructions.
Java and Transmeta are two living arguments to the contrary. While I don't know the exact technical reasons why, it's understandable that a virtual machine would be slower than a hardwired one because of the extra lookups/decodes it has to perform. Jack of all trades, and all that. Obviously if the virtual machine consisted of twice the transistors of the hardwired one, and ran at twice the clockspeed, it'd likely be faster overall. The assumption is most/all things being equal.

There's no need to nick pits, here. ;)
 
again the problem with P10 might be occlusion detection... I thought they had early-Z checks, but I don't know if they can have it on all the time and how many pixels/triangles it can test each clock...

this is important nowadays since you waste several cycles on pixel shaders executed on hidden pixels...

also P10 doesn't employ a cross-bar memory controller, but uses a fat bus to push data into caches from which the execution begins... these caches might hold back quite a lot of efficiency and thus performance and the on-chip caching system might need a revision...
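
To make the early-Z point above concrete, here is a minimal Python sketch of why rejecting hidden pixels before shading saves cycles. It is a toy software model, not a description of how the P10 or any real chip works; the function name, the fragment format, and the cycle cost are all hypothetical.

```python
def shader_cycles_spent(fragments, cycles_per_fragment, early_z):
    """Total pixel-shader cycles for a stream of fragments.

    fragments: list of (pixel, depth) in submission order; smaller depth is closer.
    early_z:   if True, hidden fragments are rejected before the shader runs;
               if False, every fragment is shaded and the depth test happens after.
    """
    depth_buffer = {}
    spent = 0
    for pixel, depth in fragments:
        visible = depth < depth_buffer.get(pixel, float("inf"))
        if early_z and not visible:
            continue  # rejected up front, no shader cycles wasted
        spent += cycles_per_fragment
        if visible:
            depth_buffer[pixel] = depth
    return spent


CYCLES = 4  # hypothetical shader cost per fragment

# Three overlapping fragments on the same pixel.
front_to_back = [((0, 0), 0.2), ((0, 0), 0.5), ((0, 0), 0.8)]
back_to_front = list(reversed(front_to_back))

print(shader_cycles_spent(front_to_back, CYCLES, early_z=True))
print(shader_cycles_spent(front_to_back, CYCLES, early_z=False))
print(shader_cycles_spent(back_to_front, CYCLES, early_z=True))
```

Note that in this model early-Z only helps when closer geometry is drawn first; with back-to-front submission every fragment passes the test when it arrives and gets shaded anyway.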
 
Kristof said:
The general concept of "the more flexible the slower it is" holds true

In a way ... this reminds me of Intel's P4 architecture ... highly programmable and dependent on software optimisations, but really slow.

That kinda gives me an idea ....
 
Just wondering, how does this 'expose TSMC's problem with .13'?

Or did the article yap about that? (I didn't read it.)
 
Why is the P10 chip, which is more capable and complex than a GF4, slower when fillrate comes into play? Because of a low clock speed.
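
As a back-of-the-envelope illustration of the fillrate argument: theoretical peak pixel fillrate is commonly estimated as pixel pipelines times core clock. The pipeline counts and clocks below are hypothetical placeholders, not measured or published figures for the P10 or the GF4.

```python
def peak_fillrate_mpixels(pixel_pipelines: int, core_clock_mhz: float) -> float:
    """Theoretical peak fillrate in megapixels per second (one pixel per pipe per clock)."""
    return pixel_pipelines * core_clock_mhz


# Hypothetical chips with the same number of pipelines but different clocks.
slow_clock_chip = peak_fillrate_mpixels(pixel_pipelines=4, core_clock_mhz=220.0)
fast_clock_chip = peak_fillrate_mpixels(pixel_pipelines=4, core_clock_mhz=300.0)

print(slow_clock_chip, fast_clock_chip)
```

With equal pipeline counts the peak figure scales directly with clock, which is the whole of the claim; real-world fillrate also depends on memory bandwidth and drivers, which this estimate ignores.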
 
So you're saying that Parhelia's slow clock speed is proof positive that TSMC is having trouble with .13?
 
Pete said:
How programmable something is has no direct relation to its speed. More important is how fast you can execute the required instructions.
Java and Transmeta are two living arguments to the contrary. While I don't know the exact technical reasons why, it's understandable that a virtual machine would be slower than a hardwired one because of the extra lookups/decodes it has to perform. Jack of all trades, and all that. Obviously if the virtual machine consisted of twice the transistors of the hardwired one, and ran at twice the clockspeed, it'd likely be faster overall. The assumption is most/all things being equal.

There's no need to nick pits, here. ;)
How is this a counter example to what I said? What's important is how fast you can execute the desired operations. Look at the P4 vs. Athlon. Clock per clock, the P4 is generally slower, yet it can be clocked significantly higher, so it wins overall.

As I said above, it doesn't really matter how programmable your design is, what matters is how fast you can do what you are asked to do.
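
The clock-per-clock argument boils down to: overall speed is work done per clock multiplied by clock rate. A small Python sketch with invented figures (these are not P4 or Athlon measurements, and the names are illustrative only):

```python
def effective_throughput(work_per_clock: float, clock_mhz: float) -> float:
    """Millions of work units per second: per-clock efficiency times clock rate."""
    return work_per_clock * clock_mhz


# Hypothetical designs: one does more per clock, the other clocks higher.
efficient_design = effective_throughput(work_per_clock=1.5, clock_mhz=1600.0)
high_clock_design = effective_throughput(work_per_clock=1.0, clock_mhz=2800.0)

print(efficient_design, high_clock_design)
```

With these made-up numbers the design that is slower per clock still finishes ahead, because its clock advantage outweighs its per-clock deficit.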
 
RussSchultz said:
So you're saying that Parhelia's slow clock speed is proof positive that TSMC is having trouble with .13?

Parhelia is made on 0.15 micron technology, but the situation sure shows how TSMC is unable to manufacture high-speed cards with big transistor counts ... It's just a guess, but I think they really have problems if nVIDIA cut its wafer requests by 75%.
 
Parhelia is made on 0.15 micron technology, but the situation sure shows how TSMC is unable to manufacture high-speed cards with big transistor counts

I think it's already been pointed out that Parhelia is manufactured by UMC. Also, it's highly likely that R300 is manufactured by TSMC, and that's more complex, and ATi are aiming for much faster!

It's more likely the design that influences the speed, not necessarily the chip manufacturer.
 
DaveBaumann said:
I think it's already been pointed out that Parhelia is manufactured by UMC.

.... what ? Man ... how stupid can I be ....

The last time I was briefed by the Matrox guys ... 5 weeks ago, I left with the impression that TSMC was their manufacturer ....
 