Poll on future DX9 cards' (not ATI and nVidia) performance

Best dx9.0 part

  • VIA (DeltaChrome)

    Votes: 0 0.0%
  • PowerVR (Series 5)

    Votes: 0 0.0%
  • Other (Matrox, Bitboys)

    Votes: 0 0.0%
  • They won't be able to build a decent DX9 card

    Votes: 0 0.0%

  • Total voters
    153
But when there hasn't been a new product for so long, and little (if any?) driver development, problems with some games are almost a certainty. That games are mostly developed with only nVidia and ATI hardware in mind doesn't help PowerVR either.

All products have their driver bugs upon release until they get ironed out. Whether there will be more of them than on other products remains to be seen.

Driver development and internal testing shouldn't have stopped since the early stages of development.

As for the last sentence (mind you, it's an old patent), here's something that might give a clue that it shouldn't be much of a consideration there:

http://l2.espacenet.com/espacenet/viewer?PN=WO03010717&CY=gb&LG=en&DB=EPD

So that it's not just "competitive" with high end IMRs, but beats them.

I'd set a higher priority on arithmetic efficiency for a high end DX9.0 part, but what the heck do I know.... :rolleyes:
 
duncan36 said:
Which Japanese company owns PowerVr by the way?

Sigh, PowerVR Technologies is a division of Imagination Technologies Ltd, a UK company... we do have an office in Tokyo (Japan).

K-
 
Joe DeFuria said:
Uttar said:
Why would a TBDR need such high end memory? :?

So that it's not just "competitive" with high end IMRs, but beats them.

Give me a TBDR with the same raw bandwidth as the latest IMR, and the same raw fill-rate as the latest IMR (which should mean approximately equal costs). Then, let the "effective fillrate" advantage beat the IMR.

Even ignoring overdraw, a TBDR needs substantially less BW than an IMR to support the same raw fillrate, i.e. a TBDR doesn't need to read the FB when alpha blending, has a fraction of the FB write BW requirement and has no Z buffer R/W bandwidth...

John.
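John's bandwidth point above can be put into rough numbers. This is only a sketch: the byte counts below are my assumptions (32-bit color, 32-bit Z, no Z/color compression), not PowerVR figures, and the helper names are mine.

```python
# Rough per-pixel external framebuffer traffic, IMR vs TBDR.
# Assumptions (mine): 32-bit color, 32-bit Z, no compression.

def imr_bytes_per_pixel(overdraw=1.0, alpha_blend=False):
    """External FB traffic per screen pixel on a classic IMR."""
    z_read, z_write, color_write = 4, 4, 4
    color_read = 4 if alpha_blend else 0   # blending must read the FB back
    return (z_read + z_write + color_write + color_read) * overdraw

def tbdr_bytes_per_pixel():
    """A TBDR keeps Z and color in on-chip tile buffers: no external Z
    traffic, blending happens on chip, and only the final visible color
    of each pixel is written out."""
    return 4

# Even ignoring overdraw and blending, the IMR needs 3x the FB bandwidth:
ratio = imr_bytes_per_pixel() / tbdr_bytes_per_pixel()   # 3.0
```

With overdraw or alpha blending factored in, the gap only widens, which is the "substantially less BW for the same raw fillrate" claim.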
 
JohnH said:
Even ignoring overdraw, a TBDR needs substantially less BW than an IMR to support the same raw fillrate, i.e. a TBDR doesn't need to read the FB when alpha blending, has a fraction of the FB write BW requirement and has no Z buffer R/W bandwidth...

John.

Whatever the exact case may be: give me a tiler that maxes out raw bandwidth, and then pairs it with the raw fill-rate needed to utilize it.
 
I gave my vote for DeltaChrome: that part looks to be a very well balanced DX9 desktop contender, plus its expected availability is somewhere around this Christmas, and moreover, the latter seems very probable to me. I didn't vote for the next PVR part because I, for one, have no clue either what its capabilities are (except possibly for PS3/VS3) or when to expect it, so it may well end up being a next-gen DX product, in both feature set and timeframe.
 
Joe DeFuria said:
JohnH said:
Even ignoring overdraw, a TBDR needs substantially less BW than an IMR to support the same raw fillrate, i.e. a TBDR doesn't need to read the FB when alpha blending, has a fraction of the FB write BW requirement and has no Z buffer R/W bandwidth...

John.

Whatever the exact case may be: give me a tiler that maxes out raw bandwidth, and then pairs it with the raw fill-rate needed to utilize it.

That seems like a bad, very bad strategy to me.
Sure, you might increase FPS a bit. But it'll cost you a LOT of money - money you could put elsewhere.
The problem is exactly that: you're spending money somewhere while other areas would benefit a LOT more from it. That goes from things like increasing raw power through transistor count, to putting more money into marketing (a required thing, even if it isn't too technical, considering how long it has been since PowerVR has done anything in this market) and driver development.

It just seems like a bad approach to me. It's like if you had a horse, and you pondered: "Should I take that 100 pound crate with me when riding my horse to visit my brother?" - Sure, you might have the advantage of bringing fifty billion thousand family pictures to your brother, which is certainly an advantage if he likes family pictures, but the horse will also be that much more tired, and you'd have so much less time at your brother's house that he won't even be able to see a tenth of your pictures.

Yes, that's a very contrived image, but you hopefully get the point :)



Uttar
 
Uttar said:
That seems like a bad, very bad strategy to me.
Sure, you might increase FPS a bit. But it'll cost you a LOT of money - money you could put elsewhere.

Why? It seems to work just fine for every other IHV. A "few more FPS" is the difference between being recognized as the leader, and being recognized as having a "fubared" architecture.

In any case, the theory goes, if a TBDR with 30 GB/sec bandwidth, and enough fill-rate to utilize it, is put against an IMR with the same bandwidth... there will be more than just a "few" FPS difference.

The high end designs are all built around memory bandwidth:

1) Figure out how much bandwidth you'll have available
2) Build silicon to balance that out: dedicate enough silicon to fill-rate, and to memory bandwidth saving techniques, to make the most use of the available bandwidth, and dedicate the rest of the silicon to additional feature support.

The only way this could be "bad" for PowerVR, is if the architecture is sooo efficient with bandwidth, that they couldn't build a chip that could utilize (in terms of fill rate), such high memory bandwidth. In short: if memory bandwidth is NOT the key cost for deferred renderers (but fill rate is), then you would build an architecture around as much fill-rate as is reasonably possible, then pair it with enough bandwidth to satisfy it.
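The two-step recipe above can be put into numbers: fix the bandwidth first, then size the fillrate to consume it. A minimal sketch follows; the 12 bytes-of-traffic-per-pixel figure and the helper name are purely illustrative, not anyone's real design numbers.

```python
# Step 2 of the recipe: given the bandwidth from step 1, compute the
# fillrate that exactly consumes it (bytes-per-pixel is illustrative).

def balanced_fillrate_mpix(bandwidth_gb_s, bytes_per_pixel):
    """Fillrate (MPixels/sec) that exactly consumes the given bandwidth."""
    return bandwidth_gb_s * 1e9 / bytes_per_pixel / 1e6

# e.g. 30 GB/sec of memory bandwidth at 12 bytes of traffic per pixel:
fill = balanced_fillrate_mpix(30, 12)   # -> 2500.0 MPixels/sec
```

The point of contention in the thread is only what the bytes-per-pixel figure is for a TBDR versus an IMR, not the recipe itself.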

It's like if you had a horse, and you pondered: "Should I take that 100 pound crate with me when riding my horse to visit my brother?" - Sure, you might have the advantage of bringing fifty billion thousand family pictures to your brother, which is certainly an advantage if he likes family pictures, but the horse will also be that much more tired, and you'd have so much less time at your brother's house that he won't even be able to see a tenth of your pictures.

Yes, that's a very contrived image, but you hopefully get the point :)

Actually, I don't really get the point. We're talking about a high-end product. No matter what the architecture is, we want it "balanced." That is, supply the memory bandwidth that is required by fill-rate demands. That balance may be different for a TBDR, but that doesn't mean you can't build an architecture with maximum bandwidth in mind.


Uttar
 
Joe: My point is that 30GB/s, with the current transistor limits, is NOT balanced on a TBDR.
What you say about them needing high-end memory is, AFAIK (and I might be wrong, I'm no TBDR specialist), simply a bad idea, because it'd increase costs a LOT more than it'd increase performance - bandwidth is just that much less important on TBDRs.

So...
The only way this could be "bad" for PowerVR, is if the architecture is sooo efficient with bandwidth, that they couldn't build a chip that could utilize (in terms of fill rate), such high memory bandwidth. In short: if memory bandwidth is NOT the key cost for deferred renderers (but fill rate is), then you would build an architecture around as much fill-rate as is reasonably possible, then pair it with enough bandwidth to satisfy it.

That's true AFAIK. Heck, considering 48GB/s (maximum - that's the top estimate, likely to be lower) is balanced for a 150M-transistor NV40...
I doubt that number would even be balanced for a 250M-transistor Series 5, not that 250M transistors is really feasible in such a timeframe anyway. So high-end memory for TBDRs is insane IMO. I'd say that 350MHz DDR1 on a 128-bit memory bus would probably not leave Series 5 bandwidth limited.

Could anyone more experienced with TBDRs confirm or deny this?


Uttar
 
Uttar said:
Joe: My point is that 30GB/s, with the current transistor limits, is NOT balanced on a TBDR.

I'm not convinced of that.

I'm not saying that it isn't the case, of course. ;) In fact, I've argued to "PowerVR proponents" in the past that this may indeed be the case, and could be a real barrier to a high-end TBDR being anything but "competitive" with high-end IMRs. (It just runs into a different wall: fill-rate / transistors instead of bandwidth.)

Again, it's certainly a possibility; I'm just not convinced it is a near certainty, as you seem to suggest.

What you say about them needing high-end memory is, AFAIK (and I might be wrong, I'm no TBDR specialist), simply a bad idea, because it'd increase costs a LOT more than it'd increase performance - bandwidth is just that much less important on TBDRs.

An increase in bandwidth, with a proportional increase in fill rate, will have the same performance impact as the same "proportional" increases in fill-rate / bandwidth for an IMR.

Could anyone more experienced with TBDRs confirm or deny this?

Unfortunately, I think they'd all be guessing. Though it might help to go back and dig up Kyro II specs for fill-rate / bandwidth, and transistor count...
 
Joe DeFuria said:
Unfortunately, I think they'd all be guessing. Though it might help to go back and dig up Kyro II specs for fill-rate / bandwidth, and transistor count...

Okay... Looking at:
http://freespace.virgin.net/neeyik.uk/3dspecs/

Kyro II: 15M transistors, 175MHz core & memory clock (with SDR)
TNT2 Ultra: 15M transistors, 150MHz core & 183MHz memory (with SDR)

Thus, we get the following memory-to-core clock ratios:
Kyro II: 1.0
TNT2 Ultra: 1.22

This means that if they wanted to use the same memory as the NV40 (700-800MHz with a 256-bit memory bus) and be as balanced (or unbalanced, we'll see) as it, they'd need 183M transistors with a 550-600MHz core clock - and such a 183M transistor figure, while probably possible, would put the manufacturing cost way, way too high...
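The clock side of this extrapolation can be redone in a couple of lines. This is only a sanity-check sketch: it keeps TNT2's memory/core clock ratio and solves for the core clock balancing NV40-class memory, and it lands slightly above the quoted 550-600MHz range (the 183M transistor figure is harder to reconstruct, so only the clocks are checked here).

```python
# Keep TNT2's memory/core clock ratio and solve for the core clock
# that would balance NV40-class memory clocks (700-800 MHz).

tnt2_ratio = 183 / 150          # ~1.22 (memory clock / core clock)

core_low  = 700 / tnt2_ratio    # ~574 MHz
core_high = 800 / tnt2_ratio    # ~656 MHz
```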

That makes a few major mistakes, of course:
- Forgetting that current IMRs have many very efficient bandwidth saving techniques
- Forgetting that the main use for that bandwidth is FSAA - at 2x MSAA, the NV35U and R350P are barely bandwidth limited.

And thus, as said before by many other people, the whole question about Series 5 is how good those IMR bandwidth saving techniques are (very good, yes, but that doesn't say much) and how cheap MSAA will be on Series 5.

But then again, should nVidia be *really* smart and should their FSAA algorithm "revamp" actually mean an all-new algorithm (FAA? Who knows) - then PowerVR would suddenly look quite bad indeed. But frankly, I doubt that, and I think Series 5 is unlikely to look really bad.


Uttar
 
Uttar said:
Kyro II: 15M transistors, 175MHz core & memory clock (with SDR)
TNT2 Ultra: 15M transistors, 150MHz core & 183MHz memory (with SDR)

Thus, we get the following memory-to-core clock ratios:
Kyro II: 1.0
TNT2 Ultra: 1.22

I see it quite a bit differently.

nVidia's "formula" over the years has been (approximately): for every 2 texture pipelines, use 128 bits of SDR memory running at a clock rate comparable to the core to balance it. This is what TNT-2 uses. This is approximate, but holds remarkably true across their high end products: TNT-2, all the way through the FX 5900.

The FX 5900's 8 "texture" pipes (4x2) demand "quadruple" the memory bandwidth compared to TNT-2, clock for clock. This is provided by doubling the bus width to 256 bits, and then doubling the throughput by going from SDR to DDR memory.

This is also the same with Kyro-II: for every 2 pipes, 128 bits of synchronous SDR memory.

Kyro-II has a similar transistor count to TNT-2; both provide 2 DX6-style pixel pipelines.

So following this, I see no reason why there couldn't be an 8 pipeline DX9 TBDR with a similar transistor count to the FX 5900 - 110 million. And this should be paired with similar memory as the FX: roughly synchronous, 256 bit, DDR memory.

These are of course gross generalizations with very few data points to go on and with assumptions made of "all else being equal." ;)
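Joe's rule of thumb above can be sketched in a couple of lines (the `balanced_bus_bits` helper and the DDR-counts-double simplification are mine, used only to restate the rule):

```python
# Rule of thumb: every 2 texture pipes want 128 bits of memory running
# at roughly the core clock; DDR moves data twice per clock, so a DDR
# bus can be half as wide for the same throughput.

def balanced_bus_bits(pipes, ddr=False):
    """Physical bus width (bits) the rule of thumb calls for."""
    sdr_equivalent_bits = (pipes // 2) * 128
    return sdr_equivalent_bits // 2 if ddr else sdr_equivalent_bits

# TNT2: 2 pipes -> 128-bit SDR
# FX 5900: 8 "texture" pipes (4x2) -> 256-bit DDR (4x TNT2 per clock)
```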
 
Nappe1 said:
- XGI: no one knows exactly what they have up their sleeves... btw, what happened to Xabre II?? But I really don't think they could jump a step this high from Xabre...
"how was it?? bilinear has four samples, right??"
Let's see, we should go by the tried and tested Nappe1 rule. Nappe1 does not think that they will "jump a step this high from Xabre". Hmm, it therefore must succeed. :p




Kidding kidding, don't hurt me. ;)
 
keegdsb said:
Nappe1 said:
- XGI: no one knows exactly what they have up their sleeves... btw, what happened to Xabre II?? But I really don't think they could jump a step this high from Xabre...
"how was it?? bilinear has four samples, right??"
Let's see, we should go by the tried and tested Nappe1 rule. Nappe1 does not think that they will "jump a step this high from Xabre". Hmm, it therefore must succeed. :p




Kidding kidding, don't hurt me. ;)

Based on the fact that I am cursed and can't be right, my previous post shows that none of them has really good chances, which in my case means that they will succeed. (So does this mean that PVR has the smallest chances of that group??) :)
 
Could anyone more experienced with TBDRs confirm or deny this?

Why you'd need a TBDR expert to answer your questions in the first place is beyond me; can I try, albeit being a complete layman?

If you were asked whether today's high end cards are rather fillrate or bandwidth "hungry", what would you say?

If the answer goes in the bandwidth direction, then why should it be different with upcoming products, since arithmetic and computational efficiency will only grow in importance? Is the real problem of the NV30 its bandwidth?

Furthermore, you're trying to compare a pure IMR with no advanced bandwidth saving techniques (TNT2) vs. an on-paper equivalently specced TBDR (K2).

Let's take q3a as an example for those, for which PowerVR claims to have measured an average overdraw of 3.39 in demo001.

TNT2

2*150MHz = 300MPixels/sec
183MHz SDRAM = 2.93GB/sec

KYROII

2*175*3.39 = 1186MPixels/sec
175MHz SDRAM *3.39 = 9.5GB/sec

Now the resulting numbers might seem exaggerated, but you need about 1.1GPixels/sec of raw fillrate in that game to reach 60 fps at 1280*1024*32. How far from that number is a K2?
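The numbers above can be checked in a few lines. The overdraw figure is PowerVR's; the ~4 rendering passes assumed for q3a at the end are my guess, chosen because they reproduce the quoted ~1.1 GPixel/sec figure, so treat that last line as a sketch rather than a measurement.

```python
# Worked numbers for the TNT2 vs Kyro II comparison above.

OVERDRAW = 3.39   # PowerVR's measured average overdraw in q3a demo001

def raw_fillrate_mpix(pipes, core_mhz):
    """Raw fillrate in MPixels/sec."""
    return pipes * core_mhz

def sdr_bandwidth_gb(mem_mhz, bus_bits=128):
    """SDR memory bandwidth in GB/sec."""
    return mem_mhz * 1e6 * (bus_bits / 8) / 1e9

tnt2_fill = raw_fillrate_mpix(2, 150)            # 300 MPixels/sec
tnt2_bw   = sdr_bandwidth_gb(183)                # ~2.93 GB/sec

# A TBDR skips occluded pixels, so its *effective* figures scale with overdraw:
kyro2_fill_eff = raw_fillrate_mpix(2, 175) * OVERDRAW   # ~1186 MPixels/sec
kyro2_bw_eff   = sdr_bandwidth_gb(175) * OVERDRAW       # ~9.5 GB/sec

# Raw fillrate needed for 60 fps at 1280x1024, overdraw 3.39,
# assuming ~4 rendering passes (assumption, see lead-in):
needed = 1280 * 1024 * 60 * OVERDRAW * 4 / 1e6          # ~1066 MPixels/sec
```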

Nowadays, with IMRs using several combinations of advanced bandwidth saving techniques, I'd say that such calculations could become tricky. However, I'd think that PowerVR - as practically any other IHV - has done enough research as to where future games/applications are heading and what their requirements will be, and has picked the best possible solution in accordance with their architecture, and NOT what everyone else would do or is doing.

But then again, should nVidia be *really* smart and should their FSAA algorithm "revamp" actually mean an all-new algorithm (FAA? Who knows) - then PowerVR would suddenly look quite bad indeed. But frankly, I doubt that, and I think Series 5 is unlikely to look really bad.

If you dedicate enough transistors, a TBDR can have MSAA as fast as FAA on an IMR, and with at least the same number of samples. I'd speculate that by the time IMRs move to exotic algorithms like that, TBDRs most likely will too, simply because it's cheaper to implement in hardware.

So following this, I see no reason why there couldn't be an 8 pipeline DX9 TBDR with a similar transistor count to the FX 5900 - 110 million. And this should be paired with similar memory as the FX: roughly synchronous, 256 bit, DDR memory.

Forget pipelines with PS/VS3.0 products, and the transistor count up there (albeit I have no idea what it looks like in reality) sounds quite low. Core and memory should run at isochronous speeds, and I'm afraid any of your guesstimates concerning bus width will be completely off-track....

***edit: what's the ratio between fillrate and bandwidth on K2?

*runs for his life* :oops:
 
but I really don't think they could jump a step this high from Xabre...

Well, though some people here really seem to be very confident in the Series 5, I was wondering the same thing.

PowerVR took YEARS to invent the Kyro, which didn't even offer hardware T&L. Mind you, the Kyro came out at the time of the GeForce 2.
ATI, NV and all the competitors have had time to come out with new architectures since; only PowerVR didn't or couldn't (Series 4).

It seems very hard to believe they can make such a high jump, from a card which lacked T&L to the first card to support PS and VS 3.0...
 
parhelia said:
PowerVR took YEARS to invent the Kyro, which didn't even offer hardware T&L.
PowerVR had already produced a T&L solution (Elan) for the arcade, but for the PC at that time there really didn't seem to be much point. The performance of the other T&L solutions was hardly stellar (more a tickbox feature), PC CPUs were providing improved FPU performance, and, most importantly, games were nearly always fillrate limited.
 
There's always the missing link of the scrapped Series 4, albeit the jump from that to Series 5 is huge too.


PowerVR took YEARS to invent the Kyro, which didn't even offer hardware T&L. Mind you, the Kyro came out at the time of the GeForce 2.
ATI, NV and all the competitors have had time to come out with new architectures since; only PowerVR didn't or couldn't (Series 4).

It wasn't their fault that Series 4 died an unexpected and quick death.

As far as Xabre goes, it can muster almost equivalent average framerates in 3DCenter's Pyramid2003 demo for UT2003 (~20+ fps) as a K2, with a Ti4200 at ~37fps and a non-PRO 9500 at ~33fps. With UT2003 being a pure DX7 T&L game, one really has to wonder what the Xabre actually does there.

http://www.mitrax.de/display-review.php?file=page05.htm
 
Sometimes you get a better result by starting from scratch, particularly when there are large API changes afoot. Things are now a big step for everyone, irrespective of what they may or may not have had in existence prior to making that step.

I could give an example, but would probably only get flamed for pointing out that GF1 through GF4 were all basically the same architecture, and that when they tried to make a large change they were not as successful as everyone expected, so I won't. Oh, looks like I have anyway.

John.
(Bracing himself)
 