PowerVR Series 5 is a DX9 chip?

Joe DeFuria said:
I want more polys, and I still think that TBR's will have problems with high polycounts.

Cue Dave, Simon, Kristof.... ;)
Why should I say it again? Life's too short, I've just had to write a stupid doc explaining why patent XYZW is completely unrelated to a patent application of mine (but happened to have some of the same keywords), and it's time to go home.
 
Bah, I'll just argue with myself for a post. Should be about just as fun (Edit: Simon, this isn't in response to your post):

Chalnoth: TBR's will have a problem with storage bandwidth. Each triangle takes a significant amount of data to store, meaning that as triangles approach the sub-pixel size, TBR's will start to require more memory bandwidth than a comparable z-buffer, particularly since some triangles will have to be read two or more times (spread across different tiles).
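The storage-bandwidth concern above can be put into a back-of-envelope sketch. All numbers below are hypothetical, chosen only to show how scene-buffer traffic scales with triangle count:

```python
# Back-of-envelope for the scene-buffer traffic argument. All numbers are
# hypothetical: each binned triangle is written to the scene buffer once,
# then read back once per tile it touches.
def scene_buffer_gbps(tris_per_frame, bytes_per_tri, avg_tiles_touched, fps):
    write_bytes = tris_per_frame * bytes_per_tri
    read_bytes = tris_per_frame * bytes_per_tri * avg_tiles_touched
    return (write_bytes + read_bytes) * fps / 1e9

# 1M sub-pixel triangles, 32 bytes each, 1.1 tiles touched on average, 60 fps:
# roughly 4 GB/s of scene-buffer traffic, growing linearly with triangle count.
print(scene_buffer_gbps(1_000_000, 32, 1.1, 60))
```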

Myself: But you can solve these problems through compression.

Chalnoth: Why are triangles inherently more compressible than the z-buffer? We're already compressing the z-buffer, and I'm sure the algorithms will get more sophisticated. Why would triangle compression be any better? Not only that, but the number of triangles per scene will likely be increasing faster than fillrate over the next few years, as it has in the recent past. Compression can only divide the size of the scene buffer by x amount. It can't solve the problem of triangles growing faster than fillrate.

Myself: The z-buffer will still require more bandwidth, and the tiler has the definite advantage of always writing all of its framebuffer data at once, never having to blend with an external framebuffer.

Chalnoth: Until the scene buffer is overrun, where the tiler may still need to do framebuffer reads, and will have to write only partial tiles at once.

Myself: But you can just set the scene buffer size high enough that it's never going to be overrun, and anyway, even when it is overrun it will be more efficient with its memory bandwidth than an IMR.

Chalnoth: But you can't be sure how big that needs to be. Sooner or later, a game will always come along that overruns the scene buffer. And when it is overrun, there will be problems because the user will have selected settings that tax the TBR when the scene buffer is not being overrun, so there will pretty much always be a major stall there. That and the tiler will also have to contend with the scene buffer taking up memory bandwidth and space that the IMR wouldn't need.

....Anyway, I think I wore myself out for the moment...
 
Joe DeFuria said:
Look...let's just cut this short.

1) I mentioned doubling the clock rates as an illustration of doubling fill rate and bandwidth.

2) Whether it's my fault for "confusing you", or your fault for "being confused" is pointless to discuss and was not my intention to raise as an issue. Miscommunications are the fault of both parties.

Usually they are, but I don't think our disagreement is based on miscommunication, unless my interpretation of the conclusion I quoted was misrepresentative.

Now. Start fresh.

OK, but I think some things I said before might still apply.

Everyone wants to see a TBDR with "similar specs" as a high-end IMR. SIMILAR SPECS meaning similar raw fill-rate, and similar raw bandwidth.

Why?

Because generally speaking, we all pretty much expect that cards with similar raw specs generally cost the same.

I think this prior statement of mine applies:
demalion said:
"fillrate doesn't matter directly for cost, it is the amount of transistors (or die area, as Simon would say, I guess) that achieving that fillrate would require. For example, how much design area do these techniques to approach the efficiency of a DR require, and what could be done instead in that space? Ignoring questions like these is a flawed approach to analyzing the cost benefits (and penalties, though I don't know what those are) that might be associated with the differences in a TBDR, which sort of defeats the point of the comparison of "equivalently costly" designs, IMO."

This seems directly pertinent to core cost.

Joe DeFuria said:
500 Mhz, 256 bit DDR-II costs "the same" no matter which chip it's paired up with.

My other statement:

demalion said:
bandwidth doesn't matter as much (though it should matter more than it has in the past) to a TBDR, so that assumption of equivalence of cost does not seem reasonable. That phrasing seems a bit odd...perhaps it would be better to say a TBDR can achieve more with given bandwidth?

This seems directly pertinent to card cost.

Joe DeFuria said:
And the hope is, that a TBDR with the "same specs" (and therefore cost), would significantly outperform the competing IMR.

Well, I don't even think such a comparison (excluding design complexity and implementation as a factor in cost and focusing only on fill rate and bandwidth specifications for cost/performance analysis) works between IMRs.

For cores, it is admittedly less black and white. But I see no reason to suspect anything other than that, as a best rough estimate, a TBDR core that puts out 800 MPix/sec (raw) "costs the same" as an 800 MPix/sec IM core.

What is this "raw" fillrate based on, and why is it important again? What about the impact of "effective fillrate" with a feature such as AA turned on? It seems to me the only concern is the specific cost of implementation, and then, based on that, the performance achieved... concentrating on fillrate as a point of equivalence seems to me to be a wrong turn.

Can we agree on those assumptions? Before this is taken further, we have to agree on that.

I guess we can't...but if you have an answer to my reasoning perhaps you can change my mind?
 
Chalnoth said:
Chalnoth: TBR's will have a problem with storage bandwidth. Each triangle takes a significant amount of data to store, meaning that as triangles approach the sub-pixel size, TBR's will start to require more memory bandwidth than a comparable z-buffer, particularly since some triangles will have to be read two or more times (spread across different tiles).

Don't want to argue, as I never do; I just don't get what you wrote. How would a sub-pixel-sized triangle be read more than once?
It can't be in more than one tile, could it?

Or maybe I missed something.
 
You're always going to have border triangles. The triangles and pixels will never line up perfectly. Additionally, there may be many triangles that are only sub-pixel in size in one dimension. They could be very long (such as on the side of a pillar or pipe). Granted, these are special cases, and the effects of edge triangles will decrease as triangle counts increase, but will still be there.
 
Joe
I do agree with your statements about starting fresh, however...

And the hope is, that a TBDR with the "same specs" (and therefore cost), would significantly outperform the competing IMR.

And I have shown, based on the last info we have for same-spec'd parts, Kyro vs. TNT2, that the TBDR was able to outperform not only the TNT2 but its "bigger brother", the TNT2 Ultra.

Now, does this hold true today? I don't have any idea. Nor do I have proof to say that it would hold true at the high end today if such parts existed. Of course, you also don't have any proof to say that it won't. So all we have is beliefs and speculation. Unless you're Simon or K., then you know and won't tell :)

Just a few other things:
Lol... quite a change from the "Hardware T&L makes no difference" mantra back in the Kyro vs. GeForce era

There is a difference in what we are saying. I assume you realize that Q3 uses default OpenGL calls to accelerate some of the basic T&L functions, so a card with T&L will get some FPS gains. If your game really uses it, the gains are bigger. So can you take a card without T&L, double its FPS scores, and say it's the same as a card with T&L with the same specs? Not if you want a true comparison of the rendering methods. You either have to figure out what gain the T&L cards are getting from the T&L unit and add or subtract it from the scores, or slap the same T&L unit in the K2. This discussion was about TBDR vs. IMR, not whether or not T&L is useful.

Also, doesn't the K2 (unless my memory is bad) have only a single TMU? We know from the R9000 vs. 8500 that in some cases that extra TMU can help. So you are comparing 2-TMU cards to the K2 with 1 TMU. Again, just doubling FPS scores won't work; you have to adjust for the extra TMU, which is not easy. Side note: if the K2 does have 2 TMUs, then my bad...


These are some of the reasons why I objected to your "doubling method". You cannot take these parts and double their scores, as they differ in too many areas. We know that in real life doubling the specs seldom equals double the performance. This can be used to both help and hurt your statement, but it's still a wrong assumption. And I feel that "doubling" was too far off to be a guideline or guess.

Edit: just wanted to say that there are many things that affect FPS scores, like drivers, etc., that doubling will not account for...
 
Chalnoth, your first post said you didn't want TBRs to become dominant because they would have trouble with high polygon counts... if they were dominant, the 3D pipeline would change to accommodate them.

There is no reason to just throw triangles at the screen in random order; it is just a tradition that hardware is built to accommodate. This can be done in the other direction too. Statistically rare primitives are not an issue; the option to just transform them multiple times without needing extra storage is always there. This can hardly be counted as a negative without counting, as positives, all the cases where the performance breakdown for a given scene is larger for an IMR than for a tiler...
 
"fillrate doesn't matter directly for cost,

I disagree. Fillrate is not the only factor, of course, but fill rate does relate directly to cost. To get more fill rate, you either throw more pipelines at the problem, or you design for higher frequencies / lower yields.

Ignoring questions like these is a flawed approach to analyzing the cost benefits (and penalties, though I don't know what those are) that might be associated with the differences in a TBDR, which sort of defeats the point of the comparison of "equivalently costly" designs, IMO.

Demalion, I am not "ignoring" questions like those. I am making assumptions. I am assuming that all things are about equal. There is no evidence to date, one way or the other, that indicates that given the same raw fill rate target, TBDR or IM is inherently more costly.

Because of LACK of such evidence, I am making the ASSUMPTION that they cost about the same.

Instead of throwing unanswerable questions around, supply some evidence one way or the other.

bandwidth doesn't matter as much (though it should matter more than it has in the past) to a TBDR, so that assumption of equivalence of cost does not seem reasonable.

I don't get it. Is it cheaper to pair 20 GB/sec worth of raw bandwidth on a card with a TBDR chip than it is to pair it with an IMR chip? No.

I know "bandwidth doesn't matter as much". That's not the point. The point is to build two cards with the same spec (same cost), and thus, theoretically, the performance of the TBDR would be much higher.

And for the record, the Kyro-II employed a bandwidth ratio of 8 bytes per pixel, which is actually just slightly more than the Radeon 9700, and double that of the GeForce FX. So if 8 bytes / pixel is the ideal bandwidth ratio for TBDR implementations, as per Kyro-II, then that again supports my assumption as not only reasonable, but likely.
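For what it's worth, the 8 bytes/pixel figure can be reproduced from the public specs. The bus widths, effective memory clocks, and raw pixel fillrates below are the commonly quoted ones, so treat them as approximate illustrations:

```python
# Bytes-per-pixel ratio: memory bandwidth divided by raw pixel fillrate.
# Figures are the commonly quoted specs, included only as an illustration.
def bytes_per_pixel(bus_bits, eff_mem_mhz, pixel_fill_mpix):
    bandwidth_mb_s = (bus_bits // 8) * eff_mem_mhz  # MB/s
    return bandwidth_mb_s / pixel_fill_mpix

kyro2 = bytes_per_pixel(128, 175, 350)    # 128-bit SDR @ 175 MHz, 350 MPix/s -> 8.0
r9700 = bytes_per_pixel(256, 620, 2600)   # 256-bit DDR @ 310 MHz, 2600 MPix/s -> ~7.6
gffx = bytes_per_pixel(128, 1000, 4000)   # 128-bit DDR-II @ 500 MHz, 4000 MPix/s -> 4.0
```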

Well, I don't even think such a comparison (excluding design complexity and implementation as a factor in cost and focusing only on fill rate and bandwidth specifications for cost/performance analysis) works between IMRs.

Care to show evidence where it doesn't?

What is this "raw" fillrate based on, and why is it important again?

Raw fill rate is the number of pixel pipes times the clock rate. (Also, the number of TMUs per pipe is to be considered.) It's important because it is a gross measure of how much actual pixel writing power the chip has.
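The definition above can be written out as a quick sketch. The pipe/TMU counts and clocks below are the commonly quoted Kyro II and GeForce4 MX specs, used only as illustrations:

```python
# "Raw" fillrate as quoted on spec sheets: pipes x clock for pixels,
# pipes x TMUs-per-pipe x clock for texels. Pipe/TMU counts and clocks
# are the commonly quoted ones, used only as illustrations.
def raw_fillrate(pipes, tmus_per_pipe, clock_mhz):
    pixel_mpix = pipes * clock_mhz
    texel_mtex = pipes * tmus_per_pipe * clock_mhz
    return pixel_mpix, texel_mtex

print(raw_fillrate(2, 1, 175))  # Kyro II: 2 pipes, 1 TMU each, 175 MHz
print(raw_fillrate(2, 2, 270))  # GeForce4 MX: 2 pipes, 2 TMUs each, 270 MHz
```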

What about the impact of "effective fillrate" with a feature such as AA turned on?

The impact of EFFECTIVE fill rate will be shown in the performance results! You really do not understand my position here at all.

The goal is to build similarly spec'd parts (cost). And then compare the resultant performance.

concentrating on fillrate as a point of equivalence seems to me to be a wrong turn.

Not at all. We're talking equivalence in COST, not in performance.
 
MfA said:
Chalnoth, your first post said you didn't want TBRs to become dominant because they would have trouble with high polygon counts... if they were dominant, the 3D pipeline would change to accommodate them.

But, as I said, I want to see higher polycounts. I still don't think that significantly more fillrate will really help anywhere close to as much as significantly increasing polycounts.

the option to just transform them multiple times without needing extra storage is always there. This can hardly be counted as a negative without counting, as positives, all the cases where the performance breakdown for a given scene is larger for an IMR than for a tiler...

Yes, but transforming the primitives multiple times is hardly a good option, particularly if vertex program lengths begin to get long.
 
And I have shown, based on the last info we have for same-spec'd parts, Kyro vs. TNT2, that the TBDR was able to outperform not only the TNT2 but its "bigger brother", the TNT2 Ultra.

You cannot ignore the factor of time. If IMRs like the Radeon 9700 were of the same efficiency as the TNT2, then I could see your case.

We're back to square one.

Take card that is exactly double the raw specs of the Kyro-II, and put it up against a GeForce4 MX which is of similar complexity. I see comparable performance.

Now, does this hold true today? I don't have any idea. Nor do I have proof to say that it would hold true at the high end today if such parts existed. Of course, you also don't have any proof to say that it won't.

Right.

If your game really uses it, the gains are bigger. So can you take a card without T&L, double its FPS scores, and say it's the same as a card with T&L with the same specs?

Dunno. You realize, of course, that if the T&L unit is significantly impacting the score of the GeForce4, that means that the Kyro is not particularly fill-rate / bandwidth limited (the Kyro being held back by CPU limitations), which means that doubling the specs of the Kyro wouldn't NEARLY double the FPS.

Also, doesn't the K2 (unless my memory is bad) have only a single TMU?

Right. My "doubled" Kyro would have more pixel rate than the GeForce4 MX, but less Texel rate. So I considered it about equal. (More pixel rate is beneficial in some cases, more texel rate in others.)

These are some of the reasons why I objected to your "doubling method". You cannot take these parts and double their scores, as they differ in too many areas.

Geeze... this is just a rough exercise, people... Like I said, the fact that I doubled the FPS scores of the Kyro was a very FAVORABLE position for the Kyro. I would've thought that would have made up for any "unfairness" that might be seen for the GeForce, because it has slightly higher bandwidth and a higher texel rate.

Kyro can also do 8 textures in one pass. The GeForce4 only 2... and it doesn't have the "loop back" that today's cards have.

Edit: just wanted to say that there are many things that affect FPS scores, like drivers, etc., that doubling will not account for...

The point is taken, though it has always been understood.

AGAIN, my point is there are just as many "flaws" with comparing a Kyro to a TNT, as there are with comparing a "doubled" Kyro II with a GeForce4 MX.

Both are estimates where some wild speculation must be made if you try to apply them to today's graphics landscape.
 
Chalnoth said:
But, as I said, I want to see higher polycounts. I still don't think that significantly more fillrate will really help anywhere close to as much as significantly increasing polycounts.

If tiling became dominant, the pipeline would adapt before display lists became a problem, and you'd have both the bandwidth for higher fillrate and no problem with increasing polygon counts... so where is the problem?

Yes, but transforming the primitives multiple times is hardly a good option, particularly if vertex program lengths begin to get long.

As long as they are "statistically rare" I don't care.

If a certain scene has a lot of them, it represents a slowdown, but as I said... if we are going to count such edge cases as negatives, we have to count, as positives, the cases where an IMR has a greater breakdown in performance than a tiler.

Marco
 
Joe DeFuria said:
why multiply the SDR memory speed by a factor of 2?
it's not DDR???

Um....

Because I DOUBLED the FPS scores of the Kyro-II benchmarks. In order to do that I DOUBLED the pixel fill rate, and DOUBLED the bandwidth.

In short:

Original KYRO-II: 175 MHz clock / 175 MHz, 128-bit SDR
"Doubled" KYRO-II, producing FPS comparable to the GeForce4 MX: 350 MHz clock / 350 MHz, 128-bit SDR

GeForce4 MX is running 200 MHz DDR = "effective" 400 MHz, 128-bit SDR.

OK?

Sorry if I missed something, but Joe, your reasoning is quite flawed!!

Kyro II @ 350 MHz = 700 MPixel / 700 MTexel

GF4MX @ 350MHz = 700MPixel / 1400MTexel

So the GF4MX still has twice the texel fillrate of a 350 MHz Kyro II. All the games in the test use multitexturing, so the Kyro II is treated unfairly in this "match".
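The numbers being argued over can be laid out side by side; a minimal sketch, assuming the commonly quoted 2-pipe configurations for both chips:

```python
# Laying out the disputed numbers: double the Kyro II's clock and compare
# raw rates against the stock GeForce4 MX. Pipe/TMU counts and clocks are
# the commonly quoted specs, used only as an illustration.
def rates(pipes, tmus_per_pipe, clock_mhz):
    return pipes * clock_mhz, pipes * tmus_per_pipe * clock_mhz  # (MPix/s, MTex/s)

kyro2_doubled = rates(2, 1, 350)  # (700, 700): more pixel rate than the MX
gf4mx = rates(2, 2, 270)          # (540, 1080): but far more texel rate
```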
 
Kyro II @ 350 MHz = 700 MPixel / 700 MTexel

GF4MX @ 350MHz = 700MPixel / 1400MTexel

The OVERCLOCKED GF4 MX.

The normal GeForce4 MX is 540/1080. The pixel rate of the "doubled" Kyro is more than that of the GeForce4 MX; the texel rate is lower, as I stated in one of my posts above.

In addition, the Kyro has the ability to lay down 8 textures per pass. IMRs only recently acquired this trait through loop-back, and it is something the GeForce4 MX lacks.

And again, I gave a straight 100% FPS increase to the Kyro, which we would undoubtedly not really see from a strict 100% increase in the specs.

So yes, I think it's pretty fair.
 
Joe DeFuria said:
"fillrate doesn't matter directly for cost,

I disagree. Fillrate is not the only factor, of course, but fill rate does relate directly to cost. To get more fill rate, you either throw more pipelines at the problem, or you design for higher frequencies / lower yields.

But you are not basing equivalency on higher frequencies and more pipelines; you are basing it on fill rate. You go further and stipulate raw bandwidth equivalency for determining equivalent cost, when the two approaches have very different demands on bandwidth to achieve performance. Your basic assumption here progresses into replacing any concern regarding actual cost with fillrate, and ignoring evidence that doesn't fit with this replacement.

Ignoring questions like these is a flawed approach to analyzing the cost benefits (and penalties, though I don't know what those are) that might be associated with the differences in a TBDR, which sort of defeats the point of the comparison of "equivalently costly" designs, IMO.

Demalion, I am not "ignoring" questions like those. I am making assumptions.

Your assumptions are ignoring those questions...let me provide the complete quote with those questions included since they are omitted: "fillrate doesn't matter directly for cost, it is the amount of transistors (or die area, as Simon would say, I guess) that achieving that fillrate would require. For example, how much design area do these techniques to approach the efficiency of a DR require, and what could be done instead in that space?"

If you aren't ignoring that, please clarify where you address it?

I am assuming that all things are about equal. There is no evidence to date, one way or the other, that indicates that given the same raw fill rate target, TBDR or IM is inherently more costly.

There isn't? This seems to contradict what observations we do have, with a Kyro II with a raw fillrate far below that of a GF 2 MX performing on par with that card. Or perhaps we should compare it to GF 2 GTS and its "raw" fillrate? How much does each core cost?
I am quite aware you stipulate that IMRs have increased efficiency, but I direct you to read the questions I say your comments ignore once more...

Because of LACK of such evidence, I am making the ASSUMPTION that they cost about the same.

Instead of throwing unanswerable questions around, supply some evidence one way or the other.

:oops: You are ignoring evidence already provided.

bandwidth doesn't matter as much (though it should matter more than it has in the past) to a TBDR, so that assumption of equivalence of cost does not seem reasonable.

I don't get it. Is it cheaper to pair 20 GB/sec worth of raw bandwidth on a card with a TBDR chip than it is to pair it with an IMR chip? No.

"so that assumption of equivalence of cost does not seem reasonable." Hmm... let's speculate about an IMR-based card with 20 GB/sec raw bandwidth. Let's say it cost just as much to make a TBDR with 16 GB/sec raw bandwidth. Why would it cost just as much with less bandwidth? Hmm... I don't know, let's say the TBDR core was a bit more complex or higher clocked. How do your assumptions about cost equivalence account for that? How would each part perform? Hmm... wait a second, the TBDR might achieve a higher fill rate at the same cost to the manufacturer... a possibility that contriving that equal fill rate and bandwidth is exactly the same as equal cost doesn't allow for. This is even ignoring the difference in effective use of fillrate between the two designs.

I say to you that proposing equivalent bandwidth doesn't make sense for comparison, nor equivalent raw fillrate. Yes, proposing equivalent cost does. The problem here is you continue to maintain they are equivalent, and I disagree, and I still don't see your basis for it. I don't know what data you are using to maintain that they are, so perhaps that is why I don't see how it is more pertinent than Kyro II transistor count and specs compared to GF2 transistor count and specs in light of the performance achieved by each. Read further for more clarification on this point.

I know "bandwidth doesn't matter as much". That's not the point. The point is to build two cards with the same spec (same cost), and thus, theoretically, the performance of the TBDR would be much higher.

We are running into a wall again. Same spec (raw fillrate and bandwidth) does not equal same cost... same design complexity and implementation requirements equal same cost. The fundamental presumption of equivalence between those two statements is the flaw I see with your comparison.
Even if we stipulated that those things did equate to the same cost, the provided evidence (Kyro II compared to GF2 GTS, for example) shows that lower spec = higher or equal performance. But you discount this evidence based on the progress of IMRs (using more design complexity) while implicitly excluding the consideration that a higher spec might be achieved at the same actual cost for the TBDR.
You justify this by saying we can't tell what design complexity would offer for a TBDR, so that ignoring the possibility of any such benefit for a TBDR, while also ignoring the actual cost of the design complexity for increased efficiency in IMRs (which, it so happens, is not reflected in raw bandwidth and fillrate specs), is perfectly justified (because only raw bandwidth and fillrate specs matter for cost...). I simply don't think that makes sense.

And for the record, the Kyro-II employed a bandwidth ratio of 8 bytes per pixel, which is actually just slightly more than the Radeon 9700, and double that of the GeForce FX. So if 8 bytes / pixel is the ideal bandwidth ratio for TBDR implementations, as per Kyro-II, then that again supports my assumption as not only reasonable, but likely.

8 bytes per pixel at 32 bit color output? For the R300 and GFFX, 4 bytes write for color, 4 bytes write for Z buffer, excluding overdraw? First, how do you exclude overdraw 100% for the IMRs? Second, how does the GF FX get by with 4 Bytes per pixel bandwidth utilization? Also, what about Z Buffer checks?

Hmm...I guess I just don't understand that 8 bytes per pixel figure at all. Perhaps explaining it will be another approach to supporting your stance. Since it has nothing in common with the justification I see as circular above, it seems a good way to progress our discussion.

Well, I don't even think such a comparison (excluding design complexity and implementation as a factor in cost and focusing only on fill rate and bandwidth specifications for cost/performance analysis) works between IMRs.

Care to show evidence where it doesn't?

Two questions, which you may consider evidence, or may not:

Do you think GF FX and R 9700 cards cost the same to make at equivalent raw bandwidth and fillrate specifications? That deals with my definition of cost, which I maintain is what matters when determining the desirability of a TBDR.

Do you think they perform the same at equivalent raw bandwidth and fillrate specifications? That deals with what you call cost and my belief that this example illustrates that even between IMRs such a stipulation of cost equivalency does not achieve a helpful comparison. Note both the transistor count/design complexity and the process are ignored by this evaluation.

What is this "raw" fillrate based on, and why is it important again?

Raw fill rate is the number of pixel pipes times the clock rate. (Also, the number of TMUs per pipe is to be considered.) It's important because it is a gross measure of how much actual pixel writing power the chip has.

OK, that is what I thought....you describe pixel and then texel fill rate. But as to why it is important...

What about the impact of "effective fillrate" with a feature such as AA turned on?

The impact of EFFECTIVE fill rate will be shown in the performance results! You really do not understand my position here at all.

:?: What I don't understand is why your position makes sense for this comparison.

The goal is to build similarly spec'd parts (cost). And then compare the resultant performance.

You frame this question based on the assumption that we have no evidence that a TBDR with similar raw performance figures would outperform an IMR with similar raw performance figures, and I can't even agree with you that far, as I stated. You then go forward and give the IMR free performance enhancement not included in your criteria for cost, and discount any possibility of performance enhancement beyond the Kyro II design (even hardware T&L; yes, your conclusions about the Kyro II in comparison to the GF4 MX exclude that) in your criteria for equivalency.

concentrating on fillrate as a point of equivalence seems to me to be a wrong turn.

Not at all. We're talking equivalence in COST, not in performance.

Your definition of cost equivalence doesn't make sense to me on any level in this comparison.
 
Couple things: 1. Kyro2 loses a lot of fill rate when looping.

2. Doubling fill rate AND doubling bandwidth? Do you really think that'll only yield 2x performance? I'd say closer to 3x, maybe more... why has nobody else thought of this?!
 
2. Doubling fill rate AND doubling bandwidth? Do you really think that'll only yield 2x performance?

Um, yes and that's a theoretical maximum.

I'd say closer to 3x, maybe more... why has nobody else thought of this?!

Because it doesn't make sense, that's why. ;) We're talking about bottlenecks. You can't "add" fill-rate to bandwidth.
 
But you are not basing equivalency on higher frequencies and more pipelines; you are basing it on fill rate. You go further and stipulate raw bandwidth equivalency for determining equivalent cost, when the two approaches have very different demands on bandwidth to achieve performance.

Sigh...I'll just stop reading there. If you don't understand me by now, it's not going to happen.

This is not an insult to you...we're obviously just on some very different wavelength.

Please consider what I said in an earlier post. The Kyro-II's raw "bandwidth per fill rate" is actually as high or higher than today's single TMU chips. It's also comparable with the GeForce MX. (It has higher bandwidth per pixel, but lower bandwidth per texel.)

In other words, you claim that they have different demands, but in practice, they are shipped with similar ratios. If that weren't the case, the Kyro-II would have shipped with either 64-bit memory or much slower memory.
 
Ah, what the heck, one more thing...

You frame this question based on the assumption that we have no evidence that a TBDR with similar raw performance figures would outperform an IMR with similar raw performance figures, and I can't even agree with you that far, as I stated.

No, where did I assume any such thing?

I frame the question based on the assumption that we have no evidence that a TBDR with similar raw specifications would cost less than an IMR with similar raw specifications.

You continually either read too deeply into my statements, or don't read them closely enough.
 