PowerVR Series 5 is a DX9 chip?

Uttar,

I, personally, was speculating about possible reasons why other companies haven't come out with TBDR, if they have benefits over standard IMRs - in response to Joe's thoughts. As far as I am aware, only Videologic and Gigapixel have ever released hardware (or shown sample hardware in the case of GP) which has used TBDR.

I know that PowerVR uses infinite planes for calculations in their hardware and I believe that Gigapixel used a different system. I assume, of course, that GP were aware of what was happening with PowerVR and would therefore have made sure they didn't infringe on any patents, but bear in mind that PowerVR has been around for some time now. It was originally billed as a competitor with Voodoo 1!

It's all speculation until somebody releases a high-end TBDR!
 
Teasy said:
I just have a hard time seeing how anyone could see a TNT2 (which AFAICS had an identical basic spec to Kyro II) or even a GeForce 2 MX against a Kyro II a couple of years ago and not think "wow, a TBR really does kill a similarly specced IMR".

Just wanted to add a little more precision to that. Those are the numbers I've found with google, not sure if they're accurate.
TNT2: 10.5 million transistors, 125-150 MHz core, 140-200 MHz memory
Kyro: 12 million transistors, 125 MHz core/memory clock
Kyro II: 15 million transistors, 166 MHz core, 175 MHz memory
GeForce 2 MX: 19 million transistors, 175 MHz core, 166 MHz memory
GeForce 2 GTS: 25 million transistors, 200 MHz core, 333 MHz effective memory

So, a Kyro II beating a TNT2 would not be impressive at all. It has over 40% more transistors and a faster clock.
A Kyro beating a TNT2, on the other hand, would be good. And it most certainly does.
http://www.anandtech.com/video/showdoc.html?i=1253&p=11
At 32BPP, it even beats a GeForce DDR!
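To put rough numbers on those ratios, here's a quick sketch (the transistor counts are the approximate figures listed above, so treat the percentages as ballpark):

```python
# Rough transistor counts (millions) from the list above; figures are
# approximate numbers found via Google, so treat results as ballpark.
transistors = {
    "TNT2": 10.5,
    "Kyro": 12,
    "Kyro II": 15,
    "GeForce 2 MX": 19,
    "GeForce 2 GTS": 25,
}

def extra(a, b):
    """Percent more transistors chip a has than chip b."""
    return (transistors[a] / transistors[b] - 1) * 100

print(f"Kyro II vs TNT2:    +{extra('Kyro II', 'TNT2'):.0f}%")
print(f"Kyro vs TNT2:       +{extra('Kyro', 'TNT2'):.0f}%")
print(f"GF2 GTS vs Kyro II: +{extra('GeForce 2 GTS', 'Kyro II'):.0f}%")
```

So the Kyro II carries roughly 43% more transistors than the TNT2, while the plain Kyro carries only about 14% more, which is why the Kyro comparison is the more interesting one.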

The Kyro II is able to do much more than simply beat the GF2 MX, it seems:
http://www.anandtech.com/video/showdoc.html?i=1435&p=14
http://www.anandtech.com/video/showdoc.html?i=1435&p=12
http://www.anandtech.com/video/showdoc.html?i=1435&p=10

In many cases, it's able to beat the GF2 GTS (but most certainly not in all conditions).
And that's VERY impressive, considering the GF2 GTS has nearly 70% more transistors and a much faster memory and core clock.

Is the Kyro impressive? Yes. A TBDR obviously costs more transistors than an IMR without any memory-saving techniques, about 25% more. However, IMR features like LMA probably cost as much.
But without concrete information, it's really hard to say - so only time will tell, right?


Uttar
 
So you're saying that you don't believe that, say, a Radeon 9700 would be killed by a similarly specced TBR? Well, that's certainly debatable given the improvements in efficiency since the GeForce 2. So I wouldn't argue with that, as it's not proven either way.

That was exactly the reasoning. We left the SDRAM era behind a long time ago, and I have severe doubts that today a high-end TBDR would get away with significantly fewer transistors than an IMR with all the shader whizbang to start with; thus cost might be similar too.

Since you brought up the 9700: take a hypothetical TBDR with the exact same specs and algorithms. Why do I have the feeling that the latter would shine in today's applications mostly in cases where the R300 would be bandwidth-strangled? A good example, I figure, would be over 4x sparsely sampled MSAA, especially at high resolutions.

I have severe doubts, though, that any card or architecture can at this stage squeeze out better 16x-sample anisotropic performance than an R300. I'd love to be pleasantly surprised.

Does that make more sense? (it's a hypothetical scenario anyway and based on pure speculation).
 
Teasy said:

I just have a hard time seeing how anyone could see a TNT2 (which AFAICS had an identical basic spec to Kyro II) or even a GeForce 2 MX against a Kyro II a couple of years ago and not think "wow, a TBR really does kill a similarly specced IMR".

Obviously, IMRs have made very significant gains in efficiency since the TNT2 days, so that's not a really fair comparison.

Quite a timely article:

http://www.xbitlabs.com/video/6-value-roundup/index2.html#6

Of course, the Kyro-II is a bit old in comparison to the rest of the chips, and rightly gets beaten soundly. However, take the Kyro-II and, let's say, theoretically double its clock-rate specs. This would make it, spec-wise, comparable to the GeForce4 MX. And let's assume (being very generous here) that doubling its specs leads to exactly a 2X increase in FPS score at 1024x768.

Doing that, we find that the "doubled" scores of the Kyro-II are comparable to the GeForce4 MX scores.

If the TBDR architecture was so vastly "inherently" superior, one would expect a "doubled" Kyro-II to be a significant step above the GeForce4 MX.

Now, you could argue that, like IMRs, TBDRs have also been making similar gains in efficiency "since the Kyro II". Possibly, but that's obviously nothing but pure speculation, since we have no TBDRs since the Kyro-II days....
 
Hmm...Joe, I simply don't think multiplying clock speed makes a valid comparison. The assumption that with increased transistor count the performance of a tiled architecture would only scale via clock speed increases is central to your argument, and I don't think that belief is defensible at all.

To start with, the addition of hardware T&L seems likely to be the very least that would have been added. And I think this might have a significant impact on those particular benchmarks.

Going from there, there is the question of the number of pixel pipelines and/or TMUs per pipe, that would be allowed by the design, and the impact that would have.

Then, there is the question of whether transistor count and design of such a chip could facilitate higher clock speeds for the same cost in comparison to your GF 4 MX example.

All of these factors seem to be omitted in your consideration.

What could be implemented in a "Kyro"-type design with ~30 million transistors? Wasn't the Kyro II at about 15 million transistors?
 
Hmm...Joe, I simply don't think multiplying clock speed makes a valid comparison. The assumption that with increased transistor count the performance of a tiled architecture would only scale via clock speed increases is central to your argument, and I don't think that belief is defensible at all.

Heh...it's about as defensible as comparing a Kyro-II with a TNT-2 Ultra. That's pretty much my point. :LOL:

All of these factors seem to be omitted in your consideration.

Right...because we have no idea but pure speculation on those other factors.
 
Joe DeFuria said:
...
Doing that, we find that the "doubled" scores of the Kyro-II are comparable to the GeForce4 MX scores.

If the TBDR architecture was so vastly "inherently" superior, one would expect a "doubled" Kyro-II to be a significant step above the GeForce4 MX.
...

I was answering this expectation. I'll also point out that, taking Uttar's post as an example, the performance of parts with transistor counts of 12 million compared to 10.5 million, and then 15 million compared to 19 million, does seem to me to indicate a superior performance/production-cost ratio for TBDR, even ignoring reduced costs in card manufacturing assuming lower bandwidth demand. While it is speculation to assume that would still be the case, it seems to me more reasonable speculation than assuming it would not, given the argument you presented.

...
Now, you could argue that, like IMRs, TBDRs have also been making similar gains in efficiency "since the Kyro II". Possibly, but that's obviously nothing but pure speculation, since we have no TBDRs since the Kyro-II days....

And I'm saying that your speculation consists of evaluating such benefits as offering nothing, which is not (IMO) a reasonable assumption.
 
Just realized I did a slight mistake about GF2 MX - Kyro 2

The Kyro 2 doesn't support T&L, and the GF2 MX does. So the claim that the Kyro 2 takes fewer transistors for its pipelines than the GF2 MX is practically unverifiable, unless nVidia agrees to give us some numbers.
Which they probably won't do, considering how old those numbers would be and how little interest they'd have in releasing them right now.


Uttar
 
Actually, # of transistors is not all that important. The area of the chip is more significant <shrug>.
 
I'll also point out that, taking Uttar's post as an example, the performance of parts with transistor counts of 12 million compared to 10.5 million, and then 15 million compared to 19 million, does seem to me to indicate a superior performance/production-cost ratio for TBDR,

No, it doesn't. Again, we're just purely speculating. In my "scenario" I "magically" increased the clock of the KyroII by 100%. That is simply an easy way to "illustrate" a 2X spec increase. What would that do to the "production cost"? Who knows.

And I'm saying that your speculation consists of evaluating such benefits as offering nothing, which is not (IMO) a reasonable assumption.


What benefits are you talking about? My speculation didn't consist of evaluating cost benefits or cost drawbacks of increasing the "spec" of the Kyro two-fold relative to the IMRs. We just have no idea.

To be clear, I am pretty much assuming that "cards with similar raw specs, will have similar costs." If the raw fill rate and raw bandwidth numbers are similar, it is reasonable to assume that the costs should be similar.
 
Joe,

back to this whole thing about matching up a similarly specced TBR vs. an IMR. Consider the first version of the Kyro:

http://www.anandtech.com/video/showdoc.html?i=1253&p=2

A quick glance at those specs may elicit an "Are you kidding me?" initial response from many of you. After all, 125 MHz, 2 pixel pipelines, 250 megapixel/second fillrate, and a 0.25 micron manufacturing process sounds a lot like an NVIDIA RIVA TNT2 (not Ultra), which has been out for over a year.

Now if you look at the specs for both the Kyro and the TNT2 (non-Ultra):

Kyro
125 MHz core/memory clock
250 megapixels/s fillrate

TNT2 (non-Ultra)
125-150+ MHz 128-bit 2D/3D core
300 megapixels/s fillrate


You can see they are very close, with the TNT2 slightly faster on paper; probably as close a match as we can find. But if you look at the above link you can see that the Kyro beats even the TNT2 Ultra model in almost every test, some by a large margin, some smaller. It's old, but it shows that spec for spec, TBDRs do have some advantages. Sorry if this was posted before (I didn't go through all 9 pages of this thread; my bad if so).
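Those paper fillrates are just core clock times pixel pipelines. A quick sketch (assuming both parts are 2-pipe designs, as the linked article describes; the TNT2 figure uses its top 150 MHz bin):

```python
# Paper fillrate = core clock (MHz) x number of pixel pipelines.
# Both the Kyro and the TNT2 are 2-pipe parts per the linked article;
# clocks are the rough figures quoted above.
def fillrate_mpix(core_mhz, pipes):
    return core_mhz * pipes

kyro = fillrate_mpix(125, 2)  # 250 MPix/s
tnt2 = fillrate_mpix(150, 2)  # 300 MPix/s at the top 150 MHz bin

print(f"Kyro: {kyro} MPix/s, TNT2: {tnt2} MPix/s "
      f"({(tnt2 / kyro - 1) * 100:.0f}% paper advantage to the TNT2)")
```

So on paper the TNT2 has a 20% fillrate edge at 150 MHz, which makes the Kyro's wins in the linked benchmarks all the more notable.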

Now, you could argue that, like IMRs, TBDRs have also been making similar gains in efficiency "since the Kyro II". Possibly, but that's obviously nothing but pure speculation, since we have no TBDRs since the Kyro-II days....

Is it pure speculation or common sense that they are going to try to improve their efficiency? Remember, we do have people here who work with TBR products. Granted, we have no real proof. But we also have no real proof that IMRs will improve from where they are today, and common sense tells us both of those assumptions are wrong: IMRs will try to improve, and so will TBRs.


Doing that, we find that the "doubled" scores of the Kyro-II are comparable to the GeForce4 MX scores.

Many things are wrong with that. You're comparing non-T&L scores to a part with T&L, and T&L does play a small part in some of those scores. Also, don't forget that the K2 was not using DDR like the GF4 MX in the link you provided, so you can't just double the specs and say it's the same, because it's not. To be fair you would first have to find scores for the K2 using DDR, and I am sure only a few of us here would have a clue what those might be :)
 
Come on Joe. Is it pure speculation or common sense that they are going to try to improve their efficiency?

It is common sense that they will improve it. It is not common sense to assume they will improve it the same amount relative to IMRs.

Much of the efficiency that IMRs have been gaining comes from applying techniques (like occlusion culling and hierarchical Z) that are already inherent in TBDR chips at maximum efficiency. (In other words, a perfectly efficient hierarchical-Z and occlusion-culling system would yield benefits similar to those of the "standard" deferred renderer.)

So clearly, efficiencies gained in TBDRs must come from other areas...not reducing overdraw. I'm not saying it's not possible at all. Just that it's not common sense to assume they will gain efficiency in the same proportion as IMRs have.

Many things are wrong with that. You're comparing non-T&L scores to a part with T&L, and T&L does play a small part in some of those scores....

Lol...quite a change from the "Hardware T&L makes no difference" mantra back in the Kyro vs. GeForce era. ;)

don't forget that the K2 was not using DDR like the GF4 MX in that link...

I didn't forget it. I was assuming SDR clocked at 175x2 = 350 MHz effective is comparable to the GeForce4 MX's 200 MHz DDR = 400 MHz effective.

Clock for clock, SDR will be more efficient than DDR.

And again, in my "assumptions" I am assuming a 100% FPS increase at 1024x768 for the Kyro II. Meaning that at that setting, every test is 100% fill-rate or bandwidth limited 100% of the time.

How realistic do you think that is, and is that "fair" to the MX?

To be fair you would first have to find scores for the K2 using DDR...

No, to be fair I should find or approximate scores for the K2 with fillrate and bandwidth similar to a recent IMR implementation, which is what I did.

Again, the bottom line is that we have NO "Recent" implementation of TBDRs in existence. I believe my comparison is as "fair" as any other given this complete lack of data to draw from.
 
Joe DeFuria said:
I'll also point out that, taking Uttar's post as an example, the performance of parts with transistor counts of 12 million compared to 10.5 million, and then 15 million compared to 19 million, does seem to me to indicate a superior performance/production-cost ratio for TBDR,

No, it doesn't. Again, we're just purely speculating. In my "scenario" I "magically" increased the clock of the KyroII by 100%. That is simply an easy way to "illustrate" a 2X spec increase. What would that do to the "production cost"? Who knows.

:?:

The GF 4 MX is ~ 30 million transistors (sorry Simon, I have no solid info on the area taken up by the designs :p ...if you have such, do share).

Based on doubling the clock speed of the Kyro II, and keeping the transistor count at 15 million, your comment is:

Joe DeFuria said:
If the TBDR architecture was so vastly "inherently" superior, one would expect a "doubled" Kyro-II to be a significant step above the GeForce4 MX.

This is in relation to the argument that a "similarly specced" TBDR would outperform an IMR...you are not even remotely at "similarly specced" except in regard to clock speed. Simply dismissing the possibility of the improvements that could be achieved by design is not making fewer assumptions; it is just making different assumptions.

And I'm saying that your speculation consists of evaluating such benefits as offering nothing, which is not (IMO) a reasonable assumption.

What benefits are you talking about? My speculation didn't consist of evaluating cost benefits or cost drawbacks of increasing the "spec" of the Kyro two-fold relative to the IMRs. We just have no idea.

Well, you aren't saying "we can't tell for sure", you are saying "using this info, an expectation of a TBDR outperforming an equivalently costly IMR seems to be unfounded" (see the prior quote). I'm disagreeing with the latter.

To be clear, I am pretty much assuming that "cards with similar raw specs, will have similar costs." If the raw fill rate and raw bandwidth numbers are similar, it is reasonable to assume that the costs should be similar.

1) Bandwidth doesn't matter as much to a TBDR (though it should matter more than it has in the past), so that assumption of equivalence of cost does not seem reasonable. That phrasing seems a bit odd...perhaps it would be better to say a TBDR can achieve more with a given amount of bandwidth?

2) Fillrate doesn't matter directly for cost; what matters is the number of transistors (or die area, as Simon would say, I guess) required to achieve that fillrate. For example, how much design area do the techniques for approaching the efficiency of a DR require, and what could be done instead in that space? Ignoring questions like these is a flawed approach to analyzing the cost benefits (and penalties, though I don't know what those are) that might be associated with the differences in a TBDR, which sort of defeats the point of comparing "equivalently costly" designs, IMO.
 
Why multiply the SDR memory speed by a factor of 2?
It's not DDR???

Um....

Because I DOUBLED the FPS scores of the Kyro-II benchmarks. In order to do that I DOUBLED the pixel fill rate, and DOUBLED the bandwidth.

In short:

Original Kyro II: 175 MHz clock / 175 MHz 128-bit SDR
"Doubled" Kyro II, producing FPS comparable to the GeForce4 MX: 350 MHz clock / 350 MHz 128-bit SDR

GeForce4 MX: running 200 MHz DDR = "effective" 400 MHz 128-bit SDR

OK?
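Spelled out, that paper bandwidth comparison looks like this (decimal GB/s on a 128-bit bus; these are the rough clocks quoted above, and real-world SDR vs. DDR efficiency will differ):

```python
# Peak paper bandwidth = bus width in bytes x effective clock.
# Clocks are the rough figures from the post above; SDR vs. DDR
# real-world efficiency differs, so this is paper bandwidth only.
BUS_BYTES = 128 // 8  # 16 bytes per transfer on a 128-bit bus

def bandwidth_gbs(effective_mhz):
    return BUS_BYTES * effective_mhz / 1000  # decimal GB/s

doubled_kyro2 = bandwidth_gbs(350)      # 350 MHz SDR
gf4_mx = bandwidth_gbs(200 * 2)         # 200 MHz DDR = 400 MHz effective

print(f"'Doubled' Kyro II: {doubled_kyro2:.1f} GB/s")
print(f"GeForce4 MX:       {gf4_mx:.1f} GB/s")
```

So the "doubled" Kyro II (5.6 GB/s) actually sits slightly below the GeForce4 MX's paper figure (6.4 GB/s), which is part of the point: clock for clock, SDR is more efficient than DDR.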
 
Based on doubling the clock speed of the Kyro II, and keeping the transistor count at 15 million,

Where did I say I was keeping the transistor count at 15 Million?! You are missing my line of argument entirely. See my sig....

I should have never said "doubling clock rate" to begin with. It's just confusing you.

Let me put it another way.

I'm DOUBLING THE RAW FILL RATE, and I'm DOUBLING THE RAW BANDWIDTH. Full stop. No particular means implied to actually achieve that specification.

There are a number of ways you can do either, and each way has its own pros and cons with respect to cost.
 
Joe DeFuria said:
Based on doubling the clock speed of the Kyro II, and keeping the transistor count at 15 million,

Where did I say I was keeping the transistor count at 15 Million?! You are missing my line of argument entirely. See my sig....

Eh? Doubling the clock speed doubles the fill rate...

You did mention the Kyro II, and you did mention doubling the clock, fillrate, and bandwidth, and then you did mention a basis for comparison to the GF4 MX based on this. I do think I understand what you said; if you meant something different, I don't think it is my fault it didn't get across. :-?

I should have never said "doubling clock rate" to begin with. It's just confusing you.

Hmm...no, I don't think so. But perhaps after this post you can point out where I went wrong if I did do so.

Let me put it another way.

I'm DOUBLING THE RAW FILL RATE, and I'm DOUBLING THE RAW BANDWIDTH. Full stop. No particular means implied to actually achieve that specification.

Hmm, OK, let's revisit the text I'm responding to. 1) You specifically mentioned clock rates (hence my comments relating to them). 2) You specifically, in conjunction, mention doubling the fillrate and bandwidth (which is directly achieved if you happen to double the clock rates). Where is my confusion?

Here again is the specific conclusion, based on this comparison, I am addressing:

Joe DeFuria said:
If the TBDR architecture was so vastly "inherently" superior, one would expect a "doubled" Kyro-II to be a significant step above the GeForce4 MX.

The GF 4 MX runs at a higher clock speed, and has a more complex design. Both relate to its performance. I don't see both being accounted for in your comparison.

...

Now, addressing your new statement: clarify how you are determining equivalence of both of these factors in relation to cost. Your prior statements do not help here (AFAICS), so another explanation seems necessary. What I see in your prior text is you discounting any benefits associated with the increased design complexity of a TBDR, and claiming that doing so does not require assumptions (in contrast to the expectation of increased performance for a TBDR directly associated with increased complexity); I think that it does.

There are a number of ways you can do either, and each way has its own pros and cons with respect to cost.

Yes, but I'm specifically addressing your assertion that your specific method of comparison, and the associated conclusion on the benefits a TBDR would offer today, is valid.
 
Look...let's just cut this short.

1) I mentioned doubling the clock rates as an illustration of doubling fill rate and bandwidth.

2) Whether it's my fault for "confusing you", or your fault for "being confused" is pointless to discuss and was not my intention to raise as an issue. Miscommunications are the fault of both parties.

Now. Start fresh.

Everyone wants to see a TBDR with "similar specs" as a high-end IMR. SIMILAR SPECS meaning similar raw fill-rate, and similar raw bandwidth.

Why?

Because, generally speaking, we all pretty much expect that cards with similar raw specs cost about the same. 500 MHz 256-bit DDR-II costs "the same" no matter which chip it's paired up with. And the hope is that a TBDR with the "same specs" (and therefore cost) would significantly outperform the competing IMR.

For cores, it is admittedly less black and white. But I see no reason to suspect anything other than, as a best rough estimate, that a TBDR core that puts out 800 MPix/s (raw) "costs the same" as an 800 MPix/s IM core.

Can we agree on those assumptions? Before this is taken further, we have to agree on that.
 
jb said:
You can see they are very close, with the TNT2 slightly faster on paper; probably as close a match as we can find. But if you look at the above link you can see that the Kyro beats even the TNT2 Ultra model in almost every test, some by a large margin, some smaller. It's old, but it shows that spec for spec, TBDRs do have some advantages. Sorry if this was posted before (I didn't go through all 9 pages of this thread; my bad if so).

Not necessarily. Those benchmarks still deal with low polycounts and do not take into account the effect of a T&L engine. I've said in the past, back around the time of the Kyro II, that if a good high-end tiler had been released then, it would have been much better than the IMRs of the day. My problem has always been with the future of 3D games. I want more polys, and I still think that TBRs will have problems with high polycounts.
 