PowerVR 5?

Chalnoth said:
Um, no. There are many factors today that ensure that IMR's are becoming more and more fillrate-limited. Anisotropic filtering, for one, requires a good amount of fillrate, but not as much memory bandwidth.

TBDRs aside, yes, AF does require a lot of fill-rate. Recent architectures, though, end up doing roughly half the work anyway thanks to angle-dependency and other optimisations on top.

Two angles get up to 16xAF, while the remaining two are stuck with 2xAF. Now add "brilinear", texturing-stage and LOD optimisations, and it's even less than what I'd call "half the work". It's really no wonder that AF nowadays comes with minimal performance penalties.

Long shaders require lots of fillrate but little memory bandwidth. Framebuffer compression, z-buffer compression, and early depth check methods further improve memory bandwidth efficiency.

While that's true, the average efficiency is still lower than on a deferred renderer.

Besides, a board that has an ungodly amount of fill-rate yet lacks the bandwidth to feed said fill-rate is far from optimal too. IMHO any architecture needs a fine balance between its fillrate and its bandwidth.

Not anymore. Cards like the GeForce 6600 GT show this pretty conclusively.

An excellent mainstream card, what else is there to say? It won't hold a candle against any 6800 once AA/AF comes into play at high resolutions. It's a pure coincidence that the Ultra has more than twice its bandwidth too.

Memory bandwidth still seems to scale from generation to generation so far. Once the high end moves away from GDDR3, I expect to see bandwidth figures past 40-45GB/sec in no time; that's of course just another coincidence, and vendors just slap that much expensive RAM on boards because they supposedly sell better too.....

GPUs are still GPUs despite their constant increase in arithmetic efficiency; if they ever reach the point of coming too close to CPUs they truly might become redundant after all.

When it comes to bandwidth and fill-rate (preferably in a healthy balance between them), those can never be "enough" on a GPU, regardless of whether it's a TBDR or an IMR.
 
jvd said:
I don't know how efficient the IMRs have become.

Last example we've seen is the GeForce 2 vs the Kyro.
For testing efficiency in high overdraw situations, PowerVR offers a couple of benchmarks ;)

I've picked up these numbers (all at stock clocks):
D3DVillagemark 1.19, 1024x768 32bpp, trilinear
Kyro 2: 132 fps
GeForce4 Ti 4200: 97 fps
GeForce3 Ti 500: 75 fps
Radeon 9700: 176 fps

Fablemark 1.0, 1024x768 32bpp
GeForce 6800: 115.7 fps
Radeon 9700Pro: 70 fps
Kyro 2: 35 fps
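The overdraw argument behind these numbers can be put in a minimal sketch. This is an idealised toy model, not derived from the benchmarks above: an IMR with no early rejection shades every submitted fragment, while a TBDR shades each visible pixel once. The function name and the overdraw figure are illustrative assumptions.

```python
# Toy model (not from the benchmarks above): how overdraw inflates the
# pixel work an immediate-mode renderer (IMR) does, while a tile-based
# deferred renderer (TBDR) shades each visible pixel only once.

def pixels_shaded(width, height, overdraw, deferred):
    """Pixels actually shaded for one frame.

    overdraw: average number of opaque fragments per screen pixel.
    deferred: True for a TBDR (hidden fragments are never shaded).
    """
    visible = width * height
    return visible if deferred else visible * overdraw

# 1024x768 with an (assumed) average overdraw of 4:
imr  = pixels_shaded(1024, 768, 4, deferred=False)
tbdr = pixels_shaded(1024, 768, 4, deferred=True)
print(imr / tbdr)  # 4.0: the idealised IMR shades 4x the pixels
```

In practice early-Z and front-to-back sorting close much of this gap, which is why the real scores above are far closer than the model's 4x.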
 
For testing efficiency in high overdraw situations, PowerVR offers a couple of benchmarks

So it takes a 9700 Pro to beat a Kyro 2 in this test?

Kyro 2 is what, 2x2 @ 175MHz? The 9700 Pro is 8x1 at 320?

Seems to me that a deferred renderer with the specs of a 9700 Pro would still walk all over a 9700 Pro.
 
I'm pretty sure dedicated raytracing hardware with those specs would too, but that doesn't mean it's going to happen. The current IHVs have invested too much money into IMRs to just change their form of graphics processor, no matter how much better the alternative may be. That's why some people expect the change to come from the console platform first, if ever: there they don't have to worry about taking the huge risk of nobody supporting it. Consoles are supported by their own companies, so it's not nearly as much of a risk as throwing it out into the PC market and watching nobody support it, no matter how much better a solution it is.
 
jvd said:
Seems to me that a deferred renderer with the specs of a 9700 Pro would still walk all over a 9700 Pro.

I don't think ANY of us doubt a TBDR with 9700 clock speeds would kill an ATI 9700. But the issue seems to be getting there in the first place.

In other words, couldn't one make the silly supposition that, if the GeForce2 GTS had "super-duper" technology that removed all overdraw, then it would slay a Kyro 2?
 
Kyro 2 is what, 2x2 @ 175MHz? The 9700 Pro is 8x1 at 320?
Kyro 1/2 were 2x1 IIRC. That would give them a fillrate of about 350MP/s, and they could actually hit that in pretty much every fillrate test.

Keep in mind that Villagemark and Fablemark are pretty ideal situations for PVR hardware. They use pretty low geometry (Kyro has no T&L), almost no alpha textures, and have ridiculously high overdraw that IMRs have to cope with. Not to say they aren't an impressive showing for TBDRs, just that the deck is stacked a bit in favor of PVR.
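The fillrate figure above is easy to sanity-check as pipelines times core clock. A quick sketch using the thread's own numbers (the 9700 Pro's clock is given as 320MHz here; the commonly quoted spec is 325MHz):

```python
# Back-of-envelope peak pixel fillrate: pipelines x core clock (MHz),
# giving MPixels/s. Figures are the ones quoted in this thread.

def peak_fillrate_mpix(pipes, clock_mhz):
    return pipes * clock_mhz

kyro2  = peak_fillrate_mpix(2, 175)  # Kyro 1/2 as 2x1 @ 175MHz
r9700p = peak_fillrate_mpix(8, 320)  # 9700 Pro as 8x1 @ 320MHz
print(kyro2, r9700p)  # 350 2560
```

That matches the "about 350MP/s" figure for the Kyro, and shows the 9700 Pro packing over 7x the raw pixel fillrate.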
 
Ailuros said:
An excellent mainstream card, what else is there to say? It won't hold a candle against any 6800 once AA/AF comes into play at high resolutions. It's a pure coincidence that the Ultra has more than twice its bandwidth too.
The GeForce 6800 also has four times the ROPs, which, I claim, is much more important in this case. Put simply, due to compression, multisampling AA no longer imposes a significant memory bandwidth penalty. I doubt AF performance penalties differ that much, though I haven't looked at that specifically.

Anyway, the point here is that the 6600 GT has quite a bit less memory bandwidth per pixel rendered when compared to the 6800, but has about the same fillrate. Even with this deficiency, though, the 6600 GT has no problem sustaining high performance levels. Furthermore, I claim that the lack of memory bandwidth of the 6600 GT will mean less and less as games use more long shaders.

Edit: More telling information comes if you directly compare some of the high-resolution, 4x FSAA benchmarks between the 6800 and the 6600 GT. Here is one such example. Notice that as the resolution creeps past 1280x1024, the performance of the 6600 GT drops below 71% that of the 6800. The GT has 71% the memory bandwidth of the 6800, and the two have the same amount of memory, so I claim that memory bandwidth cannot possibly be the sole differentiator in performance between the two. I further claim that it is likely that it is subdominant when compared to the ROP limitation of the GT with 4x AA enabled.
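For what it's worth, the 71% figure works out from the commonly quoted board specs; the bus widths and effective memory clocks below are my assumptions, not taken from the post (6600 GT: 128-bit @ 1000MHz effective GDDR3; plain 6800: 256-bit @ 700MHz effective DDR).

```python
# Where the 71% figure comes from, assuming the commonly quoted specs.

def bandwidth_gb_s(bus_bits, eff_clock_mhz):
    # bytes/s = (bus width in bytes) * effective transfers per second
    return (bus_bits / 8) * eff_clock_mhz * 1e6 / 1e9

gt_6600 = bandwidth_gb_s(128, 1000)  # 16.0 GB/s
v_6800  = bandwidth_gb_s(256, 700)   # 22.4 GB/s
print(round(gt_6600 / v_6800 * 100))  # 71
```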

When it comes to bandwidth and fill-rate (preferably in a healthy balance between them), those can never be "enough" on a GPU, regardless of whether it's a TBDR or an IMR.
Of course, but if you double the memory bandwidth of, say, the GeForce 6800 Ultra, you'd not gain any significant performance. Hell, for a number of benchmarks, you'd also not lose a huge amount of performance by cutting in half the amount of memory bandwidth of the 6800 Ultra.
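That claim can be illustrated with a crude "weakest link" model: frame rate is capped by whichever of fillrate or bandwidth runs out first, so doubling only the resource that already has headroom buys nothing. The 6.4GPix/s and 35.2GB/s peaks match the 6800 Ultra's quoted specs; the pixels-per-frame and bytes-written-per-pixel figures are illustrative assumptions.

```python
# Weakest-link model: fps is capped by the first resource to run out.
# Peak rates are the 6800 Ultra's quoted specs; the per-frame workload
# figures are illustrative assumptions, not measurements.

def fps_cap(fill_pix_per_s, bw_bytes_per_s, pixels_per_frame, bytes_per_pixel):
    fill_fps = fill_pix_per_s / pixels_per_frame
    bw_fps = bw_bytes_per_s / (pixels_per_frame * bytes_per_pixel)
    return min(fill_fps, bw_fps)

work = (2_000_000, 4)  # pixels shaded per frame, bytes written per pixel
print(fps_cap(6_400_000_000, 35_200_000_000, *work))   # 3200.0 (fill-bound)
print(fps_cap(6_400_000_000, 70_400_000_000, *work))   # 3200.0: 2x bandwidth alone gains nothing
print(fps_cap(12_800_000_000, 70_400_000_000, *work))  # 6400.0: doubling both doubles the cap
```

Under these assumptions the part is fill-bound, so extra bandwidth alone is wasted; scaling both resources together is what moves the cap.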
 
Chalnoth said:
The GeForce 6800 also has four times the ROPs, which, I claim, is much more important in this case. Put simply, due to compression, multisampling AA no longer imposes a significant memory bandwidth penalty.

Again, the NV43 is a mainstream card, and as such it's an excellent offering. I doubt anyone spending ~$200 would expect to be able to play with 4xAA at high resolutions.

MSAA is virtually free with 2x samples, while 4xAA, wherever there aren't any CPU limitations, does take a bit more of a performance hit on today's accelerators. Saying, though, that multisampling doesn't impose a significant memory bandwidth penalty is only really true for 2xAA and is a VAST generalization. If I look at ATI's 6x sparse, or the possibility of higher sample densities on today's cards, the bandwidth penalty will be there, simply because those cards obviously weren't laid out/designed for more.
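The raw numbers behind that point are easy to sketch. This assumes no color/Z compression and one color-plus-depth write per sample, a deliberate worst case; real parts compress, and blend reads are ignored entirely.

```python
# Worst-case framebuffer write traffic per frame under multisampling,
# assuming uncompressed 32-bit color + 32-bit depth per sample.

def fb_traffic_mb(width, height, samples, bytes_color=4, bytes_depth=4):
    return width * height * samples * (bytes_color + bytes_depth) / 2**20

for s in (1, 2, 4, 6):  # 6 as in ATI's 6x sparse mode
    print(s, fb_traffic_mb(1024, 768, s))  # 6.0, 12.0, 24.0, 36.0 MB
```

Compression claws much of this back for ordinary frames, which is exactly why 2xAA is near-free; at 6x samples there is simply far more to claw back.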

I doubt AF performance penalties differ that much, though I haven't looked at that specifically.

I've been begging someone to write a good fillrate tester for quite some time now. There's zeckensack's Archmark for instance, but it's somewhat outdated. Just 2xAF because there's no angle-dependency there (6800/16p):

1,2,3,4 = single, dual, triple, quad texturing

High Quality, noAF:

--Textured fillrate-----------------------------
----Bilinear filter-----------------------------
1 4.831 GPix/s
2 2.813 GPix/s
3 1.876 GPix/s
4 1.407 GPix/s

----Trilinear filter----------------------------
1 2.821 GPix/s
2 1.393 GPix/s
3 940.988 MPix/s
4 705.767 MPix/s

High Quality, 2xAF:

--Textured fillrate-----------------------------
----Bilinear filter-----------------------------
1 2.739 GPix/s
2 1.410 GPix/s
3 940.102 MPix/s
4 705.101 MPix/s

----Trilinear filter----------------------------
1 2.782 GPix/s
2 1.410 GPix/s
3 940.105 MPix/s
4 705.106 MPix/s

Quality, noAF:

--Textured fillrate-----------------------------
----Bilinear filter-----------------------------
1 4.907 GPix/s
2 2.813 GPix/s
3 1.876 GPix/s
4 1.407 GPix/s

----Trilinear filter----------------------------
1 3.321 GPix/s
2 1.661 GPix/s
3 1.103 GPix/s
4 825.448 MPix/s

Quality, 2xAF:

--Textured fillrate-----------------------------
----Bilinear filter-----------------------------
1 4.710 GPix/s
2 2.645 GPix/s
3 1.759 GPix/s
4 1.317 GPix/s

----Trilinear filter----------------------------
1 5.280 GPix/s
2 2.645 GPix/s
3 1.759 GPix/s
4 1.317 GPix/s



Anyway, the point here is that the 6600 GT has quite a bit less memory bandwidth per pixel rendered when compared to the 6800, but has about the same fillrate. Even with this deficiency, though, the 6600 GT has no problem sustaining high performance levels. Furthermore, I claim that the lack of memory bandwidth of the 6600 GT will mean less and less as games use more long shaders.

As long as you don't use anything that stresses its ROPs and/or bandwidth, yes. It doesn't take a wizard to figure that a 6800 and a 6600GT will be damn close in performance up to, let's say, 1280 without any AA; that's already the case today.

Speaking of ROPs or Z units, and since it's not irrelevant to the thread, KYRO had 16 Z/stencil units per pipeline.

Edit: More telling information comes if you directly compare some of the high-resolution, 4x FSAA benchmarks between the 6800 and the 6600 GT. Here is one such example. Notice that as the resolution creeps past 1280x1024, the performance of the 6600 GT drops below 71% that of the 6800. The GT has 71% the memory bandwidth of the 6800, and the two have the same amount of memory, so I claim that memory bandwidth cannot possibly be the sole differentiator in performance between the two. I further claim that it is likely that it is subdominant when compared to the ROP limitation of the GT with 4x AA enabled.

Who on God's green earth claimed otherwise anyway? You're the one who has been preaching for quite some time now that bandwidth is losing its importance; while that's somewhat true within limits for specific cases, it's not an absolute, and you make it sound like an exaggeration too. Bandwidth is important and will remain important on future accelerators. There will come a time in the not too distant future where we'll see 512-bit buses; yeah, what the heck for, hm?

Of course, but if you double the memory bandwidth of, say, the GeForce 6800 Ultra, you'd not gain any significant performance.

Don't pretend you don't understand what I'm trying to say here, and don't try to twist things in the wrong direction.

The NV40 in general has a pretty good balance between fillrate and bandwidth. If I theoretically doubled either one alone, the differences wouldn't be "huge"; if I gave it exactly twice the current fillrate and twice the current bandwidth, though, then most likely yes.

Hell, for a number of benchmarks, you'd also not lose a huge amount of performance by cutting in half the amount of memory bandwidth of the 6800 Ultra.

That's true, but I'm sure you're not trying to tell me that high-end accelerators have no reason to exist, are you? An enthusiast knows (or should know) how to make good use of that spare memory bandwidth.
 
see colon said:
Kyro 2 is what, 2x2 @ 175MHz? The 9700 Pro is 8x1 at 320?
Kyro 1/2 were 2x1 IIRC. That would give them a fillrate of about 350MP/s, and they could actually hit that in pretty much every fillrate test.

Keep in mind that Villagemark and Fablemark are pretty ideal situations for PVR hardware. They use pretty low geometry (Kyro has no T&L), almost no alpha textures, and have ridiculously high overdraw that IMRs have to cope with. Not to say they aren't an impressive showing for TBDRs, just that the deck is stacked a bit in favor of PVR.

True. I wouldn't expect otherwise, though, from any vendor when it comes to in-house techdemos and/or benchmarks.

Speaking of which, the dx9.0 MRT demo is far more interesting and relevant to today's situations than the aforementioned. Of course, it is another scenario tailored to show the benefits of a TBDR.

Chances that we'll ever see a techdemo from a vendor that shows the disadvantages of their own products are below "0" :LOL:
 
zeckensack said:
jvd said:
I don't know how efficient the IMRs have become.

Last example we've seen is the GeForce 2 vs the Kyro
For testing efficiency in high overdraw situations.....

If you'd pick GL_EXTREME instead, it would quickly become obvious that, whether with an overdraw factor of 3 or 8, the results on a KYRO are almost identical between front-to-back, back-to-front and random order.

While applications mostly get optimized for front-to-back rendering these days, it's still a factor that's not entirely unimportant.
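The order sensitivity in question can be shown with a tiny one-pixel z-buffer simulation. Purely illustrative: the depths are arbitrary layers covering the same pixel, smaller meaning closer.

```python
# Tiny one-pixel z-buffer simulation of why submission order matters on
# an IMR with early-Z but not on a TBDR.

def imr_shades(depths):
    """Fragments shaded by an early-Z IMR, in submission order."""
    shaded, zbuf = 0, float("inf")
    for z in depths:
        if z < zbuf:          # early-Z: only shade if it survives the test
            shaded += 1
            zbuf = z
    return shaded

layers = [1, 2, 3, 4]                      # overdraw factor 4
print(imr_shades(layers))                  # 1: front to back
print(imr_shades(list(reversed(layers))))  # 4: back to front
# A TBDR resolves visibility per tile before shading, so it shades this
# pixel once in either order, consistent with the KYRO numbers above.
```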

I know the next answer will be that applications also get early-Z optimized, like Doom3 for instance; that way we get, in relative terms, to "application-driven deferred rendering". Apart from the obvious CPU limitations Doom3 specifically poses, a TBDR should theoretically still benefit from its higher Z/stencil fill-rates.

I'm fairly disappointed not to read anything this round about parameter-buffer bandwidth consumption for TBDRs :LOL:

If I look at the PDA/mobile market, though, I currently don't see any significant differences between equally clocked TBDRs and IMRs without any additional geometry units, at least so far.
 
Chalnoth said:
Probably not. Depends upon the rendering algorithm. Any game which does an initial z pass (a pretty smart thing to do with the longer shaders that many new games today use) would have pretty much the same effective fillrate whether rendered with a deferred renderer or an immediate mode renderer.

But how much does that initial z-pass cost?
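One crude way to put numbers on that question. Everything here is an illustrative assumption (the 20-cycle shader, the "half the fragments survive early-Z in random order" figure) except the double-rate z-only fill, which NV40-class hardware does advertise; the extra geometry pass the prepass costs is ignored entirely.

```python
# Rough cost model for a z-only prepass: rasterize every fragment once
# more at double-speed z-only rate, then shade each visible pixel once.

def frame_cost(pixels, overdraw, shader_cycles, z_prepass, z_rate=2):
    fragments = pixels * overdraw
    if z_prepass:
        cost = fragments / z_rate        # cheap z-only pass over everything
        cost += pixels * shader_cycles   # shade each visible pixel exactly once
    else:
        # assume on average half the fragments pass early-Z (random order)
        cost = fragments / 2 * shader_cycles
    return cost

with_pre = frame_cost(786_432, 4, 20, z_prepass=True)   # 1024x768, overdraw 4
without  = frame_cost(786_432, 4, 20, z_prepass=False)
print(with_pre < without)  # True: here the cheap z pass pays for itself
```

For short shaders or low overdraw the inequality can flip, which is why the prepass is mainly advocated for long-shader workloads.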
 
zeckensack said:
jvd said:
I don't know how efficient the IMRs have become.

Last example we've seen is the GeForce 2 vs the Kyro
For testing efficiency in high overdraw situations, PowerVR offers a couple of benchmarks ;)

I've picked up these numbers (all at stock clocks):
D3DVillagemark 1.19, 1024x768 32bpp, trilinear
Kyro 2: 132 fps
GeForce4 Ti 4200: 97 fps
GeForce3 Ti 500: 75 fps
Radeon 9700: 176 fps

Fablemark 1.0, 1024x768 32bpp
GeForce 6800: 115.7 fps
Radeon 9700Pro: 70 fps
Kyro 2: 35 fps

Fablemark is a stencil-buffer benchmark; it uses a buttload of stencil shadows.
 
see colon said:
Kyro 2 is what, 2x2 @ 175MHz? The 9700 Pro is 8x1 at 320?
Kyro 1/2 were 2x1 IIRC. That would give them a fillrate of about 350MP/s, and they could actually hit that in pretty much every fillrate test.

Keep in mind that Villagemark and Fablemark are pretty ideal situations for PVR hardware. They use pretty low geometry (Kyro has no T&L), almost no alpha textures, and have ridiculously high overdraw that IMRs have to cope with. Not to say they aren't an impressive showing for TBDRs, just that the deck is stacked a bit in favor of PVR.

Villagemark used something like 4 textures on all its polygons, making it require high raw fillrate, so it wasn't ideal, but it was high in overdraw. It wasn't like 20x overdraw though; I think the figure was more like 4-5x. I'm sure Simon or Kristof know the exact average figure.

As for Fablemark, there is a lot of geometry in it; every single object casts a shadow, which is geometry. It's not just that: they do fake soft shadows and actually cast multiple shadow volumes, making the polygon count pretty high. Again, how about a comment from Simon or Kristof ;)
 
Slightly OT but I was looking through archive.org for some info and came across this quote( http://web.archive.org/web/19980113145129/www.matrox.com/mgaweb/leadstor/m3d.htm ):

"We got these new drivers for the PowerVR PCX2 and they rock. The PCX2 is easily the no-brainer purchase for GLQuake for the price. The triple buffering really makes a difference, and their performance is really really good, we're getting 29.1 fps WITH 24-BIT COLOR. No weird dithering artifacts. No tearing," says Brian Hook, id Software.

I wonder if id software will be as excited about PowerVR series 5 (or whatever it gets called)...
 
bystander said:
Slightly OT but I was looking through archive.org for some info and came across this quote( http://web.archive.org/web/19980113145129/www.matrox.com/mgaweb/leadstor/m3d.htm ):

"We got these new drivers for the PowerVR PCX2 and they rock. The PCX2 is easily the no-brainer purchase for GLQuake for the price. The triple buffering really makes a difference, and their performance is really really good, we're getting 29.1 fps WITH 24-BIT COLOR. No weird dithering artifacts. No tearing," says Brian Hook, id Software.

I wonder if id software will be as excited about PowerVR series 5 (or whatever it gets called)...

They would be, once they see how fast it can do Carmack's stencil shadowing.

I guess in them PCX2 days they must have had a top-of-the-range PC, like 233MHz or something, because the PCX2 was gash without shedloads of CPU. Hell, I could even play Unreal on it at 800x600 with good FPS when I got my Celeron 333.
 