Outstanding performance of the NV40 = old school 3dfx mojo?

Ailuros said:
Yes, IMRs raised their efficiency with a clever combination of bandwidth-saving techniques and will continue to do so, yet that doesn't mean that PowerVR has been sitting idle in terms of development either, or that they've left problematic cases unsolved, or that they don't have any advantages at all. They just have to get such a design out the door, preferably on time, to prove to all the naysayers that TBDR can very well be an alternative.

I'm sure PVR haven't been sitting on their hands doing nothing, Ailuros. However, how much of a benefit are they going to have over IMR's that employ excellent HSR techniques? Perhaps this is why we haven't seen any Gigapixel tech used; maybe it provides minimal or no speed benefit over traditional designs. Like you have said, I think we really need to wait until someone gets a design out that uses TBDR and is aimed at the high end before we can decide which one is better. I would still love to see that Gigapixel tech used though!

Elroy (dreaming of a DX9/10 capable part employing Gigapixel tech since 1902 :))
 
DaveBaumann said:
elroy said:
Elroy (dreaming of a DX9/10 capable part employing Gigapixel tech since 1902 :))

:!:

You old Fool you. :?:

Glad to see someone got it. I should have guessed you would be the first! I was a long time lurker before everyone buggered off.
 
Scali said:
How about the Dreamcast? Clocked at 100MHz with a 1x1 pipeline design, putting out beautiful graphics that, with 2 years of dev time, looked as good as games with 2 years of dev time on the newer PS2?

The DreamCast has a giant advantage over PowerVR-chips in PCs, and that is the API.
PCs need to work with OpenGL and/or Direct3D. These aren't the ideal APIs for the PowerVR chips. The DreamCast allowed you to push the data in an optimal way for the PowerVR chip (opaque polys first, then translucent, alphatested, etc).
Not really. That's the way the HW rendered them (well, you've actually got the order wrong, but anyway). These separate lists were maintained by the HW, and so the application software could send objects in any order. Of course, the API made sure it drew the developer's attention to this.
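To make that concrete, here's a purely illustrative C++ sketch (the types and names are mine, not the actual Sega or PowerVR API) of the idea: the application submits polygons in whatever order it likes, and the driver/HW bins them into separate per-type lists that are then consumed in a fixed order at render time.

```cpp
#include <vector>

enum class PolyType { Opaque, PunchThrough, Translucent };

struct Polygon { PolyType type; /* vertices, texture, blend state, ... */ };

// Hypothetical per-frame scene lists, mirroring the idea of hardware-maintained
// bins: one list per polygon type, filled regardless of submission order.
struct SceneLists {
    std::vector<Polygon> opaque, punchThrough, translucent;

    // The application can call this in any order it likes.
    void submit(const Polygon& p) {
        switch (p.type) {
            case PolyType::Opaque:       opaque.push_back(p);       break;
            case PolyType::PunchThrough: punchThrough.push_back(p); break;
            case PolyType::Translucent:  translucent.push_back(p);  break;
        }
    }

    // At render kick-off the lists are processed in a fixed order,
    // independent of the order the application submitted them in.
    void render() {
        // processList(opaque); processList(punchThrough); processList(translucent);
    }
};
```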
 
I'd say half-and-half. I battled to try to get an 'immediate mode' style driver working on a TBDR (the Oak Warp 5, no less) and it was not pleasant.

The biggest advantage is that you have knowledge of when you're going to overflow your scene buffer (the Warp 5 didn't have any fallback: overflow the buffer, lose the primitives). How much scene buffer to allocate in an immediate rendering scenario is a nightmare question. That the buffers were double buffered made it worse.
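For what it's worth, a minimal sketch of that allocation problem (the names and structure are hypothetical, not the actual Warp 5 driver): with an immediate-mode API the driver has no idea up front how many primitives a frame will contain, so a fixed scene buffer can simply run out.

```cpp
#include <cstddef>
#include <vector>

struct BinnedTriangle { /* screen-space vertices, state pointer, ... */ };

// Hypothetical fixed-size scene buffer for one frame on a TBDR.
// Double buffering means two of these are live at once, doubling the
// amount of memory you have to guess at in advance.
class SceneBuffer {
    std::vector<BinnedTriangle> tris;
    std::size_t capacity;
public:
    explicit SceneBuffer(std::size_t cap) : capacity(cap) { tris.reserve(cap); }

    // Returns false on overflow. On hardware with no fallback (the Warp 5
    // case described above) the primitive is simply lost.
    bool submit(const BinnedTriangle& t) {
        if (tris.size() >= capacity) return false;
        tris.push_back(t);
        return true;
    }
};
```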
 
Scali said:
Well, let's see: the PowerVR version of the Neon 250 had to work with Windows CE, which had a version of DX on it. So that is thrown out right there. Then it had to work with Sega's OS.

Just because there's DirectX doesn't mean it's what people use, or that it's comparable to the PC version of DirectX.
AFAIR, they were similar.
I'm quite sure that most developers did not use DirectX on the DreamCast, and certainly not ported PC DirectX code.
Console-software generally gets the most from the hardware because it is custom-made for the hardware, unlike PC software, which has all kinds of compatibility/abstraction layers.
There may be no operating system layers, but the Sega API was similar in level to DX or OGL.

The reason why that PC part did not do well is because the DC took most of their time and effort.

Not at all. It's pretty much the same part, it didn't require much extra effort other than developing PC drivers.
It's not the same part by a long shot.

No we haven't. Kyro lacked many features that were available on conventional renderers, such as programmable shaders or hardware T&L.
{Splutter} Programmable shaders?! When Kyro was released? I think not.

The Kyro was also manufactured with an older process than its competitors. Therefore Kyro was not top-of-the-line, and you cannot make a fair comparison against top-of-the-line cards.
What on earth does that prove? Kyro was a small (in transistors) chip that didn't need huge data busses (i.e. large numbers of pins etc). It didn't need a more expensive process in order to produce a cost-effective device.
The DreamCast was even less advanced than the Kyro, so it proves even less about cutting-edge performance of tile rendering.
Yet it out-performed and out-featured the massive 3-chip Voodoo2 system that was much more expensive. Obviously that means it was not cutting edge.
 
Chalnoth said:
No, increasing the size of your quads would obviously make them quads no longer.
Depends if "quad" is short for "quadruplet" or "quadrilateral".
 
However, how much of a benefit are they going to have over IMR's that employ excellent HSR techniques?

I hope you're not asking for percentages, are you? I'm in no position to know, nor guesstimate.

Early Z gives optimal results in early-Z-optimized applications; otherwise all the other bandwidth-saving techniques are mostly optimized for front-to-back order. Granted, the percentage of back-to-front or random-order routines should be way lower overall, yet a TBDR still gets a higher and more predictable data rejection efficiency.
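As a trivial illustration of the front-to-back point (my own sketch, nothing more): an IMR-friendly engine typically sorts its opaque draw calls by view-space depth before submission so early-Z rejection gets the best possible chance, whereas a TBDR rejects hidden pixels regardless of submission order.

```cpp
#include <algorithm>
#include <vector>

struct DrawCall {
    float viewSpaceDepth;   // e.g. depth of the object's bounding-box centre
    // mesh, material, render state, ...
};

// Sort opaque draw calls nearest-first so early-Z can reject as many
// occluded pixels as possible on an IMR.
void sortFrontToBack(std::vector<DrawCall>& calls) {
    std::sort(calls.begin(), calls.end(),
              [](const DrawCall& a, const DrawCall& b) {
                  return a.viewSpaceDepth < b.viewSpaceDepth;
              });
}
```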

(Simon usually says that HSR is as old as the Z-buffer for the record :p )

I'm not so sure why you're concentrating that much on HSR anyway; it saves quite a lot of fill-rate on a TBDR, but otherwise in the majority of cases game engines take care of HSR already via BSPs, portal rendering etc.

Off the top of my head, here are a couple of spots where I personally think a TBDR could have advantages:

  • Stencil operations come with no pixel shader cost.
  • Supersampling is in relative terms bandwidth "free", while multisampling is both fill-rate and bandwidth "free". The buffer consumption for high sample counts is many times lower than on an IMR.
  • Hidden pixels do not get textured at all; the pixel shader isn't processing them either.
  • You save a lot of bandwidth on a TBDR with full floating point render targets; writing a 128-bit render target (while doing MRTs, for instance) out to memory only to overwrite it later on is not a minuscule bandwidth waste.
  • Pixel and vertex processing is entirely de-coupled on a TBDR; meaning that VS and PS units can work on different scenes f.e.

etc etc.
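A very rough sketch of the "hidden pixels never get textured" bullet above (this is my own toy model of deferred tile rendering, not anyone's real hardware): within a tile, visibility for every triangle is resolved first against on-chip depth, and only the single surviving fragment per pixel is then textured and shaded.

```cpp
#include <cstdint>
#include <limits>
#include <vector>

constexpr int TILE = 32;                        // e.g. a 32x32 pixel tile

struct Fragment { float depth; int triangleId; };

// All candidate fragments for one pixel after binning/rasterisation.
using PixelCandidates = std::vector<Fragment>;

// Stand-in for the expensive texturing/pixel-shading work.
std::uint32_t shade(int triangleId) {
    return 0xFF000000u | static_cast<std::uint32_t>(triangleId);
}

// Opaque-only toy model: resolve depth on-chip, then shade exactly once per pixel.
void renderTile(const PixelCandidates (&pixels)[TILE][TILE],
                std::uint32_t (&colour)[TILE][TILE]) {
    for (int y = 0; y < TILE; ++y) {
        for (int x = 0; x < TILE; ++x) {
            // Pass 1: resolve visibility entirely on-chip.
            int visible = -1;
            float nearest = std::numeric_limits<float>::max();
            for (const Fragment& f : pixels[y][x])
                if (f.depth < nearest) { nearest = f.depth; visible = f.triangleId; }

            // Pass 2: only the fragment that actually survived gets shaded.
            if (visible >= 0) colour[y][x] = shade(visible);
        }
    }
}
```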

Things that would worry me on the other hand would be f.e. shader task switching, vertex data rejection and the like.

Elroy (dreaming of a DX9/10 capable part employing Gigapixel tech since 1902 )

How high are your chances though to see anything in terms of TBDR from NVIDIA? (doh reality check GP no longer exists). Rather damn close to zero wouldn't you say?

If there's a chance to see a high end TBDR of the kind then IMG/PowerVR would be your safest and only bet (mark "safest" with the usual IMG conditionals).

***edit:

Interesting B3D articles (in case you haven't read them) are:

http://www.beyond3d.com/articles/deflight/

http://www.beyond3d.com/articles/directxnext/

I personally don't agree with Ilfirin's conclusion in the latter, but that's a chapter of its own.
 
Ailuros said:
How high are your chances though to see anything in terms of TBDR from NVIDIA? (doh reality check GP no longer exists). Rather damn close to zero wouldn't you say?
Considering NVIDIA already claimed to certain people they expect the NV60 to use a 512-bit bus, the chances of a TBDR from them before the NV70 are, well, nil. And I just don't see them making a major turn then. So I'd say the chances of a TBDR from NVIDIA in the next decade are completely and utterly nil. But that's just IMO.

Uttar
 
Uttar said:
Ailuros said:
How high are your chances though to see anything in terms of TBDR from NVIDIA? (doh reality check GP no longer exists). Rather damn close to zero wouldn't you say?
Considering NVIDIA already claimed to certain people they expect the NV60 to use a 512-bit bus, the chances of a TBDR from them before the NV70 are, well, nil. And I just don't see them making a major turn then. So I'd say the chances of a TBDR from NVIDIA in the next decade are completely and utterly nil. But that's just IMO.

Uttar

http://www.beyond3d.com/forum/viewtopic.php?p=135494&highlight=512bit#135494

Don't blame me if I read too much.
 
Ailuros said:
(Simon usually says that HSR is as old as the Z-buffer for the record :p )
I think it was Warnock who coined the phrase "hidden surface removal" (as opposed to the earlier "hidden line removal"), in which case it comes from "Warnock's (HSR) Algorithm" which may be older than the Z-buffer.
I'm not so sure why you're concentrating that much on HSR anyway; it saves quite a lot of fill-rate on a TBDR; otherwise in the majority of cases game engines take care of HSR already via BSPs, portal rendering etc etc.
Portals, yes perhaps, but surely BSPs are there for fast off-screen rejection and/or sorting of polygons.
 
Ailuros said:
However, how much of a benefit are they going to have over IMR's that employ excellent HSR techniques?

I hope you're not asking for percentages, are you? I'm in no position to know, nor guesstimate.

I was asking for a guesstimate. It's likely going to be hard to do until PVR S5 is released and we can do a comparison with the latest and greatest.

I'm not so sure why you're concentrating that much on HSR anyway; it saves quite a lot of fill-rate on a TBDR, but otherwise in the majority of cases game engines take care of HSR already via BSPs, portal rendering etc.

I thought the idea of HSR was to achieve a similar result to that of a TBDR, using an IMR. I guess HSR will only achieve part of the benefits of a TBDR though (some of which you have outlined below)?

Off the top of my head, here are a couple of spots where I personally think a TBDR could have advantages:

  • Stencil operations come with no pixel shader cost.
  • Supersampling is in relative terms bandwidth "free", while multisampling is both fill-rate and bandwidth "free". The buffer consumption for high sample counts is many times lower than on an IMR.
  • Hidden pixels do not get textured at all; the pixel shader isn't processing them either.
  • You save a lot of bandwidth on a TBDR with full floating point render targets; writing a 128-bit render target (while doing MRTs, for instance) out to memory only to overwrite it later on is not a minuscule bandwidth waste.
  • Pixel and vertex processing is entirely de-coupled on a TBDR; meaning that VS and PS units can work on different scenes f.e.

etc etc.

Things that would worry me on the other hand would be f.e. shader task switching, vertex data rejection and the like.

Thanks. This is the sort of thing I was after, though I don't completely understand all of it.

How high are your chances though to see anything in terms of TBDR from NVIDIA? (doh reality check GP no longer exists). Rather damn close to zero wouldn't you say?

Yep, I think you're right. Always nice to dream though! (and yes I realise GP no longer exists, but the tech does.)

If there's a chance to see a high end TBDR of the kind then IMG/PowerVR would be your safest and only bet (mark "safest" with the usual IMG conditionals).

That's the problem, isn't it? Here's hoping that we'll see PVR S5 vs NV4x and R4xx.

***edit:

Interesting B3D articles (in case you haven't read them) are:

http://www.beyond3d.com/articles/deflight/

http://www.beyond3d.com/articles/directxnext/

I personally don't agree with Ilfirin's conclusion in the latter, but that's a chapter of its own.

Thanks for these. I will try to find time to have a read later on.
 
elroy said:
I thought the idea of HSR was to achieve a similar result to that of a TBDR, using an IMR. I guess HSR will only achieve part of the benefits of a TBDR though (some of which you have outlined below)?

Arggh. One day I shall write a definition on my web page of what the term HSR means and what it doesn't automatically imply.

FWIW the Z-buffer is already an HSR.
 
There may be no operating system layers, but the Sega API was similar in level to DX or OGL.

My point was that I've seen developers who pushed the data to the hardware directly, skipping any rendering API and primitive-sorting code etc. This doesn't work on PCs obviously, because you can't afford to make hardware-specific 3d applications, hence you use the API there. And that API is in no way optimized for the specific PowerVR architecture.

{Splutter} Programmable shaders?! When Kyro was released? I think not.

Firstly, the release date of the Kyro is not important to the original argument (how will cutting-edge TBDRs perform compared to conventional devices?); secondly, at least the Kyro II was released at about the same time as the GF3, if I'm not mistaken, so it was indeed in the programmable shader era. And the original Kyro was released at around the time of the original GeForce, I believe, so that was in the hardware T&L era.

What on earth does that prove? Kyro was a small (in transistors) chip that didn't need huge data busses (== large numbers of pins etc). It didn't need a more expensive process in order to produce a cost-effective device
...
Yet it out-performed and out-featured the massive 3-chip Voodoo2 system that was much more expensive. Obviously that means it was not cutting edge.

The discussion is not about whether PowerVR made decent products in the past, or if it was cutting edge THEN... The discussion is if TBDRs can be cutting-edge NOW. Kyro does not prove anything there, since it is about as out-dated as the original GeForce, which is equally bad in proving how well conventional renderers handle things like overdraw today, for example.
 
Ailuros said:
Off the top of my head, here are a couple of spots where I personally think a TBDR could have advantages:

  • Stencil operations come with no pixel shader cost.
  • Supersampling is in relative terms bandwidth "free", while multisampling is both fill-rate and bandwidth "free". The buffer consumption for high sample counts is many times lower than on an IMR.
  • Hidden pixels do not get textured at all; the pixel shader isn't processing them either.
  • You save a lot of bandwidth on a TBDR with full floating point render targets; writing a 128-bit render target (while doing MRTs, for instance) out to memory only to overwrite it later on is not a minuscule bandwidth waste.
  • Pixel and vertex processing is entirely de-coupled on a TBDR; meaning that VS and PS units can work on different scenes f.e.
  • Stencil operations can be quite fast on IMR's, too. See, for example, how they are accelerated on the NV3x. Just keep in mind that stencil shadows shouldn't be a lasting method for shadowing.
  • While supersampling will be more efficient in memory bandwidth on a TBDR, multisampling will be very similar in efficiency, since today we have z- and frame-buffer compression techniques.
  • Hidden pixels don't need to be touched at all on current IMR's, either, given the various HSR techniques available coupled with a z-only pass (or even front-to-back ordering helps significantly).
  • IMR's also don't need to overwrite values. See above.
  • Why is decoupling of pixel and vertex processing a good thing?

Anyway, I guess what I'm saying is that all of the inherent benefits of a TBDR are either solved on current IMR's, or are partially solved (and may be completely solved in the future).
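As an aside, the Z-only pass Chalnoth mentions is straightforward to sketch in OpenGL terms (drawOpaqueGeometry() is a hypothetical placeholder for whatever issues the frame's opaque draw calls; a valid GL context is assumed):

```cpp
#include <GL/gl.h>

void drawOpaqueGeometry();   // hypothetical: issues all opaque draw calls

void renderWithZPrepass() {
    glEnable(GL_DEPTH_TEST);

    // Pass 1: depth only. Colour writes off, so the rasteriser just lays
    // down the final Z values as cheaply as possible.
    glColorMask(GL_FALSE, GL_FALSE, GL_FALSE, GL_FALSE);
    glDepthMask(GL_TRUE);
    glDepthFunc(GL_LESS);
    drawOpaqueGeometry();

    // Pass 2: full shading. With GL_EQUAL, only the fragment that won the
    // depth test in pass 1 runs the expensive texturing/shader work.
    glColorMask(GL_TRUE, GL_TRUE, GL_TRUE, GL_TRUE);
    glDepthMask(GL_FALSE);
    glDepthFunc(GL_EQUAL);
    drawOpaqueGeometry();
}
```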
 
Simon F said:
elroy said:
I thought the idea of HSR was to achieve a similar result to that of a TBDR, using an IMR. I guess HSR will only achieve part of the benefits of a TBDR though (some of which you have outlined below)?

Arggh. One day I shall write a definition on my web page of what the term HSR means and what it doesn't automatically imply.

FWIW the Z-buffer is already an HSR.

Sorry Simon, you'll have to pardon my ignorance. I really need to go and do some reading on both forms of rendering and current gen stuff. Other stuff is taking a higher priority atm.
 
  • Stencil operations can be quite fast on IMR's, too. See, for example, how they are accelerated on the NV3x. Just keep in mind that stencil shadows shouldn't be a lasting method for shadowing.

I can see an NV3x being merely twice as fast as a K2 in Fablemark, yet then again the latter already has as many Z/stencil units as NV40 should have.

Agreed on stencil shadowing probably having a short life; NV40 is most likely going to come bundled with D3, and that very same game engine is going to gain quite a few licenses.

Stenciling and pixel shading in parallel.


  • While supersampling will be more efficient in memory bandwidth on a TBDR, multisampling will be very similar in efficiency, since today we have z- and frame-buffer compression techniques.

There's a chance that IHVs will in the future leave AA to the ISVs, which I don't disagree with. MSAA similar in efficiency up to how many samples exactly? 4x, maybe 8x in a very generous case?

Over 100MB of buffer consumption alone for 6x MSAA.
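For what it's worth, here is one way to land in that ballpark (the resolution and formats below are my own assumptions, not figures given in the thread): at 1600x1200 with 32-bit colour and 32-bit Z/stencil per sample, plus ordinary front and back buffers, 6x MSAA comes out at roughly 100MB.

```cpp
#include <cstdio>

int main() {
    // Assumed, not stated in the thread: 1600x1200, 6x MSAA,
    // 4 bytes colour + 4 bytes Z/stencil per sample, plus ordinary
    // (non-multisampled) front and resolved back buffers.
    const double pixels  = 1600.0 * 1200.0;
    const double samples = 6.0;
    const double MiB     = 1024.0 * 1024.0;

    const double msColour = pixels * samples * 4.0 / MiB;  // ~44 MiB
    const double msDepth  = pixels * samples * 4.0 / MiB;  // ~44 MiB
    const double display  = 2.0 * pixels * 4.0 / MiB;      // front + back, ~15 MiB

    std::printf("total ~%.0f MiB\n", msColour + msDepth + display);  // ~102 MiB
    return 0;
}
```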

  • Hidden pixels don't need to be touched at all on current IMR's, either, given the various HSR techniques available coupled with a z-only pass (or even front-to-back ordering helps significantly).

That presupposes there is a Z-only pass.

  • IMR's also don't need to overwrite values. See above.

No, they never actually do. Who are you kidding anyway?


  • Why is decoupling of pixel and vertex processing a good thing?

Why is it necessary, on the other hand, to unify the grids in future APIs? Maybe, just maybe, because there has to be a more efficient way to handle very long shaders coupled with very short shaders at the same time?

Anyway, I guess what I'm saying is that all of the inherent benefits of a TBDR are either solved on current IMR's, or are partially solved (and may be completely solved in the future).

The sad part is that without any real hardware it is and will remain a moot point. It isn't as if IMRs are supposedly near "perfection" either; it's just one particular brand.
 
I personally don't see TBDR being used in ANY future nvidia desktop products. I think mobile solutions could move to it in the future (and by future I mean 5+ years), but I still see it as unlikely.

I am still amazed that nvidia hasn't used the Gigapixel tech in PDAs and other small mobile devices. I see a HUGE market for this in the upcoming years and I expected to have already seen nvidia push products into this market by now. I think this (Gigapixel TBDR) was the biggest benefit that nvidia got from the 3dfx tech.

Other than that, the 3dfx technology "acquisition" was nothing more than nvidia burying a competitor's old technology and gaining some talented employees. I really don't see much use for most of the other tech that nvidia got.
 