Outstanding performance of the NV40 = old school 3dfx mojo?

JoshMST said:
The more I have discovered about process technology, the more I am finding out that NVIDIA's problem really wasn't TSMC's 130 nm process.
That doesn't jibe with the industry gossip I've heard concerning yields for .13u in general.
 
Scali said:
It was essentially tile-based rendering, similar to that used by PowerVR. Why do more work with more transistors when it isn't required? Basically the Gigapixel stuff gave similar framerates to the competition with fewer transistors and lower power consumption (if my memory serves me correctly).

Well, yes, PowerVR and Intel have both used tile rendering technology for years, and I believe that Trident also made a tile renderer. I doubt that Gigapixel is still ahead of these companies, since its technology has not been under development for years, while the others' has.
Also, the Gigapixel stuff is ancient, and was not built to integrate with programmable hardware, so it would probably have to be redesigned almost completely in order to apply it to modern-day hardware.
Which pretty much leaves only the concept of tile rendering itself, which is already well known.
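
For readers following along, "the concept of tile rendering itself" can be sketched in a few dozen lines of C++: bin the triangles into screen tiles, then resolve visibility per tile before shading anything. This is only a toy illustration of mine (the Tri struct, the 32-pixel tile size, the flat shading and the brute-force coverage test are all simplifications, nothing from Gigapixel's or PowerVR's actual designs):

```cpp
#include <algorithm>
#include <cstdint>
#include <limits>
#include <vector>

// Toy triangle: screen-space vertices plus a constant depth and colour.
struct Tri { float x[3], y[3]; float depth; uint32_t colour; };

const int W = 640, H = 480, TILE = 32;          // assumed screen and tile sizes

// Crude point-in-triangle test via edge functions (accepts either winding).
bool covers(const Tri& t, float px, float py)
{
    auto edge = [&](int a, int b) {
        return (t.x[b] - t.x[a]) * (py - t.y[a]) - (t.y[b] - t.y[a]) * (px - t.x[a]);
    };
    float e0 = edge(0, 1), e1 = edge(1, 2), e2 = edge(2, 0);
    return (e0 >= 0 && e1 >= 0 && e2 >= 0) || (e0 <= 0 && e1 <= 0 && e2 <= 0);
}

void renderFrame(const std::vector<Tri>& scene, std::vector<uint32_t>& frame)
{
    // Pass 1: capture the scene, binning each triangle into every tile its
    // bounding box touches (this is the "scene buffer" debated later on).
    const int tilesX = (W + TILE - 1) / TILE, tilesY = (H + TILE - 1) / TILE;
    std::vector<std::vector<const Tri*>> bins(tilesX * tilesY);
    for (const Tri& t : scene) {
        int x0 = std::max(0, (int)std::min({t.x[0], t.x[1], t.x[2]}) / TILE);
        int x1 = std::min(tilesX - 1, (int)std::max({t.x[0], t.x[1], t.x[2]}) / TILE);
        int y0 = std::max(0, (int)std::min({t.y[0], t.y[1], t.y[2]}) / TILE);
        int y1 = std::min(tilesY - 1, (int)std::max({t.y[0], t.y[1], t.y[2]}) / TILE);
        for (int ty = y0; ty <= y1; ++ty)
            for (int tx = x0; tx <= x1; ++tx)
                bins[ty * tilesX + tx].push_back(&t);
    }

    // Pass 2: per tile, resolve visibility first, then shade each pixel at
    // most once -- overdraw never generates shading work or colour writes.
    for (int ty = 0; ty < tilesY; ++ty)
        for (int tx = 0; tx < tilesX; ++tx)
            for (int py = ty * TILE; py < std::min((ty + 1) * TILE, H); ++py)
                for (int px = tx * TILE; px < std::min((tx + 1) * TILE, W); ++px) {
                    float nearestZ = std::numeric_limits<float>::max();
                    const Tri* visible = nullptr;
                    for (const Tri* t : bins[ty * tilesX + tx])
                        if (covers(*t, px + 0.5f, py + 0.5f) && t->depth < nearestZ) {
                            nearestZ = t->depth;
                            visible  = t;
                        }
                    if (visible)
                        frame[py * W + px] = visible->colour;   // shade exactly once
                }
}
```

A real chip would keep the per-tile depth and colour in on-chip buffers and interpolate full vertex attributes, but the shape of the algorithm is the same: sort by tile first, shade visible pixels once.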

Not trying to bash, just trying to be realistic. In a world where technology advances so quickly, technology gets old, fast. Anything that NV didn't apply right away, after acquiring it, is pretty much lost and wasted, I'd say.

You could be correct. The Gigapixel stuff was comparable to or ahead of the competition when 3dfx acquired them (there's a ppt presentation floating around somewhere), so the tech was good for its time. And it was going to be used for Fear, which was to be the next chip out from 3dfx and would have supported DX8, so you would think it would be suitable for programmable hardware. So you could be incorrect as well.

I don't really want to relive the ol' 3dfx days. However, I would like to take this idea on a bit of a tangent. As nobie alluded to, how advantageous is a TBDR over an IMR? Both nV and ATi utilise a number of HSR techniques to reduce overdraw, we are seeing memory bandwidth increase dramatically (~50 GB/s for NV40/R420), and there's still eDRAM that hasn't been used in a PC GPU. Would a TBDR still provide the huge advantages that it used to with all of this current tech? I know it's hard to gauge because there isn't a TBDR on the market, but I'm sure a few of you in the know could hazard a guess.
 
Why would you expect it to compare against a top-of-the-line GPU?

That's not the point. I'm not expecting it to compete against a top-of-the-line GPU. The point is that this is no proof that tile rendering can compete against a top-of-the-line GPU.

Not bad for a card you could get for under $150 at the launch of the GeForce 3.

I'm sure the Kyro II is a fine card, and good value for money (that's why I bought it in the first place), but that is not the issue here.

Well, let's see: the PowerVR version of the Neon 250 had to work with Windows CE, which had a version of DX on it. So that is thrown out right there. Then it had to work with Sega's OS.

Just because there's DirectX doesn't mean it's what people use, or that it's comparable to the PC version of DirectX (not even the XBox version is comparable). I'm quite sure that most developers did not use DirectX on the DreamCast, and certainly did not port PC DirectX code.
Console software generally gets the most from the hardware because it is custom-made for the hardware, unlike PC software, which has all kinds of compatibility/abstraction layers.

The reason that PC part did not do well is that the DC took most of their time and effort.

Not at all. It's pretty much the same part, it didn't require much extra effort other than developing PC drivers.
The biggest problem was that OpenGL and DirectX were not very good ways to get the most from the hardware (enter PowerSGL).
I actually have a PowerVR card in my old PPro 200 (an Apocalypse 3Dx), and it can barely play Quake 1 at 640x480 with MiniGL. I'm quite sure that the card itself is capable of much more, if used properly.

Well, in the past we have seen it (Kyro).

No we haven't. Kyro lacked many features that were available on conventional renderers, such as programmable shaders or hardware T&L.
The Kyro was also manufactured with an older process than its competitors. Therefore Kyro was not top-of-the-line, and you cannot make a fair comparison against top-of-the-line cards.
The DreamCast was even less advanced than the Kyro, so it proves even less about cutting-edge performance of tile rendering.
It only proves that tile-rendering was a cost-effective and competitive solution for game consoles at the time. The DreamCast can't go up against the XBox though, and there's no replacement, so we don't know if tile-rendering is still a good option for consoles.

And in the mobile market we are seeing it right now (MBX). Oh, and not only is the MBX more powerful than the ATI and NVIDIA offerings, but it has a much smaller transistor count, offers free 2x FSAA and uses much less power.

Yes, it's nice on paper, but have you actually seen any of these chips in real life? And have you actually benchmarked them against each other to see which one performs best, and which one actually gives the longest battery life in practice?

But their lack of 3D add-in cards doesn't mean they haven't proven to have great, leading-edge tech in other fields.

That, however, is irrelevant when we want to know whether tile rendering is the best solution for high-end cards in the near future, which is what I am interested in (is the Gigapixel technology worth anything today?).
 
RussSchultz said:
JoshMST said:
The more I have discovered about process technology, the more I am finding out that NVIDIA's problem really wasn't TSMC's 130 nm process.
That doesn't jibe with the industry gossip I've heard concerning yields for .13u in general.

Well, getting good yields is one thing, but having a clean process is another. Basically my understanding of "clean" is that the good products that come off the line will not show defects (such as void migration and transistor displacement) after x hours of use. A manufacturer can have a good process, but poor yields. Having a bad process means a manufacturer has no yields. Also, from all indications ATI experienced good yields with the original Radeon 9600 Pro (but nothing remarkable). NVIDIA, on the other hand, had a very large design that was poorly architected, so its corresponding yields were much lower (especially for the clock speed targets set).
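
For what it's worth, the usual back-of-the-envelope way to see why a very large die hurts yield is a Poisson defect model, yield ≈ exp(−defect density × die area). The defect density and die areas below are purely illustrative assumptions of mine, not TSMC or NVIDIA data:

```cpp
#include <cmath>
#include <cstdio>

// Poisson yield model: expected fraction of dies with zero random defects.
double yield(double defectsPerCm2, double dieAreaCm2)
{
    return std::exp(-defectsPerCm2 * dieAreaCm2);
}

int main()
{
    const double d0 = 0.5;   // assumed defects per cm^2 for a maturing process
    printf("~1.0 cm^2 die: ~%.0f%% yield\n", 100.0 * yield(d0, 1.0));  // smaller chip
    printf("~2.0 cm^2 die: ~%.0f%% yield\n", 100.0 * yield(d0, 2.0));  // very large chip
    // Doubling the die area squares the survival probability
    // (exp(-dA) becomes (exp(-dA))^2), before even counting the dies
    // that work but miss their clock speed targets.
}
```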

This is mainly speculation, since TSMC does not release this info to the public. ATI and NVIDIA also do not share this information, nor would they really want to!
 
And it was going to be used for Fear, which was to be the next chip out from 3dfx and would have supported DX8, so you would think it would be suitable for programmable hardware. So you could be incorrect as well.

Who knows, perhaps the fact that the technology proved too hard to adapt to DX8 is the reason that we never saw the next-gen cards from 3dfx :)
I don't think there's a problem, personally. After all, PowerVR actually has programmable designs at this moment.
However, we don't know how this technology compares to whatever Gigapixel had at the time.
I suspect that the PowerVR stuff is light years ahead though; after all, it's been years since Gigapixel/3dfx worked on the technology, and DX8 is just a faint memory (as are Gigapixel and 3dfx, ironically).
 
All I know is the production engineers in my company recommended against .13u for quite a while because the word on the street was 'bad yield'.

Regardless of the cause (bad manufacturing control, bad standard cell libraries, or bad electrical models of the process), bad yield is bad yield.
 
RussSchultz said:
All I know is the production engineers in my company recommended against .13u for quite a while because the word on the street was 'bad yield'.

Regardless of the cause (bad manufacturing control, bad standard cell libraries, or bad electrical models of the process), bad yield is bad yield.

So, what were the production engineers at nVidia saying?
 
Joe DeFuria said:
RussSchultz said:
All I know is the production engineers in my company recommended against .13u for quite a while because the word on the street was 'bad yield'.

Regardless of the cause (bad manufacturing control, bad standard cell libraries, or bad electrical models of the process), bad yield is bad yield.

So, what were the production engineers at nVidia saying?

I would guess, "Wow, thats a really big pill to swallow."
 
Scali said:
On the other hand, deferred rendering can mean that these long shaders are run for far fewer pixels, since overdraw can be eliminated (which of course is possible with conventional hardware as well, using a z-only pass first).
The part in parentheses is my point.
 
The part in parentheses is my point.

Yes, but one can argue that an optimized implementation in hardware is more efficient than the conventional multipass approach. Which brings us back to the fact that we don't know this, and we won't, until someone builds the thing :)
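
To make the "z-only pass first" idea concrete, here is roughly what the conventional multipass approach looks like in fixed-function OpenGL; drawScene() is just a hypothetical callback that submits the geometry, the GL calls themselves are standard:

```cpp
#include <GL/gl.h>

// Hypothetical callback that submits the scene's geometry.
void drawScene();

void renderWithDepthPrepass()
{
    // Pass 1: lay down depth only -- no colour writes, no expensive shading.
    glEnable(GL_DEPTH_TEST);
    glColorMask(GL_FALSE, GL_FALSE, GL_FALSE, GL_FALSE);
    glDepthMask(GL_TRUE);
    glDepthFunc(GL_LESS);
    drawScene();

    // Pass 2: re-submit the geometry with full shading; only fragments that
    // match the stored depth (i.e. the visible ones) actually get shaded.
    glColorMask(GL_TRUE, GL_TRUE, GL_TRUE, GL_TRUE);
    glDepthMask(GL_FALSE);        // the depth buffer is already correct
    glDepthFunc(GL_EQUAL);
    drawScene();
}
```

The cost is submitting and transforming the geometry twice, which is exactly the kind of overhead a deferred renderer's on-chip visibility pass is meant to avoid, and that trade-off is what we can't settle until someone builds the thing.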
 
I think it seems clear that the NV30 fiasco was at least in part due to their migrating to .13u. Quite simply, as HB said, if they had used .15u it would have been hotter and slower, sure, but perhaps they simply expected more of a boost from .13u than they got. The R300 was fairly large and it reached pretty high clock speeds, and from looking at the heatsink it sure doesn't seem that it got as hot as the NV30. My point is simply that perhaps they did not get what they expected, and thus it was more of an issue than planned.
 
The rumors seem to indicate that the biggest problem nVidia faced with the NV30 was that low-k dielectrics weren't ready for use yet. The original design, apparently, was made for a low-k process, and this had to be redone when that wasn't available.
 
The more complex graphics get, and thus the hardware to support them, the more new positive or negative aspects we (laymen at least) see evolving in accelerators, and that is irrespective of whether an accelerator defers its rendering or not.

Immediate mode rendering is not an absolute dead end IMHO, nor is tile-based deferred rendering an absolute panacea, or vice versa. Both approaches have clear advantages and disadvantages; the point in question is where exactly either balances out so that it can be considered the superior solution.

Scali has made, in relative terms, quite a valid point; there's no way one can judge an approach from only value solutions so far. TBDR, and by extension PowerVR, has to prove the superiority of the approach they're claiming, and that can only happen with a full-blown high-end design.

Chalnoth,

Not necessarily. Tile-based rendering requires caching of the entire scene, and for large geometry densities, that may end up being worse than using an external z-buffer.

It's an ideal scenario for a TBDR, yet not a prerequisite.

Then you have to consider that as shaders get longer and longer, the memory savings of deferred rendering will mean less, as most of the time will be spent doing math ops instead of accessing memory.

Shaders might get longer over time, but no one guarantees that ALL shaders will be as long as you seem to be implying. Unless we finally move into the unified-grid realm of Shaders 4.0, I don't see a TBDR having a disadvantage with extremely long and extremely short shaders at the same time. And before you say it, a unified grid can be just as much a blessing for all types of architectures, if not another hidden advantage for a DR.
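
As a rough feel for the claim being answered here, consider a toy per-pixel cost model; every constant in it is an illustrative assumption of mine, not a measurement of any real chip:

```cpp
#include <cstdio>

int main()
{
    const double bytesPerPixelTraffic = 24.0;   // assumed colour + Z read/write + texels
    const double nsPerByte  = 0.05;             // assumed effective memory cost
    const double nsPerInstr = 0.5;              // assumed ALU cost per shader instruction

    const int lengths[] = {4, 16, 64, 256};
    for (int instrs : lengths) {
        double memNs = bytesPerPixelTraffic * nsPerByte;
        double aluNs = instrs * nsPerInstr;
        printf("%3d instructions: memory is %4.1f%% of the per-pixel cost\n",
               instrs, 100.0 * memNs / (memNs + aluNs));
    }
    // As the ALU term grows, the bandwidth a deferred renderer saves shrinks as a
    // share of the total -- that is the quoted argument. The counter-argument from
    // earlier in the thread is that eliminating overdraw also skips the ALU work
    // on occluded pixels, not just their memory traffic.
}
```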

What if I ask about high-precision framebuffers? Have you found any scenario yet that makes the advantage there minor, or not yet?

The part in parentheses is my point.

Your point is what, that deferred rendering is a bad idea, but application-driven deferred rendering is fine? :LOL:

OK, jokes aside, I don't see a TBDR having any trouble with Doom3-style rendering. OK, we've got one case of overdraw covered; what about the rest?

I don't think it was a coincidence that deferred shading or lighting was brought up, nor do I think that the recent pre-launch demos were a coincidence either.

elroy,

....we are seeing memory bandwidth increase dramatically (~50 GB/s for NV40/R420),

Don't bet anything on those kinds of bandwidth rates; expect more like a third less than that. And what do those cards need the extra bandwidth for in today's applications? Would you agree if I said that you need it mostly for ultra-high resolutions with high-sample anti-aliasing?

It should be common knowledge by now that TBDRs have a clear advantage in terms of bandwidth and memory footprint when it comes to high sample anti-aliasing.
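
To put a crude number on the anti-aliasing point, here is a back-of-the-envelope framebuffer traffic comparison; every constant in it (resolution, overdraw, byte sizes, frame rate) is an assumption for illustration, not a measured figure, and compression, caches and texture traffic are all ignored:

```cpp
#include <cstdio>

int main()
{
    const double pixels    = 1600.0 * 1200.0;
    const int    samples   = 4;       // 4x multisampling
    const double overdraw  = 2.5;     // assumed average depth complexity
    const double bppColour = 4.0, bppZ = 4.0;
    const double fps       = 60.0;

    // Immediate-mode renderer: every sample's colour and Z live in external
    // memory, so each drawn fragment reads/writes them there, plus a resolve.
    double imr = pixels * samples * overdraw * (bppColour + 2.0 * bppZ)  // Z read+write, colour write
               + pixels * samples * bppColour;                           // downsample read
    // Tile-based deferred renderer: samples stay in on-chip tile buffers and
    // only the resolved pixel goes out (scene-buffer traffic not counted here).
    double tbdr = pixels * bppColour;

    printf("IMR : %5.1f GB/s of framebuffer traffic\n", imr  * fps / 1e9);
    printf("TBDR: %5.1f GB/s of framebuffer traffic\n", tbdr * fps / 1e9);
}
```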

Would a TBDR still provide the huge advantages that it used to with all of this current tech? I know it's hard to gauge because there isn't a TBDR on the market, but I'm sure a few of you in the know could hazard a guess.

I never thought TBDR clearly superior if you average everything out; let's just say that I prefer it as an approach, because its advantages are exactly in the departments where my primary interests lie.
 
Well, as I've said previously, my opposition to deferred rendering is based upon this idea of caching the entire scene. I have two primary arguments opposing this approach:

1. With high geometry densities, the scene buffer will become larger than the z-buffer, and as it grows further its bandwidth demands will begin to approach those of the z-buffer (particularly where one triangle crosses several tiles, as with highly anisotropic or cylinder-like objects). I think we are approaching the geometry densities where this will begin to be a problem for deferred rendering. There is the further drawback that geometry cannot be scaled in the majority of cases with current technology, so unlike the z-buffer, this performance impact cannot be reduced by changing user settings. (A rough size comparison follows after point 2.)

2. The scene buffer will change in size from frame to frame within a game. This means that the deferred renderer must allocate a much larger scene buffer than will likely ever be needed, and if an overflow occurs anyway, the deferred renderer will suffer a very significant drop in performance.
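
To put some rough numbers behind both points, a back-of-the-envelope comparison (the 64 bytes per binned triangle and the triangle counts are illustrative assumptions, not vendor figures):

```cpp
#include <cstdio>

int main()
{
    const double W = 1024, H = 768;
    const double zBufferMB   = W * H * 4.0 / (1024.0 * 1024.0);   // 32-bit Z
    const double bytesPerTri = 64.0;  // assumed: post-transform vertices + state references

    printf("Z-buffer at 1024x768: %.1f MB\n", zBufferMB);
    const double triCounts[] = {25e3, 100e3, 500e3, 2e6};
    for (double tris : triCounts)
        printf("%8.0fk triangles/frame -> scene buffer ~%.1f MB\n",
               tris / 1e3, tris * bytesPerTri / (1024.0 * 1024.0));
    // The Z-buffer is a fixed cost per resolution; the scene buffer grows with
    // geometry density (and with triangles that land in several tiles), which is
    // the crossover point 1 worries about. Point 2 is the unpredictability: the
    // buffer must be sized for the worst frame, and overflowing it is expensive.
}
```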

Basically I feel that immediate mode rendering shows more promise, that its drawbacks can be handled by a combination of smart game programming and smart GPU design. In this spirit, I am certainly not opposed to immediate-mode renderers heading more in the direction of deferred renderers, but I am opposed to the requirement that the entire scene be cached.
 
In this spirit, I am certainly not opposed to immediate-mode renderers heading more in the direction of deferred renderers, but I am opposed to the requirement that the entire scene be cached.

As I said, it's not a prerequisite. Now reverse that theory above, with deferred renderers heading for corner cases in a hypothetical IMR direction, and the borders quickly get blurred; which is obviously just an example and not a real solution.

My mind doesn't necessarily work with absolutes; I'm actually in favour of clever combinations as long as they increase efficiency.
 
jvd said:
You mean your Kyro 2, with a pipeline setup and clock speeds matching those of a TNT2, which in some cases was faster than the GeForce 2 GTS/Ultra?

A PowerVR architecture that matches the R300 spec for spec would blow the R300-based cards out of the water.

This line of argument makes no sense simply because IF they could have clocked the Kyro 2 as fast as their competition, they would have. Sure you can argue the "what if" scenario but have you thought that perhaps there is a reason WHY you don't see a TBR clocked as fast as its competition?
 
Gawd - not all this again. Can somebody search back a few months to find another thread in which all the arguments for/against deferred/immediate mode rendering have already been run through? They tend to come around two or three times a year, IIRC. 8)

If only somebody would license a PowerVR Series 5 PC chip! I suppose at least Sega is having a go in the arcade market with this technology so the discussions won't be entirely theoretical in the future...
 
Not only is it an immediate vs. deferred rendering argument, but also a ".13 was NVIDIA's biggest mistake" argument and a "3dfx were overrated" argument. That's most of the quarterly arguments in one thread! :)
 
Joe DeFuria said:
RussSchultz said:
All I know is the production engineers in my company recommended against .13u for quite a while because the word on the street was 'bad yield'.

Regardless of the cause (bad manufacturing control, bad standard cell libraries, or bad electrical models of the process), bad yield is bad yield.

So, what were the production engineers at nVidia saying?
No earthly clue. We're on the 'tail end' of production, since our goal is cost reduction and not performance.

I have no idea what was being told to NVIDIA engineers by TSMC, though I can guess it was along the lines of 'of course it's going to work'.
 