Outstanding performance of the NV40 = old school 3dfx mojo?

RussSchultz said:
I have no idea what was being told to NVIDIA engineers by TSMC, though I can guess it was along the lines of 'of course it's going to work'.

Did ATI ask TSMC if 0.13 low-k would be ready for them too? And did TSMC tell them something different?
 
Joe DeFuria said:
RussSchultz said:
I have no idea what was being told to NVIDIA engineers by TSMC, though I can guess it was along the lines of 'of course it's going to work'.

Did ATI ask TSMC if 0.13 low-k would be ready for them too? And did TSMC tell them something different?
Magic 8 ball says "ask again later"
 
RussSchultz said:
Joe DeFuria said:
RussSchultz said:
I have no idea what was being told to NVIDIA engineers by TSMC, though I can guess it was along the lines of 'of course it's going to work'.

Did ATI ask TSMC if 0.13 low-k would be ready for them too? And did TSMC tell them something different?
Magic 8 ball says "ask again later"

The point is, unless you think TSMC told ATI something different than they told nVidia, I'd have to say it's ultimately nVidia's "fault."
 
I thought (but could be mistaken) Dave had posted something a while back that suggested TSMC had indeed warned against using the process for overly complex designs.
 
As for geometry caching being a problem-

What about with a PPP? Use a hierarchical subdivision surface and suddenly you don't have anywhere near the bandwidth requirements for geometry. It would also be useful for LOD: you wouldn't even have to load all of the levels of detail, since which ones you need depends on distance. Of course, it would still require a rather large on-chip buffer, but the bandwidth requirements would be greatly reduced. And remember that a TBDR can get comparable speed with fewer pipelines and lower clock speeds, which means you have plenty of transistors to spend on that buffer.
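To make the idea concrete, here's a tiny Python sketch of my own (the function names and constants are illustrative, nothing from a real PPP spec):

Code:
import math

MAX_LEVEL = 5          # deepest subdivision level stored in the hierarchy
REF_DISTANCE = 64.0    # distance at which the base mesh alone suffices (assumed)

def subdivision_level(distance):
    """Pick a subdivision depth from distance: each halving of the
    distance buys one extra level of detail."""
    if distance >= REF_DISTANCE:
        return 0
    return min(MAX_LEVEL, int(math.log2(REF_DISTANCE / distance)))

def triangles_at_level(base_tris, level):
    # each subdivision step splits every triangle into four
    return base_tris * 4 ** level

# A 500-triangle control mesh is all that crosses the memory bus; the
# rest is generated on-chip at whatever level the distance calls for.
for d in (4.0, 16.0, 64.0):
    lvl = subdivision_level(d)
    print(f"distance {d:5.1f}: level {lvl}, {triangles_at_level(500, lvl):,} tris")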

Bottom line is that both IMR and TBDR are valid approaches, and both are capable of achieving near-photorealism eventually. They have different hurdles to clear to get there, and which one gets there first merely depends on the resources put into it. I still prefer TBDRs because they are cheaper to produce, but that could change at some point.
 
Chalnoth said:
1. With high geometry densities, the size of the scene buffer will become greater than the size of the z-buffer, and as it grows further its bandwidth demands will begin to approach those of the z-buffer (particularly for places where one triangle crosses several tiles, such as the case of high anisotropy or cylinder-like objects). I think we are approaching the geometry densities where this will begin to be a problem for deferred rendering, and there is the further drawback of not being able to scale geometry in the majority of cases with current technology, such that unlike the z-buffer, this performance impact is not scalable by changing user settings.

Shrink the tiles, and expand your on-chip buffer space.

Boom, problem solved.
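To put rough numbers on that trade-off (back-of-envelope figures of my own, not vendor data; IIRC the KYRO series used 32x16 tiles):

Code:
def tile_buffer_bytes(tile_w, tile_h, samples, color_bytes=4, z_bytes=4):
    # on-chip storage: colour + Z per sample for every pixel in the tile
    return tile_w * tile_h * samples * (color_bytes + z_bytes)

for w, h, s in [(32, 16, 1), (32, 16, 4), (16, 16, 4)]:
    print(f"{w}x{h} tile at {s}x AA: {tile_buffer_bytes(w, h, s) // 1024} KB on-chip")

Halving the tile area halves the SRAM you need, but every extra AA sample multiplies it right back up.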
 
Evildeus said:
http://www.hardwareanalysis.com/content/article/1703/
During the briefing Nvidia did not shy away from getting down to detail about the NV40's features, some of which reminded us of 3dfx's Rampage


This is the saddest, crappiest piece of yakyakyak recently. This Sander Sassen guy doesn't give ANY information; this whole piece of something (WTF is this? Not news, not an article, it lacks ANY info...) is 100% BSing, a very sad attempt to catch your attention for 25 seconds... :rolleyes:

These kinda, hmm, 'things' always remind me of a certain article about the defecation of the Internet... :devilish:
 
"Designed by former 3dfx personnel" is not the same as "faithful reproduction of Rampage". But, whatever.
 
Geeforcer said:
"Designed by former 3dfx personnel" is not the same as "faithful reproduction of Rampage". But, whatever.
Yes, I think it's silly to act as though anyone who was a former 3dfx engineer turns everything they touch into gold. However, every company has its own major ideas as far as design goes, and the hope is that some of the best of 3dfx's forward-looking ideas and ways of thinking will be put to use in current chips. Of course, that doesn't mean it will blow everything else away; it's just people being sentimental about technology. I, for one, would be very sad to see everything good that came out of 3dfx left for dead, because they had some really great things going on. We just need to remember not to discount what other companies had going on as well.
 
Tagrineth said:
Chalnoth said:
1. With high geometry densities, the size of the scene buffer will become greater than the size of the z-buffer, and as it grows further its bandwidth demands will begin to approach those of the z-buffer (particularly for places where one triangle crosses several tiles, such as the case of high anisotropy or cylinder-like objects). I think we are approaching the geometry densities where this will begin to be a problem for deferred rendering, and there is the further drawback of not being able to scale geometry in the majority of cases with current technology, such that unlike the z-buffer, this performance impact is not scalable by changing user settings.

Shrink the tiles, and expand your on-chip buffer space.

Boom, problem solved.

I don't think it's that simple. Sage made a couple of interesting points; besides those, even if I assume the scene buffer takes 10% of the memory on a board packed with 256MB of RAM or more, that's peanuts (rough numbers below). It could easily balance out against a typical IMR's buffer consumption for anti-aliasing alone.
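Rough numbers behind the 'peanuts' claim (my own arithmetic, nothing measured):

Code:
MB = 1024 * 1024
w, h, samples, bpp = 1600, 1200, 4, 4       # 32-bit colour, 32-bit Z

imr_aa_color = w * h * samples * bpp        # multisampled back buffer
imr_aa_z     = w * h * samples * bpp        # multisampled Z/stencil
print(f"IMR 4x AA buffers at 1600x1200: {(imr_aa_color + imr_aa_z) / MB:.1f} MB")

scene_buffer = 256 * MB * 0.10              # the 10% figure from above
print(f"TBDR scene buffer budget: {scene_buffer / MB:.1f} MB")

That's roughly 58.6 MB of AA buffers on the IMR side against a 25.6 MB scene buffer budget.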

This line of argument makes no sense simply because IF they could have clocked the Kyro 2 as fast as their competition, they would have. Sure you can argue the "what if" scenario but have you thought that perhaps there is a reason WHY you don't see a TBR clocked as fast as its competition?

KYRO2 was clocked equal to a GF2 MX and had a similar launch MSRP too. Wherever the K2 showed a performance advantage it was a welcome bonus for the consumer, yet I don't recall it being positioned as a competitor to high-end solutions.
 
Sage said:
Geeforcer said:
"Designed by former 3dfx personnel" is not the same as "faithful reproduction of Rampage". But, whatever.
Yes, I think it's silly to act as though anyone who was a former 3dfx engineer turns everything they touch into gold. However, every company has its own major ideas as far as design goes, and the hope is that some of the best of 3dfx's forward-looking ideas and ways of thinking will be put to use in current chips. Of course, that doesn't mean it will blow everything else away; it's just people being sentimental about technology. I, for one, would be very sad to see everything good that came out of 3dfx left for dead, because they had some really great things going on. We just need to remember not to discount what other companies had going on as well.

This is truly a colourful thread :D

Anyway, NVIDIA has been taking leaves out of the former 3dfx technology books since the NV25; more than many actually realise. I never really expected anything else, or anything more than that, ever since 2000. Common sense.
 
Joe DeFuria said:
RussSchultz said:
I have no idea what was being told to NVIDIA engineers by TSMC, though I can guess it was along the lines of 'of course it's going to work'.
Did ATI ask TSMC if 0.13 low-k would be ready for them too? And did TSMC tell them something different?
My guess is that the NV30 was originally slated for release in the spring of that year, so nVidia did get told something different: they were given a different projection of the process's readiness at a different time.
 
Ailuros said:
The more complex graphics get, and thus the hardware to support them, the more new positive or negative aspects we (laymen at least) see evolving in accelerators, and that is irrespective of whether an accelerator defers its rendering or not.

Neither is immediate mode rendering an absolute dead end, IMHO, nor is tile-based deferred rendering an absolute panacea, or the whole thing vice versa. Both approaches have clear advantages and disadvantages; the point in question is where exactly either one balances out so that it can be considered the superior solution.

Scali has made, in relative terms, quite a valid point: there's no way one can judge an approach from value solutions alone. TBDR, and by extension PowerVR, has to prove the superiority they claim for their approach, and that can happen only with a full-blown high-end design.

elroy,

....we are seeing memory bandwidth increase dramatically (~50 GB/s for NV40/R420),

Don't bet on those kinds of bandwidth figures; expect roughly a third less than that. And what do today's applications need the extra bandwidth for? Would you agree if I said that you need it mostly for ultra-high resolutions with high-sample anti-aliasing?

Yes, at this point.

It should be common knowledge by now that TBDRs have a clear advantage in terms of bandwidth and memory footprint when it comes to high sample anti-aliasing.

Agreed. But how many samples are we talking here? 4x FSAA was supposed to be "free" on NV3x, and from some leaked info, it looks like it will be on NV40.

Would a TBDR still provide the huge advantages that it used to with all of this current tech? I know it's hard to gauge because there isn't a TBDR on the market, but I'm sure a few of you in the know could hazard a guess.

I never thought TBDR clearly superior if you average everything out; let's just say that I prefer it as an approach, because its advantages fall exactly in the departments where my primary interests lie.

Understandable. Nice post Ailuros!
 
Tagrineth said:
Shrink the tiles, and expand your on-chip buffer space.

Boom, problem solved.
First of all, shrinking the tiles will increase memory bandwidth drain, as more triangles will cover more than one tile. I don't expect, however, that tile size will be an issue with high geometry densities.

Secondly, I hope you meant expand scene buffer space, as it's impossible to expand on-chip buffer space (without a new chip). Anyway, increasing the amount of memory dedicated to the scene buffer dynamically will be very expensive for performance. For optimal performance, the amount of memory allocated to the scene buffer must be larger than the worst-case scenario. If that amount is ever exceeded, there will be a massive performance hit, a performance hit that will depend on the particular method of getting around the overflow.

This is why I like immediate-mode rendering more. The amount of memory required for each frame is a known quantity.
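To illustrate with a sketch (the TBDR per-triangle costs are my assumptions, since nobody publishes them):

Code:
MB = 1024 * 1024

def imr_frame_bytes(w, h, bpp=4):
    # back buffer + Z/stencil: fixed the moment the display mode is set
    return w * h * bpp * 2

def tbdr_scene_bytes(triangles, bytes_per_tri=32, tiles_per_tri=1.3):
    # parameter data plus per-tile triangle lists; 32 B/triangle and
    # 1.3 tiles touched per triangle are assumptions, not measurements
    return triangles * (bytes_per_tri + tiles_per_tri * 4)

print(f"IMR at 1024x768: {imr_frame_bytes(1024, 768) / MB:.1f} MB, every frame")
for tris in (100_000, 1_000_000, 5_000_000):
    print(f"TBDR scene buffer, {tris:>9,} tris: {tbdr_scene_bytes(tris) / MB:6.1f} MB")

The IMR figure never moves; the scene buffer scales with whatever the application submits that frame.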
 
Mariner said:
Gawd - not all this again. Can somebody search back a few months to find another thread in which all the arguments for/against deferred/immediate mode rendering have already been run through? They tend to come around two or three times a year, IIRC. 8)

I know it has come up before; actually, I'm quite sure I asked a similar question last time. Problem is, my question never gets properly answered! If someone would like to post a nice summary, or a link to an article that lays out the advantages and disadvantages of both methods, I'd be happy! Maybe I can start:

TBDR:

Advantages

Less work to do
Lower transistor count
Lower power consumption
FSAA for "free"

Disadvantages

? Problems with high geometry counts (is this an invalid point?)
Not proven at a high end level

IMR

Advantages

Tried and true - we know it works

Disadvantages (the ones I have listed are all compared to a TBDR)

Requires higher memory bandwidth (see the sketch after this list)
Requires more pipelines, and therefore transistors
Produces more heat
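As a worked example for the bandwidth entries above (assumed per-pixel traffic, ignoring texture reads and any Z/colour compression on either side):

Code:
def imr_traffic_per_pixel(samples, overdraw=2.5, bpp=4):
    # every shaded fragment reads+writes Z and writes colour per sample
    z_rw  = overdraw * samples * bpp * 2
    color = overdraw * samples * bpp
    return z_rw + color

def tbdr_traffic_per_pixel(bpp=4):
    # samples stay in the on-chip tile buffer; only the resolved
    # pixel is written to external memory
    return bpp

for s in (1, 4):
    print(f"{s}x AA -> IMR {imr_traffic_per_pixel(s):5.0f} B/pixel, "
          f"TBDR {tbdr_traffic_per_pixel()} B/pixel")

With 4x AA the gap is roughly 120 B/pixel versus 4 B/pixel under these assumptions, which is why the "free" FSAA entry sits on the TBDR side.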
 
elroy said:
Agreed. But how many samples are we talking here? 4x FSAA was supposed to be "free" on NV3x, and from some leaked info, it looks like it will be on NV40.
Anti-aliasing will never be totally free. I doubt the performance hit for 4x FSAA will change much between the NV3x and NV4x.
 
Chalnoth said:
First of all, shrinking the tiles will increase memory bandwidth drain, as more triangles will cover more than one tile. I don't expect, however, that tile size will be an issue with high geometry densities.

Okay...? Compare that to the bandwidth you're saving by having zero opaque overdraw and an on-chip Z-buffer.

Secondly, I hope you meant expand scene buffer space, as it's impossible to expand on-chip buffer space (without a new chip). Anyway, increasing the amount of memory dedicated to the scene buffer dynamically will be very expensive for performance. For optimal performance, the amount of memory allocated to the scene buffer must be larger than the worst-case scenario. If that amount is ever exceeded, there will be a massive performance hit, a performance hit that will depend on the particular method of getting around the overflow.

Without a new chip? Fuckin' DUH. And I haven't seen Kyro II reach binning limits yet... have you?

And I don't mean dynamically, I mean set it pretty big in the first place. Keep in mind PowerVR cores don't even need double buffering, and don't need to store the Z-buffer, so they can afford to have a huge scene buffer.
 
Chalnoth said:
Secondly, I hope you meant expand scene buffer space, as it's impossible to expand on-chip buffer space (without a new chip).
uhh isn't that what we're talking about? new chips?
For optimal performance, the amount of memory allocated to the scene buffer must be larger than the worst-case scenario. If that amount is ever exceeded, there will be a massive performance hit, a performance hit that will depend on the particular method of getting around the overflow.
Good point. Then it certainly needs some work to find an efficient method of handling this. Of course, you wouldn't be so naive as to think that IMRs haven't had huge speed bumps of their own to overcome, would you?

It's a very simple thing to understand: you trade one set of problems for a different set. Neither one is insurmountable, and both can be conquered with enough R&D. IMRs have a huge advantage atm because they have had much, much more R&D put into them, but if you turned the tables, IMRs would certainly seem impossible at first glance.

edit:
Oh yes, and let's not forget that IMRs are going to hit a geometry limit as well, imposed by memory space and bandwidth. Higher-order surfaces are a must for the future, and they will effectively remove the limitation for both IMRs and TBDRs if done properly. Again, I direct your attention to the PPP and hierarchical sub-d's.
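A quick back-of-envelope on why (all figures assumed):

Code:
MB = 1024 * 1024

def mesh_bus_bytes(triangles, bytes_per_vertex=32, verts_per_tri=1.0):
    # roughly one unique vertex per triangle in a well-indexed mesh
    return triangles * verts_per_tri * bytes_per_vertex

def patch_bus_bytes(patches, control_points=16, bytes_per_point=32):
    # a bicubic patch ships 16 control points no matter how finely
    # the chip subdivides it afterwards
    return patches * control_points * bytes_per_point

tris = 1_000_000
patches = tris // 512     # each patch tessellated to ~512 triangles on-chip
print(f"raw mesh: {mesh_bus_bytes(tris) / MB:.1f} MB over the bus")
print(f"patches : {patch_bus_bytes(patches) / MB:.2f} MB over the bus")

Either architecture only has to fetch the control points; the million triangles never touch external memory.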
 