GeForce FX Digit-Life tidbits

alexsok · Dec 4, 2002

I gathered a bit more info from the Digit-Life article:
http://www.digit-life.com/articles2/gffx/index.html

1. The highest texture sampling and filtering rate is up to 8 per clock.
2. Number of pixel shader instructions executed per clock cycle: 2 integer and 1 floating-point, or 2 (!) texture access instructions. The latter option is possible because, during the preceding shader's computational operations, the texture units can sample texture values with known coordinates beforehand and save them in special temporary registers, of which there are 16 in all. I.e. the texture units can sample no more than 8 textures per clock, but the pixel shader can get up to 16 results per clock.
3. Like the previous generation of chips, the GeForce FX works with two types of MSAA blocks - 2x diagonal and 2x2 grid. The 6xS and 8x AA modes are hybrid modes based on averaging several base blocks in one pattern or another.
4. The frame buffer compression works only in the MSAA-based FSAA modes, and only at the MSAA block level. Hence lossless compression of about 4:1 in the modes with the 4x base MSAA block, and 2:1 for 2x blocks.
5. The chip supports an activity control scheme which controls the intensity of the cooling system depending on the load and temperature of the chip.
6. The chip doesn't incorporate DVI or TV-Out controllers, like all of NVIDIA's earlier top-end solutions. Integrated controllers are used mainly in mass-market products.
7. Mass production of the second revision of the chip, which will be used for production cards, is about to start.

Now, contrary to what Dave said, it seems that GeForce FX indeed incorporates three vertex processors (that's the same info I heard from other places, so could you check it out again Dave?):

It's interesting that the GeForce FX incorporates three vertex processors, matching the number of vertices in a triangle, instead of four as in ATI's product. Besides, with dynamic flow control the shaders can take a different number of clock cycles for different vertices, but new vertices are started simultaneously, i.e. the units that have finished executing their shaders wait for those that haven't, so that three more vertices can start processing at the same time. It's clear that dynamic jumps made NVIDIA use additional transistors. Three processors could be either a weak point or a quite balanced solution - we still don't have enough information on the performance of a single vertex processor per unit of clock speed.
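To make the "units wait for each other" point concrete, here is a minimal sketch of that scheduling idea. It is only an illustration of the behaviour Digit-Life describes, not a model of the real hardware: the function names, the per-vertex cycle costs and the greedy "independent" alternative are all made up for the comparison.

```python
def lockstep_cycles(costs, width=3):
    """Cycles needed when vertices are started in groups of `width`
    and each group lasts as long as its slowest vertex."""
    total = 0
    for i in range(0, len(costs), width):
        total += max(costs[i:i + width])
    return total


def independent_cycles(costs, width=3):
    """Cycles needed when each unit picks up the next vertex
    as soon as it is free (hypothetical alternative, in-order greedy)."""
    busy_until = [0] * width
    for cost in costs:
        unit = busy_until.index(min(busy_until))  # first unit to become free
        busy_until[unit] += cost
    return max(busy_until)


# Hypothetical shader costs in cycles; one vertex takes a long branch.
costs = [10, 10, 30, 10, 10, 10]
print(lockstep_cycles(costs), independent_cycles(costs))
```

With uniform costs the two schemes come out the same, so the lockstep penalty only shows up when dynamic branching makes some vertices much more expensive than their neighbours.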

Next comes the myth about Displacement Mapping; the basic idea of its implementation on GeForce FX has surfaced here more than once...

Reportedly, the GeForce FX won't support Displacement Maps and hardware tessellation of N-Patches. That is why the DM technology will probably suffer the same fate as the N-Patches - the support is officially provided in applications, but real models developed for it are absent. If NVIDIA's products do not support DM, the number of applications potentially supporting it could fall significantly. At present, N-Patches and DM are not a mandatory requirement for DX9 compatibility.

Next we have the tidbit I was talking about some time ago...

It's interesting that NVIDIA managed to implement texture fetch commands in the pixel processor without any delays. Even dependent texture fetches issued one after another complete in a single clock each. This can give the GeForce FX a considerable advantage over the R300 in the case of complex shaders.

However, the result remains pretty much the same:

                      R300                    NV30
Bilinear filtering    8 textures per clock    8 textures per clock
Trilinear filtering   8 textures per clock    8 textures per clock
Minimal aniso         8 textures per clock    8 textures per clock

However, the clock speed of the GeForce FX is higher. But how well balanced the chip really is, and how it actually performs, is yet to be studied.
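A rough back-of-the-envelope sketch of what "same per clock, higher clock" means for peak texel rate. The clock speeds below are not from the article: they assume the 325 MHz core of the Radeon 9700 Pro and the 500 MHz core announced for the GeForce FX, and the function name is just for illustration.

```python
def texel_rate_mtexels(samples_per_clock, core_mhz):
    """Peak filtered texture samples per second, in millions."""
    return samples_per_clock * core_mhz


R300_MHZ = 325  # assumption: Radeon 9700 Pro core clock
NV30_MHZ = 500  # assumption: announced GeForce FX core clock

r300 = texel_rate_mtexels(8, R300_MHZ)
nv30 = texel_rate_mtexels(8, NV30_MHZ)
print(r300, nv30, round(nv30 / r300, 2))
```

This is peak arithmetic only; it says nothing about memory bandwidth or how often either chip can actually sustain 8 samples per clock.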

Dave Baumann · Dec 4, 2002

Haven't we seen all this before? None of that seems any newer.

The information on the vertex shader side comes directly from NVIDIA's Geoff Ballew's mouth, and he was one of the personnel on the NV30 project, so I'll defer to his knowledge rather than anything DL says. Other previews have mentioned similar things to Geoff's statement about the Shader being a 'P10 style array' as well.

Hellbinder · Dec 4, 2002

First, I trust Digit-Life's info about as much as I could track down Bigfoot this afternoon.

However...

1. Notice that according to them Nvidia's Frame Buffer compression is exactly the same in function as ATi's. It is only applied during MSAA. Thus if true, every single web site that has done a preview has provided FALSE information and a FALSE sense of NV30's advantages. Could they all have been so wrong without Nvidia being the cause???

2. I'm sorry, but Nvidia is not the be-all and end-all of what developers will and won't support. They have a lot of influence now.. but times are changing. It is simply NOT RIGHT for them to start slandering a great tech like Displacement Mapping, just because they did not have the forethought to include it...

Why should the Game market be the Slave of the Whims and *FAILURES* of Nvidia...

Furthermore.. ATi now has 5 cards on the market at nearly all price levels that support both Displacement Mapping AND N-patches. Parhelia also supports it. So 6 cards to 1, not counting the coming R350... which will make it 7:1... And developers are going to *drop it* because of one freaking $500 card?????

God I hope that the Developer Community has not become such whoring Slaves....

psurge · Dec 4, 2002

DaveB,

perhaps the "3 vertex shader" statement simply means that 3 vertices are processed simultaneously?

What is the throughput of the vertex engine anyway (in verts/clock), assuming nothing more than a transform?

Serge

MDolenc · Dec 4, 2002

Displacement mapping as in vs 3.0 is great tech. Displacement mapping as we can see on Parhelia is good tech. But displacement mapping on Radeon 9700 is not all that good (since it only supports presampled).

alexsok · Dec 4, 2002

DaveBaumann said:
Haven't we seen all this before? None of that seems any newer.

The information on the vertex shader side comes directly from NVIDIA's Geoff Ballew's mouth, and he was one of the personnel on the NV30 project, so I'll defer to his knowledge rather than anything DL says. Other previews have mentioned similar things to Geoff's statement about the Shader being a 'P10 style array' as well.

Dave, they've been in contact with NVIDIA as well (read the article). I don't think there is any reason not to believe them...

Besides, there are some new tidbits that I haven't seen anywhere else...

Nappe1 · Dec 4, 2002

MDolenc said:
Displacement mapping as in vs 3.0 is great tech. Displacement mapping as we can see on Parhelia is good tech. But displacement mapping on Radeon 9700 is not all that good (since it only supports presampled).

If we are ever going to see Parhelia displacement mapping in action, that is. More than one source has confirmed that Matrox is about to announce decreasing gaming driver support for Parhelia.

I surely hope I am wrong.

Dave Baumann · Dec 4, 2002

psurge said:
perhaps the "3 vertex shader" statement simply means that 3 vertices are processed simultaneously?

I don't think that's the way DL intended it - look at the block diagram at the bottom of the page.

alexsok said:
Dave, they've been in contact with NVIDIA as well (read the article). I don't think there is any reason not to believe them...

I'm sure they have, but did they actually ask that specific question and get that specific answer, or did they assume a configuration based on the developer info given out beforehand?

I asked a direct question of NVIDIA and got a specific answer - there's little room to argue there. However, do you know what DL based that information on?

alexsok · Dec 4, 2002

I'm sure they have, but did they actually ask that specific question and get that specific answer, or did they assume a configuration based on the developer info given out beforehand?

Yes, they did ask that specific question and got that specific reply.
Look, I'm not implying anything, I'm just saying that both you and the guys over there asked the same question of different NVIDIA guys and received contradictory replies... I guess it all boils down to which of these guys knows more about GeForce FX?

psurge · Dec 4, 2002

Their block diagram looks like a pile.

They have 4 boxes labeled as "3 vertex processors", connected directly to the pixel processors, with no rasterization stage (except maybe something they call "Tile HSR logic", which is not even connected to the vertex shader output).

Serge

Maverick · Dec 4, 2002

My guess would be that they're both right - in a way. It's a "P10-style array", but the number of elements gives it the same performance as 3 "vertex processors".

Like I say, that's just my guess - no insider info here.

Dave Baumann · Dec 4, 2002

Tom's Hardware (http://www6.tomshardware.com/graphic/02q4/021118/geforcefx-06.html) said:
Until now, the vertex shader performance was expressed mainly through the number of available shader units (GeForce4 Ti: two/ Radeon 9700 PRO: four). In contrast, the GeForceFX uses a highly programmable floating-point array, which allows for a triangle transformation rate of over 350 Mverts/s. For comparison, the GeForce4 Ti can offer 136 Mverts/s, while the Radeon 9700 PRO achieves about 325.

Anandtech (http://www.anandtech.com/video/showdoc.html?i=1749&p=3) said:
Massively Parallel Vertex Shader Engine

Since most of the logic behind the vertex shader engine had to be re-written in order to accommodate the needs of DirectX 9, NVIDIA redesigned the vast majority of the GeForce FX's vertex shader engine from scratch. Whereas the GeForce4 had two parallel vertex shader units, the GeForce FX has a single vertex shader pipeline that has a massively parallel array of floating point processors (somewhat similar to 3DLabs' P10 VPU, although we don't have an idea of how many individual processors are at work in parallel).

The parallel FP vertex processors have their own multithreaded instruction set and are obviously optimized for maximum triangle throughput. NVIDIA claims 375 million triangles per second can be passed through the GeForce FX's vertex shader engine, putting it slightly above that of the Radeon 9700 Pro but also keep in mind that we're dealing with a noticeably higher clocked GPU.

Seems pretty clear to me - I doubt Geoff Ballew and these other previews just invented that for no specific reason. I'd wager that DL's info was based on the "1.5 times GF4 vertex throughput per cycle" figure and that they reached their conclusion from it, the same as Zephyr did in the NV30/R300/DX9 comparison article here.
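For what it's worth, the arithmetic behind that wager can be sketched as below. The Mverts/s figures are the ones quoted from Tom's Hardware above; the core clocks (300 MHz for GeForce4 Ti, 325 MHz for Radeon 9700 PRO, 500 MHz for GeForce FX) are assumptions that are not stated in any of the quotes, and the variable and function names are just for illustration.

```python
GF4_UNITS = 2           # GeForce4 Ti vertex shader units, per the quoted previews
PER_CLOCK_FACTOR = 1.5  # the "1.5 times GF4 vertex throughput per cycle" figure

# The inference DL may have made: 1.5x the per-clock rate of 2 units "is" 3 units.
implied_units = GF4_UNITS * PER_CLOCK_FACTOR

# (quoted Mverts/s, assumed core clock in MHz)
chips = {
    "GeForce4 Ti": (136, 300),
    "Radeon 9700 PRO": (325, 325),
    "GeForce FX": (350, 500),
}


def verts_per_clock(mverts_per_s, core_mhz):
    """Vertices transformed per core clock."""
    return mverts_per_s / core_mhz


per_clock = {name: verts_per_clock(*spec) for name, spec in chips.items()}
fx_vs_gf4 = per_clock["GeForce FX"] / per_clock["GeForce4 Ti"]
print(implied_units, per_clock, round(fx_vs_gf4, 2))
```

Under those assumed clocks the quoted rates do land close to a 1.5x per-clock ratio, which fits the idea that "three vertex processors" is a performance equivalence rather than a unit count.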

alexsok · Dec 4, 2002

I still stand by my original claim... let's see if I can find out for sure now...

John Reynolds · Dec 4, 2002

alexsok said:
I still stand by my original claim... let's see if I can find out for sure now...

I sure hope it doesn't get halved like other claims.

cellarboy · Dec 4, 2002

John Reynolds said:
alexsok said:

I still stand by my original claim... let's see if I can find out for sure now...

I sure hope it doesn't get halved like other claims.

Yeah, no kidding!!

Grall · Dec 4, 2002

You guys should stop discussing boring crap like that old 3xVS claim. We KNOW NV30 doesn't have discrete vertex shaders in the sense the R2/300 or GF3/4 has them, so this is clearly bull.

I want more concentration placed on the framebuffer compression only affecting multisampling AA modes bit instead. The way I remember it, Anandtech trumpeted this as one big advantage of NV30 over R300, the latter inferred to not have it at all according to their 'straight-from-Nvidia-horse's-mouth' PR piece. I don't know if this has been remedied or not, or if Anandtech still means to say R300 lacks FB compression altogether.

Wouldn't it be the mother of all ironies if NV30 ALSO lacked non-AA FB compression?

I also think it's interesting XB says 6 and 8xAA are hybrid modes based on *recombination*. Either their English isn't that good (what else is new?) and they've misunderstood what others have called these modes being combinations of MS and SS AA, or it might mean NV30 doesn't *actually* sample 6 or 8 points per pixel... Also, seems only 2xAA features a rotated sample pattern, ie: AA capabilities overall seem to be *yawn* as usual with Nvidia.

Pixel shader can either do 2 integer + 1 FP operation OR two texture reads they say. R300 should have little to fear in this regard apart from clock speed difference, as it can do a texture read on every cycle even while performing a maths operation. Or am I wrong? Pick this apart please, you techies!

*G*

Bigus Dickus · Dec 5, 2002

Grall said:
I want more concentration placed on the framebuffer compression only affecting multisampling AA modes bit instead. The way I remember it, Anandtech trumpeted this as one big advantage of NV30 over R300, the latter inferred to not have it at all according to their 'straight-from-Nvidia-horse's-mouth' PR piece. I don't know if this has been remedied or not, or if Anandtech still means to say R300 lacks FB compression altogether.

Wouldn't it be the mother of all ironies if NV30 ALSO lacked non-AA FB compression?

Framebuffer compression isn't going to be particularly useful in non-MSAA modes. Further, framebuffer compression isn't particularly needed in non AA modes.

Now... if the NV30 can't compress SSAA samples (which it probably can't, since they're not very compressible AFAIK) then that is something worth talking about.

It is entirely possible that not only will 8X MS/SS not look as good as 6X RGMS (if the MS portion is 2X2 OGMS and the SS portion is 1X2 OGSS, which is what it sounds like) but it might be slower too, due to the loss of compression savings on the SS portion.
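A quick sketch of that compression argument, using the best-case ratios from point 4 of the Digit-Life list (4:1 for a 4x MSAA block, 2:1 for a 2x block). The mode breakdown for 8x (a 2x2 MS block times 1x2 SS) is only the guess made above, the assumption that nothing is saved across the SS samples is likewise a guess, and the function name is made up.

```python
def color_traffic(ms_samples, ss_samples, ms_ratio):
    """Best-case color data per screen pixel, in sample-equivalents.
    Compression is assumed to apply only inside each MSAA block."""
    return ss_samples * ms_samples / ms_ratio


modes = {
    "2x MS": color_traffic(2, 1, 2.0),
    "4x MS": color_traffic(4, 1, 4.0),
    "8x MS/SS (guess: 2x2 MS, 1x2 SS)": color_traffic(4, 2, 4.0),
}
for name, cost in modes.items():
    print(name, cost)
```

On those assumptions the hybrid 8x mode moves about twice the color data of plain 4x MS even in the best case, which is the "loss of compression savings on the SS portion" in numbers.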

Althornin · Dec 5, 2002

Bigus Dickus said:
Framebuffer compression isn't going to be particularly useful in non-MSAA modes. Further, framebuffer compression isn't particularly needed in non AA modes.

Now... if the NV30 can't compress SSAA samples (which it probably can't, since they're not very compressible AFAIK) then that is something worth talking about.

It is entirely possible that not only will 8X MS/SS not look as good as 6X RGMS (if the MS portion is 2X2 OGMS and the SS portion is 1X2 OGSS, which is what it sounds like) but it might be slower too, due to the loss of compression savings on the SS portion.

Ah, nice point about their mixed-type AA solutions. I hadn't thought that they would lose the benefits of color compression there... nice catch.

Ailuros · Dec 5, 2002

First, I trust Digit-Life's info about as much as I could track down Bigfoot this afternoon.

Not that radical, but I somewhat agree here with Hellbinder (albeit I consistently read digit-life).

I don't want to get the thread off track here, but the site's credibility hit another low IMHO, when I read their latest October digest. Anyone curious just check the high end usability ratings for instance.

T2k · Dec 5, 2002

Ailuros said:
First, I trust Digit-Life's info about as much as I could track down Bigfoot this afternoon.

Not that radical, but I somewhat agree here with Hellbinder (albeit I consistently read digit-life).

I don't want to get the thread off track here, but the site's credibility hit another low IMHO, when I read their latest October digest. Anyone curious just check the high end usability ratings for instance.

Well said...
