NVIDIA GF100 & Friends speculation

Why not? It's the most popular setting. Looking for ghosts there methinks.
Perhaps because it is the score in which PhysX counts for the most?

Let's say GF100 gets exactly the same framerates as a Cypress XT in the "game tests" but a 4-times-higher CPU test score thanks to PhysX. What would the results be in the P and X presets respectively?

The first would be about 20% higher, while the second would be more like 2% higher...
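The arithmetic behind those percentages can be sketched with the assumed Vantage weighting: a weighted harmonic mean of the GPU and CPU scores, with CPU weights of 0.25 (Performance) and 0.05 (Extreme) as quoted later in the thread. The exact formula and constants are an assumption here, but the shape of the result lands in the same ballpark as the 20% / 2% figures:

```python
def vantage_score(gpu, cpu, cpu_weight):
    """Assumed Vantage overall score: a weighted harmonic mean of the
    GPU and CPU sub-scores (weights sum to 1)."""
    gpu_weight = 1.0 - cpu_weight
    return 1.0 / (gpu_weight / gpu + cpu_weight / cpu)

# Same GPU score, but a 4x CPU score thanks to PhysX:
for name, w in [("P", 0.25), ("X", 0.05)]:
    same = vantage_score(1.0, 1.0, w)
    physx = vantage_score(1.0, 4.0, w)
    # P comes out ~23% higher, X only ~4% higher - roughly the
    # 20% / 2% split described above.
    print(name, physx / same - 1)
```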
 
The rumoured GF100 numbers are P22000 and X15000. HD 5870 pulls around P17000 and X8500. So in spite of PhysX the advantage based on these numbers would be much higher in Extreme even with the much greater weight given to the graphics score, i.e exactly the opposite of what you said should happen.
 
The rumoured GF100 numbers are P22000 and X15000. HD 5870 pulls around P17000 and X8500. So in spite of PhysX the advantage based on these numbers would be much higher in Extreme even with the much greater weight given to the graphics score, i.e exactly the opposite of what you said should happen.
Don't know where your X15k comes from, but obviously not from the same source as the P22k.

In fact, it's almost impossible, it would mean somewhere around a 20% framerate loss going from 1280x1024 without AA to 1920x1200 with AA and overall heavier shaders. H15k seems way more likely.
 
Don't know where your X15k comes from, but obviously not from the same source as the P22k.

In fact, it's almost impossible, it would mean somewhere around a 20% framerate loss going from 1280x1024 without AA to 1920x1200 with AA and overall heavier shaders. H15k seems way more likely.

I too doubt it, but there's a certain limit to FPS when you're at 1280 without AA, and that's the CPU limit. If the bench was run below, say, 4GHz, it could happen. BTW, 22 to 15 is not a 20% drop, it's about 32%.
 
I can't work out what you're saying here.

A triangle that is pixel sized (or AA sample-sized if MSAA is on) still needs to update Z.

I'm thinking of, say, a square inch of screen being filled with 1000 triangles. Let's say it's some monster's head. But it's hidden behind a corner. Might as well cull those triangles before they're set up - or, as part of the setup process, they're culled rather than being generated, only to be culled later by rasterisation/early-Z.

NVidia might even be able to propagate the Z query back into DS so that the vertices that make up doomed triangles don't waste time computing attributes (i.e. early-out from DS) and mark the vertices in some way that allows them to be deleted (e.g. they get culled instead of being put in the post-DS cache).

Or at the very least generate an always-on GS (in addition to anything the developer codes) that culls triangles by querying the early-Z system.

Jawed
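The "cull before setup" idea above is essentially hierarchical-Z rejection moved earlier in the pipeline: compare a triangle's nearest depth against the farthest depth already written in each coarse screen tile it touches, and drop it if it is provably occluded. A minimal Python sketch - the tile size, grid layout, and larger-z-is-farther depth convention are all illustrative assumptions, not a description of any real GPU:

```python
TILE = 8  # tile size in pixels (illustrative)

def tiles_overlapped(bbox, tile=TILE):
    """Yield (tx, ty) coordinates of tiles touched by a pixel-space
    bounding box (x0, y0, x1, y1), inclusive."""
    x0, y0, x1, y1 = bbox
    for ty in range(y0 // tile, y1 // tile + 1):
        for tx in range(x0 // tile, x1 // tile + 1):
            yield tx, ty

def can_cull_before_setup(tri_bbox, tri_nearest_z, hiz):
    """True if the triangle's nearest depth is behind the farthest
    depth stored in every tile it touches (larger z = farther), i.e.
    it is provably occluded and never needs setup or rasterisation."""
    return all(tri_nearest_z >= hiz[ty][tx]
               for tx, ty in tiles_overlapped(tri_bbox))

# The monster's head behind a corner: wall already written at z=0.5,
# head triangles no nearer than z=0.9 -> rejected before setup.
hiz = [[0.5] * 4 for _ in range(4)]
print(can_cull_before_setup((0, 0, 7, 7), 0.9, hiz))  # True
print(can_cull_before_setup((0, 0, 7, 7), 0.2, hiz))  # False
```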

All primitives need to update Z, but since triangle setup (or patch setup in DX11) happens before the tessellation stages run, the setup kernel simply doesn't have the Z this patch will generate after HS and DS run.

As for the setup-of-tessellated-tris, the DS stage still has to run before Z can be determined. Not to mention that the Z will be interpolated by the rasterizer before the final Z is available to perform the Z-query.

Are there any occlusion instructions available in the HS/DS so that an entire patch can be killed if it can be determined that the patch will not be rendered (on the basis of back-face culling etc.)? Waiting till GS to cull what can be culled in HS seems rather wasteful. Tessellation means that invisible pixels now waste HS/TS/DS time as well, besides wasting PS time.

Inclusion of culling instructions/intrinsics in HS/DS could be a win in DX11.1.


Hopefully NVidia's done it right and made setup a kernel, like VS or DS. That would mean it's arbitrarily fast, only limited by internal bandwidths/ALUs.

Better, if the setup algorithm queries the early-Z system and early-outs wodges of triangles (e.g. in batches of 32).

Makes me wonder if L1 cache is used to communicate Z from ROPs to ALUs.

Jawed

Makes me wonder if L1 cache is used to communicate Z from ROPs to ALUs.

Jawed
That would make Fermi quite tile-ish. :smile:
 
I too doubt it, but there's a certain limit to FPS when you're at 1280 without AA, and that's the CPU limit. If the bench was run below, say, 4GHz, it could happen. BTW, 22 to 15 is not a 20% drop, it's about 32%.
If that were the case, a 4.5GHz i7 900-series wouldn't allow 30/40k+ P graphics scores with two or three 5870s.

P22k to X15k indeed IS a 20% framerate loss in the GPU tests, as the CPU score doesn't change - in this case it should stay somewhere between 80k and 100k - but has less influence on the X score (a weight of 0.05 instead of 0.25).
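Backing the GPU sub-scores out of P22k and X15k supports that reading. This sketch inverts the same assumed weighted-harmonic-mean formula, takes the CPU score as 90k (the midpoint of the range above), and treats the P and X GPU sub-scores as directly comparable, which glosses over each preset's different fps scaling:

```python
def gpu_subscore(total, cpu, cpu_weight):
    """Invert the assumed weighted-harmonic-mean overall score to
    recover the GPU sub-score from the total and the CPU score."""
    gpu_weight = 1.0 - cpu_weight
    return gpu_weight / (1.0 / total - cpu_weight / cpu)

cpu = 90_000                            # midpoint of the 80k-100k range
g_p = gpu_subscore(22_000, cpu, 0.25)   # Performance GPU sub-score
g_x = gpu_subscore(15_000, cpu, 0.05)   # Extreme GPU sub-score
print(1 - g_x / g_p)  # ~0.18, i.e. in the ballpark of the 20% loss
```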
 
Look at RV670, with its half setup rate...
:?: Setup in RV670 is 1 triangle per clock isn't it?

Are you talking about setup rate with tessellation turned on? The Xenos tessellator is half-rate with tessellation on and I guess R6xx GPUs are the same. I can't remember what Evergreen does.

Jawed
 
Tessellation might be nice, but how important will it be during the lifetime of Fermi? Developers can already implement it using Radeon cards, and by the time the software is released AMD R900 cards will be on the shelves.

Tessellation is a checkbox feature with no importance to the average customer, just like PhysX and CUDA.
 
Tessellation might be nice, but how important will it be during the lifetime of Fermi? Developers can already implement it using Radeon cards, and by the time the software is released AMD R900 cards will be on the shelves.

Tessellation is a checkbox feature with no importance to the average customer, just like PhysX and CUDA.
If games support it, it will become important. You have to start somewhere.
 
Waiting till GS to cull what can be culled in HS seems rather wasteful. Tessellation means that invisible pixels now waste HS/TS/DS time as well, besides wasting PS time.
It's not really wasteful, just very inelegant ... because one thing you can do in the HS is simply set the tessellation factor very low (and even without specific hardware support or a geometry shader, you can just put it behind the clip plane to cull the output before it gets to the PS).
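The tess-factor trick works because D3D11 specifies that a patch whose hull shader outputs an edge tessellation factor of 0 or less (or NaN) is discarded before the tessellator and domain shader ever run. A Python sketch of that rule - the back-face test and the vectors involved are illustrative assumptions, not real shader code:

```python
import math

def patch_is_culled(edge_factors):
    """D3D11 rule: a patch with any edge tess factor <= 0 (or NaN)
    is dropped before the tessellator and DS run."""
    return any(f <= 0 or math.isnan(f) for f in edge_factors)

def hull_shader_constants(patch_normal, view_dir):
    """Toy patch-constant function: emit factor 0 for a back-facing
    patch so the pipeline culls it early, else a normal tess level."""
    nx, ny, nz = patch_normal
    vx, vy, vz = view_dir
    facing = nx * vx + ny * vy + nz * vz
    if facing >= 0.0:              # facing away from the camera
        return [0.0, 0.0, 0.0]    # tess factor 0 -> patch culled
    return [8.0, 8.0, 8.0]        # ordinary tessellation level

print(patch_is_culled(hull_shader_constants((0, 0, 1), (0, 0, 1))))   # True
print(patch_is_culled(hull_shader_constants((0, 0, -1), (0, 0, 1))))  # False
```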
 
The rumoured GF100 numbers are P22000 and X15000. HD 5870 pulls around P17000 and X8500. So in spite of PhysX the advantage based on these numbers would be much higher in Extreme even with the much greater weight given to the graphics score, i.e exactly the opposite of what you said should happen.

Source? ;)

Anyway, without testbed details those numbers are meaningless. With a decently clocked i7 920 a 1GHz HD 5870 scores 21300 on Performance, not 17k. ;)

http://www.hwbot.org/community/subm...tage___performance_radeon_hd_5870_21301_marks
 
If games support it, it will become important. You have to start somewhere.

Yup, and hopefully many will, but they will all target the Radeon line of parts and not the GeForce ones, as ATI has a large lead that will continue to grow in the DX11 market. But more DX11 in games is a good thing.
 
They were discussing stock-clocked cards, not OC'd or heavily OC'd ones.

Actually we don't know the testbed details. Who knows what CPU was used?

The comparison in an article was between a GTX380 and a 1GHz HD5870. ;)

Thus 22000P without knowing the type and clock of the CPU means little.
 
Actually we don't know the testbed details. Who knows what CPU was used?

The comparison in an article was between a GTX380 and a 1GHz HD5870. ;)

Thus 22000P without knowing the type and clock of the CPU means little.

But to say that if you OC one card you can score X, what point does that make? If you OC one you OC both, and right now reported scores for Fermi are based on stock clocks, not OCs.
 
Any leakage on expected pricing? The details on this new architecture are starting to sound pretty exciting, and I tend to go after the fastest single-GPU board currently available. So at this point my main concern is whether or not I can afford it.
 
But to say that if you OC one card you can score X, what point does that make? If you OC one you OC both, and right now reported scores for Fermi are based on stock clocks, not OCs.

Yes, but by the time Fermi is released there may well be an HD 5890 (or whatever it may be called), which would essentially be an HD 5870 at 1GHz. So the comparison makes sense...
 
Bjorn3D.com said:
Design Article releases tomorrow 7PM CST with complete Whitepaper info.

New features, a new cache, a new memory setup, and yes, it's about a 100% performance increase over GTX-2xx, so figure a single GTX-285 vs the 5870, then double the GTX-285 performance.

Then it handles triangles differently; the triangles in any given frame can number in the hundreds of thousands, so that's very important.

It will fold a lot better.

Increased efficiency in several areas.

It's a revolutionary new design oriented toward tessellation (those pesky triangles) and geometric programming. The thing is, every wireframe is made up of triangles, and tessellation takes those triangles and breaks them down into many smaller triangles. This core is uniquely designed to handle that, so in geometry- and shader-heavy games you will see more than the 100% raw-power increase.

520USD might handle it. At 2x GTX-285 performance that puts it above GTX-295 performance, and it's DX11-ready and designed specifically for that. Current ATI offerings are really good, but basically double the hardware on the same core design to provide more raw power. GF100 is a core designed to take advantage of what the industry needs today and for some time in the future.

Read the article tomorrow 'cause that's about all I can say tonight.
http://www.bjorn3d.com/forum/showpost.php?p=215717&postcount=8
 