NVIDIA GF100 & Friends speculation

Lonbjerg · Mar 25, 2010

Creig said:
I don't know about that chart Annihilator posted. Looking through it, the GTX 480 beats the HD 5870 at every single resolution and AA/AF level except on two tests, and just barely at that:

Crysis Warhead 19x12 4x/16x - 5870 beats 480 by 0.4 fps
Left4Dead 25x16 8x/16x - 5870 beats 480 by 0.5 fps

Some game engines favor one architecture over another, so typically you'll see ATi cards faster on Game X and Nvidia cards faster on Game Y. Even if one card is overall 15-20% faster than the other, the slower card still ends up winning a handful of benchmarks on specific games. But not this time according to that spreadsheet. And that casts doubt on its validity to me.

You mean like this?
http://www.anandtech.com/showdoc.aspx?i=1683&p=6
(9700 Pro)

Or like this?
http://www.anandtech.com/video/showdoc.aspx?i=2870
(8800GTX)

Or like this?
http://www.anandtech.com/video/showdoc.aspx?i=2870
(2900XT)

Or like this:
http://www.anandtech.com/video/showdoc.aspx?i=3643&p=17
(5870)

It's not unheard off...especially not when dumping a new architechture on the market.

I'm not convinced it's not a fake benchmark thoguh...just saying the validity of dismissing benches due to one side having a large advantage in most games is a flawed stance.

neliz · Mar 25, 2010

Chalnoth said:
Now, we don't know what the performance is like in these other games, but there is always the possibility that one or more of them were omitted due to poor performance. That said, high performance in the remaining titles still would mean good overall performance, which is precisely what nVidia really needs right now.

The benchmarks leaked last weekend (DiRT2 and Far Cry 2) were 100% real, with Far Cry 2 being the "best case scenario"

rpg.314 · Mar 25, 2010

neliz said:
GF104 won't compete with Cypress, it's projected specs put it somewhere north of GT200 in the computational department and a bit down on bandwidth. Definitely something to beat up Juniper with.

My take on this,

8 SM's
~1.5G hot clock
256bit gddr5, base clock close to gt200's

Assuming it is kinda gt200 + dx11 perf wise, then @330mm2 it is going to have nice time competing against juniper on perf/(manufacturing cost).

I'm pretty sure AMD made reviewers quite well aware of all the buts and ifs for the GT300 reviews, let's see who picks up on that because I smell a new soap opera coming (kind of like AMD's "Paper Dragon" presentation.)

I have a feeling that shielding 480 from the evils of afr, embodied in that creation of satan called 5970, will be a part of epic battle between good and evil.

neliz · Mar 25, 2010

Picao84 said:
I changed my post.. With 320 Cuda Cores? :smile:

Not likely considering the architectural inefficiencies. if GF100 is hot and hungry compared to cypress, GF104 will be hot and hungry compared to Juniper.

rpg.314 said:
My take on this,

8 SM's

~1.5G hot clock

256bit gddr5, base clock close to gt200's

Sounds healthy to me

I have a feeling that shielding 480 from the evils of afr, embodied in that creation of satan called 5970, will be a part of epic battle between good and evil.

The Devil doesn't wear green.

KimB · Mar 25, 2010

neliz said:
The benchmarks leaked last weekend (DiRT2 and Far Cry 2) were 100% real, with Far Cry 2 being the "best case scenario"

Even if true, bear in mind that there are a wide variety of ways for benchmarks to be misleading. In this case, we could be talking about immature GF100 drivers, for instance.

KimB · Mar 25, 2010

neliz said:
Not likely considering the architectural inefficiencies. if GF100 is hot and hungry compared to cypress, GF104 will be hot and hungry compared to Juniper.

That's not necessarily the case, not if the GF100 turns out to significantly outperform, because then nVidia could significantly lower the clocks of their lower-end parts to get better perf/power ratios.

neliz · Mar 25, 2010

Chalnoth said:
Even if true, bear in mind that there are a wide variety of ways for benchmarks to be misleading. In this case, we could be talking about immature GF100 drivers, for instance.

The card being late doesn't mean the drivers had zero development in them.

Picao84 · Mar 25, 2010

neliz said:
Not likely considering the architectural inefficiencies.

Care to elaborate on this please? What architecture inneficiencies? That it requires more power may not 100% justify that affirmation.

KimB · Mar 25, 2010

neliz said:
The card being late doesn't mean the drivers had zero development in them.

Of course not. But drivers need quite a lot of work to be ready for actual use in games. In any case, we should find out either tomorrow or Monday what the real situation is at launch. It will, of course, take a few months to see how the driver situation shapes out, but for now the benchmarks available within the next few days will give us a good idea of how well the cards will perform when they first reach peoples' hands.

GZ007 · Mar 25, 2010

Florin said:
Probably because the increased latency and micro-stuttering make it inherently inferior, as does wasting half of the total installed memory, requiring driver profiles to get the full benefit in most applications. mGPU smells

Multiple dies on substrate acting as a single gpu could solve all of this in the future.
The nvidia results did contain SLI gtx480 numbers. And 3D Vision Surround will work only on SLI. So both companys count with mGPU.

fellix · Mar 25, 2010

Overclocked GTX470 run in 3DMark Vantage, incl. all feature tests.

rpg.314 · Mar 25, 2010

GZ007 said:
Multiple dies on substrate acting as a single gpu could solve all of this in the future.

And how do you solve the resulting heat problem?

GZ007 · Mar 25, 2010

rpg.314 said:
And how do you solve the resulting heat problem?

Maybe the monster gtx480 cooler would be enough to cool down 2 cypress dies at once

Xenos daughter die had some core logic and that was some years ago. So its not just in realm of fantasy.

mczak · Mar 25, 2010

Picao84 said:
I changed my post.. With 320 Cuda Cores? :smile:

Given the partitioning into gpcs, that would mean 2 GPCs with 5 SMs each. If the GPCs otherwise remain unchanged, would mean though can "only" rasterize 16 pixels/clock, which if we're assuming 32 rops (if that's a 256bit bus) seems a bit low. What about 288 cores instead? 3 GPC with 3 SMs each? Not sure though what you could expect from such a card, with bandwidth like Cypress yet alu/tmu throughput below Juniper (unless for tmu you take Arun's 3 half-quad tmus per SM). Actually the ratios (except tmu) would be close to GT200...

Jawed · Mar 25, 2010

trinibwoy said:
Because the performance gain going from a middling CPU to a very powerful one is often negligible. You can blame PCIe or system bandwidth but there have been enough tests showing that higher PCIe or system bandwidth is even less relevant to game benchmarks. So what are the other potential culprits if not the GPU itself?

Bandwidth and latency are not the same thing, so systematic PCI Express latencies are real - see any GPGPU effort for the work-arounds required to minimise that impact.

Additionally the API has some fairly fundamental, coarse, granularities in it. As I said to PeterT earlier, this is why D3D10 implements finer-grained update of state (e.g. constant buffers) and why D3D11 allows multi-threaded construction of state.

All of these things conspire against old graphics engines that are built on out of date techniques with the DX9 API. The efficiency gains in D3D11 are enough that it's worth running the game/drivers in D3D11 mode even though the hardware is only capable of DX9.

Well there are micro-benchmarks that target specific functions. The problem is that nobody really picks apart a game to see what it's doing within each frame. What we need is a combination of PIX and NVPerfHud (or AMD's equivalent).

Still don't even have a decent answer why tessellation in Heaven 1.0 is so slow on ATI. Somehow I think we'll be waiting a long time.

The game can be scaling poorly because it is bandwidth or setup bound. Does that mean the game is inherently not scalable or does it simply mean there wasn't an adequate increase in bandwidth or setup in proportion with other things?

Sure, that would be a hardware limitation. It might be a fairly noticeable bandwidth-efficiency limitation (see the 8xMSAA performance in GT200) or it might simply be not enough bandwidth. Or setup rate. Just have to prove that the specific game is sensitive in that respect.

Of course review sites that even bother to activate 8xMSAA or adaptive/transparency MSAA are pathetically few.

If you're going to make a reasoned comparison of the gains in a replacement GPU, you've gotta take the mix into account: unit counts + bandwidth + serialisations. I don't expect to see many existing games benefit from the dramatically higher setup rate in GF100 - but the architecture's more finely-grained rasterisation (which is a result of the parallel setup architecture) may mean that those same games see "better than expected" scaling on GF100. If that proves to be the case, then it's another parameter to investigate in Cypress scaling - though it seems unlikely there'll be much of that done, either.

That may not turn out to be due to finely-grained rasterisation. It might be to do with the way hardware threads are launched. etc.

At work I'm not allowed to blame the workload if my design isn't adequate. That applies here as well.

Why's that relevant? It's quite clear that a lot of game developers don't have the time/resources to implement a state of the art scalable and efficient engine. And if the API stands in the way...

The bottom line is you have to prove your test is evaluating what you say it is, before you can say that the test indicates X about the test subject.

I'm certainly not excluding the possibility of end-of-an-architecture problems in Cypress, where scaling has hit the end-stops due to something. I suspect some fixed-function stuff is out of its depth, but I say that because of the poor tessellation performance, not because of existing games.

It's notable that we're (generally, presumably some did know) only now realising that Crysis 1920x1200 with 4xMSAA is limited by the capacity of video RAM. How long have reviewers been testing that game at that setting? Why's it taken so long to discover the RAM limitation? What folly it has been to say it doesn't scale, when the card is running out of memory. (Though I think the game is still meant to scale substantially with CPU at that kind of setting, too - honestly Crysis has long seemed like a red herring as analysis has been woeful.)

Jawed

mczak · Mar 25, 2010

fellix said:
Overclocked GTX470 run in 3DMark Vantage, incl. all feature tests.

GPU cloth and particles (ft 4/5) are impressive, the rest - well much less so...
I didn't see the OC part though, what's the clock?

Vincent · Mar 25, 2010

In terms of prototype level, gf104 does not exist right now.

ShaidarHaran · Mar 25, 2010

neliz said:
The card being late doesn't mean the drivers had zero development in them.

"Zero development" was never mentioned, and is an absurd assumption. It's simple common sense that a new card with a brand new architecture stands to gain more performance from the evolution of drivers than one that has been out for 6 months and is itself a revision of an existing architecture.

neliz · Mar 25, 2010

ShaidarHaran said:
"Zero development" was never mentioned, and is an absurd assumption. It's simple common sense that a new card with a brand new architecture stands to gain more performance from the evolution of drivers than one that has been out for 6 months and is itself a revision of an existing architecture.

And that's my point. with the original GF100 being expected about the same time as Cypress, driver development surely hasn't lagged for all that time.

And with nV's driver team always seen as outperforming AMD's, who's to say that they weren't able to do just as much work over the past few months.
This is, after all a very expensive and very important product, you don't want to ruin that launch because the drivers are teh suxx, assuming they've been sleeping and had little time to work on drivers is imho a wrong PoV.

Pressure · Mar 25, 2010

GZ007 said:
Actualy if a certain die area makes you problems than u could put two smaller dies that have better yealds and end with higher performance. I think the multi gpu-s are a good idea with the parralel nature of the graphic.
I could imagine multi die gpu-s in the future. I mean 2 or more dies on a single GPU substrate. U would only need to create a botton line gpu and than just increase the number of dies. .

Yeah, that worked out great for 3Dfx. Just check out the Voodoo5 6000.

NVIDIA GF100 & Friends speculation

Lonbjerg

neliz

GIGABYTE Man

rpg.314

neliz

GIGABYTE Man

KimB

KimB

neliz

GIGABYTE Man

Picao84

KimB

GZ007

fellix

rpg.314

GZ007

mczak

Jawed

mczak

Vincent

ShaidarHaran

hardware monkey

neliz

GIGABYTE Man

Pressure

Similar threads