Beyond3D's GT200 GPU and Architecture Analysis

It took Nvidia too much space to add double precision hardware. It's a shame it's not usable for gaming. Are ATI's double precision units usable for gaming when not being used for GPGPU?
RV770 basically gets DP for free since it uses the same units as for SP - the four "simple" FP units are basically wired up to act as one DP unit. This should be way cheaper (in terms of transistor count) than Nvidia's solution, not to mention faster. But apparently this wasn't easily possible with Nvidia's G80-and-newer design (otherwise I'm sure Nvidia would have chosen such a solution). Maybe the 5-way MIMD design of the R600 shader ALUs wasn't only done this way to save control logic (sacrificing a bit of efficiency), but was actually a clever move to be able to offer DP cheaply later...
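To put rough numbers on the gap, a back-of-the-envelope sketch; the unit counts and clocks are my own assumptions about RV770 and GT200, not vendor figures:

```python
# Back-of-the-envelope peak DP rates implied by the two approaches.
# Assumptions (mine): RV770 fuses 4 of the 5 lanes in each of its 160 VLIW units
# into one DP MAD per clock; GT200 has one dedicated DP FMA unit per SM (30 total).
# Clocks are the commonly quoted HD 4870 / GTX 280 values.

RV770_DP_UNITS, RV770_CLK = 160, 750e6    # one DP MAD per VLIW unit @ 750 MHz
GT200_DP_UNITS, GT200_CLK = 30, 1296e6    # one DP FMA per SM @ 1296 MHz shader clock

rv770_dp = RV770_DP_UNITS * 2 * RV770_CLK / 1e9   # MAD/FMA counts as 2 flops
gt200_dp = GT200_DP_UNITS * 2 * GT200_CLK / 1e9

print(f"RV770 peak DP: ~{rv770_dp:.0f} GFLOPS")   # ~240
print(f"GT200 peak DP: ~{gt200_dp:.0f} GFLOPS")   # ~78
```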
 
Well, one part of this is certainly the DP units. Maybe they aren't that big (given their low number and probably little overhead for managing them, since apparently they share all the register/issue etc. resources), but for games these transistors are obviously a complete waste, doing nothing but consume power (hopefully only a little, thanks to clock gating).
No argument here. I was just replying to Arun's rather silly interpretation of Still's comment.
The register file increase might also be targeted more towards GPGPU - claiming a ~10% increase in 3DMark due to this is nice, but I'm not sure the increase in transistors/die size/power usage would be worth it (well, I've no idea of the increase in these areas in absolute figures, really, but is it below 10%?)
It probably was, as 32KB extra for 30 processors is 8 Mbit. I can't imagine this cost more than 100M transistors, and it should be much less. Maybe only 4% of the total area.
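A quick sanity check on that estimate; the 6T-cell count and the 2x overhead factor are my own rough guesses:

```python
# 32KB extra per SM across 30 SMs, assuming 6-T SRAM cells and a generous
# 2x overhead for ports/decoders (both figures are guesses, not die-shot data).

extra_bits = 32 * 1024 * 8 * 30          # ~7.9 Mbit
transistors_6t = extra_bits * 6          # ~47M for the cells alone
transistors_2x = transistors_6t * 2      # ~94M with overhead

print(f"{extra_bits/1e6:.1f} Mbit extra, "
      f"~{transistors_6t/1e6:.0f}M-{transistors_2x/1e6:.0f}M transistors")
```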
 
and if I could be bothered to reboot to Windows and fire up Photoshop, I could make a version of the old 70Gpixel/sec Z-only image, only this time it'd be around the 1/10th of a terazixel mark.

Maybe you should try... Seems like something is not in line with the expected behavior (unless something is wrong with the testing methodology here):
[fillrate chart: image038.png]

http://www.tomshardware.com/reviews/nvidia-gtx-280,1953-9.html
 
EDIT: [hmm this bit now redundant] G80's 16 multiprocessors each have register file capacity for 8192 scalars, 32KB. So each GT200 register file is 64KB, since they've doubled in size.

So in total that's 0.5MB in G80 versus almost 2MB in GT200.

But I agree, a small part of the GPU.
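For anyone who wants the totals spelled out, a trivial tally from the per-SM figures above:

```python
# Total register file capacity, from the per-SM sizes quoted above.

g80_kb   = 16 * 32   # 16 SMs x 32KB = 512KB  (~0.5MB)
gt200_kb = 30 * 64   # 30 SMs x 64KB = 1920KB (~1.9MB)

print(f"G80: {g80_kb}KB, GT200: {gt200_kb}KB ({gt200_kb / g80_kb:.2f}x)")
```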

---

Funny, going back to our ages-old discussions over register file size, NVidia increased it, and then some.

http://forum.beyond3d.com/showthread.php?p=1033461#post1033461

:p

---

Annoyingly enough, we still don't know if the per-clock bandwidth has increased in the multiprocessors. I'm presuming that reads are 32-wide instead of 16-wide like they are in G80. Perhaps new CUDA documentation will fill us in.

Jawed
 
Heh, yeah :p

I see Arun threw in his rant about shader core triangle setup too, without really asking :devilish:
I'm not sure why people think one prim per clock is a significant bottleneck. I'm sure it is at times when a lot of backface culling is occurring, but most games are pixel limited. Especially at the resolutions in these reviews!

Besides, there are other aspects of a GPU that need to be beefed up to make increasing this rate worthwhile. The input assembler is one such place.
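To put the one-prim-per-clock figure in perspective, a tiny sketch; the ~600 MHz core clock and 60 fps target are numbers I'm picking purely for illustration:

```python
# What 1 triangle per clock buys you at an assumed ~600 MHz core clock and 60 fps.

core_clk = 600e6
fps = 60
tris_per_frame = core_clk / fps   # setup-limited triangle budget per frame

print(f"~{tris_per_frame/1e6:.0f}M triangles of setup budget per frame at {fps} fps")
# Few games of this era push anywhere near that in the main pass, which is why
# setup tends to bite mainly in triangle-heavy passes rather than everywhere.
```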

It took Nvidia too much space to add double precision hardware. It's a shame it's not usable for gaming. Are ATI's double precision units usable for gaming when not being used for GPGPU?
I would say that Larrabee has a lot to do with the design of this new chip, and that it has pushed Nvidia to stop looking in the rear-view mirror at ATI.
Why do you think Larrabee had a lot to do with this chip's design? It seems like a fairly logical evolution from G80.
 
I'm not sure why people think one prim per clock is a significant bottleneck. I'm sure it is at times when a lot of backface culling is occurring, but most games are pixel limited. Especially at the resolutions in these reviews!
Not entirely. Look at resolution scaling. If the inverse of frame rate doesn't scale as fast as pixel count, it usually means vertex rate matters, and with current GPUs, vertex shaders are run extremely quickly so setup rate is probably the limiting factor. There are a few exceptions, however, as environment and shadow maps are usually rendered at the same size regardless of display resolution, so it may not be vertex rate.

Still, it's safe to say a big part of shadow map rendering is setup cost. Z-only fillrate is really damn fast.
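A toy model of that scaling argument; all numbers below are made up for illustration, not measurements:

```python
# Model frame time as a resolution-independent part (vertex/setup work) plus a
# per-pixel part, and watch how fps scales as pixel count grows.

def fps(pixels, fixed_ms=5.0, ns_per_pixel=4.0):
    frame_ms = fixed_ms + pixels * ns_per_pixel * 1e-6
    return 1000.0 / frame_ms

for name, px in [("1280x1024", 1280*1024), ("1920x1200", 1920*1200), ("2560x1600", 2560*1600)]:
    print(name, f"{fps(px):.0f} fps")

# 2560x1600 has ~3.1x the pixels of 1280x1024, but with a 5 ms fixed cost the
# frame time only roughly doubles - the "doesn't scale as fast as pixel count"
# signature that points at a setup/vertex-limited component.
```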
 
$650 video cards do nothing to improve PC gaming as a whole, and the numbers aren't by any means amazing - decent, but for the money, decent isn't what I'd expect.

I don't think the brute force approach worked well for nvidia this time around.
 
Funny, going back to our ages-old discussions over register file size, NVidia increased it, and then some.

http://forum.beyond3d.com/showthread.php?p=1033461#post1033461

:p
Sure, but I was talking about the situation where SIMD width was doubled per SM. Instead we had the bilinear rate doubled per cluster from G80 to GT200, so doubling the register file makes a lot of sense if you want to take full advantage of that.

Imagine a texture limited shader with 20 FP32's per thread. G80 can absorb up to 205 core clocks of latency at full speed, i.e. saturating texture throughput. G92 can absorb up to 102 clocks at full speed (which, of course, is twice the speed of G80), which is pretty low. GT200 can absorb 307 clocks at full speed (25% faster than G92). The extra SM gives you more threads per TU, so it's even better than G80.

There's a good chance that Vantage has more register pressure than your typical game from G80's launch, so it's a decision that made sense. It probably made sense with G92, too, but improving texture speed when register usage was high probably wasn't a priority.
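For anyone wanting to reproduce those figures, here's the arithmetic as a sketch, assuming 20 registers per thread as above and per-cluster bilinear rates of 4/clk on G80 and 8/clk on G92/GT200:

```python
# Latency a cluster can hide while keeping its texture units saturated:
# threads in flight divided by bilinear lookups per clock.

def latency_clocks(sms_per_cluster, regs_per_sm, regs_per_thread, bilerps_per_clk):
    threads = sms_per_cluster * regs_per_sm / regs_per_thread
    return threads / bilerps_per_clk

print("G80:  ", round(latency_clocks(2, 8192, 20, 4)))   # ~205 clocks
print("G92:  ", round(latency_clocks(2, 8192, 20, 8)))   # ~102 clocks
print("GT200:", round(latency_clocks(3, 16384, 20, 8)))  # ~307 clocks
```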
 
As WaltC is not around on this thread currently ... ahem...

I knew eventually NV30 would come around and bite Nvidia on the ass. After Nvidia's NV30 mistake with aggressive process reduction, fast memory and a narrow bus (unlike R300), Nvidia went all conservative, which worked well for them up to ... the GT200.

Now it is too conservative and ended up expensive as a result. It should be on a smaller process, with a narrower bus and faster memory, like the ATI part. Complete role reversal here from ages past! :eek:

In the UK the 9800 GX2 is $650 and the GTX 280 is up to $1000. And it's slower. It's not dog meat, but it's not steak either, and it's priced like caviar. Not a combination that entices me to upgrade from a G80. If I had a 6 or 7 series Nvidia card I wouldn't be upgrading either; I'd be waiting for RV770 or the GT200 die shrink.

Also, I'm no fan of multiple GPUs, but neither am I a fan of processor covers big enough to measure with the Google Earth ruler.


Pass
 
Realistically, in terms of real-world performance, the GTX 280 is certainly not slower than the 9800 GX2; it's actually the other way around :)

Take a look at [H]OCP's review, where they show the huge fluctuations in the framerate of the 9800 GX2 (and SLI and CrossFire systems in general). The amazing thing is how consistent and stable the framerates are on the GTX 280 and GTX 260 throughout the game.

Most reviews which claimed the 9800 GX2 is faster than the new GTX 280/260 cards only looked at average framerates, not at minimum framerates or framerate consistency during gameplay. Also, very few reviews looked at 8xAA or 16x CSAA.

Even the guys at Anandtech got it wrong. But [H]OCP, NV News, Rage3d (Chris Ray's review), PC Perspective, and some other places got it right. The GTX 280 and GTX 260 are definitely a nice improvement over the 9800 GX2 and the rest of the bunch in terms of real world performance/power consumption.

NVIDIA marketing did not do a good job this time. They missed out on two key opportunities:

1) They should have instructed reviewers to focus on smooth gameplay, consistency of framerates, and image quality (through 8xAA or 16x CSAA) compared to the current cards on the market. If NVIDIA lets reviewers go unchecked and test for pure framerate with no regard to smooth gameplay and high levels of AA, they are going to get hammered in many benchmarks against other dual-GPU solutions (from both NV and AMD).

2) They should never have priced the GTX 280 at $649 with the GTX 260 at $399, creating such a huge gap in price/performance ratio between the two cards. 60% more money for 10-20% more performance doesn't make a whole lot of sense, and it made the GTX 280 look like a really poor value in reviews. The more sensible thing would be $429 for the GTX 260 and $569 for the GTX 280. That way the GTX 260 is close enough to the HD 4870 to be a competitor in price/performance, and the GTX 280 is close enough in price to the GTX 260 to be a viable higher-performance alternative without breaking the bank and without being a very poor value in comparison. And since it is always beneficial to push the most expensive, highest-margin products, discouraging people from buying the GTX 280 by creating such a big pricing differential between the 260 and 280 only hurts profit margins.
 
Looking at all the reviews, I think they should have gone straight to 55nm with GTX200.
It looks like drivers aren't optimized for GTX200 yet, and neither are the games.
I think patches and drivers would greatly help GTX200 reach its potential.

Why the fuss with shaders and DP? Can't they make a separate Tesla GPU with DP while spending the gamer GPU's DP transistors where they're more useful, like pure fillrate? Maybe they are preparing themselves for Larrabee in 2009.

The whole GTX200 feels to me like it's more of a GPGPU design than a gamer GPU design.
 
Nah, it's nothing like that. The G80 GTS is a good 40-50% slower than the GTX, so there's a real sacrifice being made. Given equal performance I would prefer a fully enabled chip over a "yield enhancing" SKU, though. Of course, it's all pretty superficial, but I think you're allowed to indulge in those frivolities as a consumer.
As the former owner of an X800 Pro, I totally agree. Pretty soon I hit games where my card was just a tad insufficient and I would have slowdowns while the XT was perfectly fine. I've regretted countless times not waiting a month more before making my purchase.

As for the main subject here: Nvidia processors always seem to me to be about brute force more than elegance. I may be biased, but whether it's the way ATI uses memory for geometry shader output while Nvidia builds a 6x bigger cache; or the way ATI's SP units can be fused to make a DP one while Nvidia goes through the trouble of adding an extra 30 units that deprive gamers of a TeraSomething engine while still not providing enough for the HPC market anyway; or the way ATI looks to be constantly refining their ringbus, apparently aiming to eventually scale cores with it (my guess), I'm always more attracted to their architectures than to the green team's. More than a starlet's butt, the GT200 evokes the Yamato to me. Hopefully it'll have a better run.
 
Maybe you should try... Seems like something is not in line with the expected behavior (unless something is wrong with the testing methodology here):
[fillrate chart: image038.png]

http://www.tomshardware.com/reviews/nvidia-gtx-280,1953-9.html

As the tool behind this test is probably mine, I think it should be said that it was designed in the GeForce FX era. I really don't think the timing routines and passes used are enough to accurately measure fillrates that are THAT high.
 
The whole GTX200 feels to me like it's more of a GPGPU design than a gamer GPU design.

Wouldn't that make it the Phenom of graphics cards, sacrificing potential desktop performance in an effort to win over the server market?

Suppose that would make the old 5800 the Prescott too.
 
[Nvidia performance comparison graph: 552e2015fegn5.jpg]

Since these numbers come straight from NV, could we sometimes expect a higher advantage from the now up-to-93%-usable MUL? :???:
 
It's really hard to quantify just how much the MUL is going to help, since it's going to depend wildly on the application and shader being run.
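Just to frame the upper bound, a rough sketch using the commonly quoted GTX 280 figures (240 SPs, 1296 MHz shader clock; treat them as my assumptions):

```python
# Peak GT200 shader throughput with and without the co-issued MUL.

sps, clk = 240, 1296e6

mad_only = sps * 2 * clk / 1e9            # ~622 GFLOPS, MAD only
with_mul = sps * 3 * clk / 1e9            # ~933 GFLOPS, MAD + MUL every clock
mul_93   = sps * (2 + 0.93) * clk / 1e9   # ~911 GFLOPS if 93% of MUL slots issue

print(f"MAD only: ~{mad_only:.0f}, MAD+MUL: ~{with_mul:.0f}, "
      f"93% MUL: ~{mul_93:.0f} GFLOPS")
```

So the MUL is worth up to ~50% extra peak throughput on paper; how much of that shows up in a real shader is exactly the open question above.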
 
Nice graph, nVidia, as always! Make 5% look like 25% and 15% look like 75%!
To be honest, if you hadn't commented on that I would have thought they were ahead by 25/75% and never read the graph :rolleyes:
 
Wouldn't that make it the Phenom of graphics cards, sacrificing potential desktop performance in an effort to win over the server market?

Suppose that would make the old 5800 the Prescott too.

Well yes, but the difference is Nvidia doesn't have an Intel-like competitor, while AMD's misstep gave Intel the perfect window to recapture what it had lost.
 
Well yes, but the difference is Nvidia doesn't have an Intel-like competitor, while AMD's misstep gave Intel the perfect window to recapture what it had lost.

I dunno, by all means disagree with me, but the RV770 is seemingly poised to become the "Conroe" of this generation.
 