GPU vs CPU Architecture Evolution

silent_guy · Sep 12, 2009

Nick said:
I assume they modified CryEngine 2 to get that working.

Maybe...

But there's still all the other points.

BTW, the fact that the Z-related optimization buffer can be streamed in and out of memory doesn't mean that the internal buffer isn't a fixed-size resource. It's probably also implemented as a cache, with all the thrashing problems that go with it.

Humus · Sep 12, 2009

silent_guy said:
Interesting. How many bits do you need per pixel? 1 bit per 4 pixels for Z compression? That would work out to be 60KB for a 1600x1200 render target. Not the end of the world in terms of additional bandwidth.

Not sure exactly how many bits are required, but it's not many. It's on a per-tile basis anyway: the stored depth value is a conservative low-res one, then you need a bit for whether the tile is compressed, and there might be a bit or two for other stuff too. It's going to be negligible in terms of bandwidth in any case.
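For reference, the 60KB figure quoted above can be checked with a few lines of arithmetic. This is only a sketch of that estimate: the function name and its parameters are made up for illustration, the 1-bit-per-4-pixels rate is the guess from the quoted post rather than a known hardware figure, and the 32-bit depth buffer used for comparison is an assumption.

```python
def z_metadata_bytes(width, height, bits_per_group=1, pixels_per_group=4):
    """Size in bytes of a per-group metadata buffer (hypothetical helper).

    Defaults follow the guess in the quoted post: 1 bit per 4 pixels.
    """
    pixels = width * height
    groups = -(-pixels // pixels_per_group)   # ceiling division
    bits = groups * bits_per_group
    return -(-bits // 8)                      # round up to whole bytes


meta = z_metadata_bytes(1600, 1200)

# Assumption for comparison only: a 32-bit (4-byte) depth value per pixel.
depth_buffer = 1600 * 1200 * 4

print(meta)                 # 60000 bytes, i.e. roughly 60KB
print(meta / depth_buffer)  # 0.0078125, i.e. 1/128 of the depth buffer
```

Under those assumptions the metadata is under 1% of the size of the depth buffer it describes, which fits the point that the extra bandwidth is small.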

silent_guy · Sep 12, 2009

Just to be 100% clear: my whole point is around the fine grained context switching thing. If those guys are running Crysis at such low resolution, quality settings and frame rate that the GPU is completely idle 90% of the time then, yes, of course it's possible to run multiple instances. At that point, the whole thing becomes a software question and I basically lose interest.

But the claim that you can increase the overall performance of an already busy GPU by a factor of 2 or 3, let alone 10, by fast context switching is, IMHO, "absurd".

Nick · Sep 13, 2009

silent_guy said:
But the claim that you can increase the overall performance of an already busy GPU by a factor of 2 or 3, let alone 10, by fast context switching is, IMHO, "absurd".

I've heard stories of developers who wrote stress tests that made their GPU run far hotter than during a typical game. Personally I also experience that some games make my graphics card's fan rev up audibly while others keep it running below ambient noise...

I know, this isn't really exact science, but it doesn't sound all that absurd to me that today's complex games don't fully utilize the GPU. Shaders can have very different characteristics and there are many dependencies (both logical and physical) that prevent it from always keeping every resource busy.

Anyway, it looks like OTOY is working on a site and so they're probably preparing to announce some product and maybe will reveal some of the technology...

silent_guy · Sep 13, 2009

Nick said:
I know, this isn't really exact science, but it doesn't sound all that absurd to me that today's complex games don't fully utilize the GPU.

I'm not refuting any of that. If you have a shader that does nothing but calculations while hardly using textures, yes, TEX and ROP will be underused.

But it is a whole other thing to say that current hardware has the means to solve this by running multiple instances at the same time.

liolio · Sep 21, 2009

Has this article already been linked?

hoom · Sep 21, 2009

From above linked article, RV770 = Holy crap on both axes

Interesting that G200 comes in the middle of the others on both axes, I'd have expected it to be an outlier in at least one.

Alexko · Sep 21, 2009

Well, as David Kanter said himself, referring to the DP implementation in GT200: "I tried not to be too critical in that article, but it's pretty much crap."

Obviously, on an SP chart, the results would be quite different.

rpg.314 · Sep 21, 2009

Yup, rv770 will beat the crap out of everything out there even more

rpg.314 · Sep 21, 2009

hoom said:
From above linked article, RV770 = Holy crap on both axes

Interesting that G200 comes in the middle of the others on both axes, I'd have expected it to be an outlier in at least one.

I'd like to see where rv870 lies, in SP.

And yeah, expect to see charts like these from now on in many of amd's pr material to bust the cuda balloon.

pcchen · Sep 21, 2009

rpg.314 said:
I'd like to see where rv870 lies, in SP.

And yeah, expect to see charts like these from now on in many of amd's pr material to bust the cuda balloon.

Actually this chart is a bit misleading, because not all DP is equal. For example, DP in most CPUs is IEEE 754 compliant with all the fancy things including exceptions, signaling NaNs, denormalized numbers, etc., which cost quite a bit to implement well. On the other hand, GPUs and CELL have varying levels of 754 compliance. So it's no wonder they can have better DP efficiency.
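To make the denormal point concrete, here is a small sketch of what full IEEE 754 double precision keeps and what a cheaper implementation might drop. It assumes a host CPU with denormals enabled (the usual default for CPython). The `flush_to_zero` function is a hypothetical model of hardware without denormal support, not the behavior of any specific GPU or of CELL.

```python
import math
import sys

# Smallest positive *normal* double on an IEEE 754 machine.
smallest_normal = sys.float_info.min

# Halving it gives a denormalized (subnormal) number, not zero,
# on a fully compliant implementation.
subnormal = smallest_normal / 2


def flush_to_zero(x):
    """Hypothetical model of hardware that skips denormal support:
    any nonzero value below the normal range becomes a signed zero."""
    if x != 0.0 and abs(x) < sys.float_info.min:
        return math.copysign(0.0, x)
    return x


print(subnormal > 0.0)           # True: gradual underflow is preserved
print(flush_to_zero(subnormal))  # 0.0: the same value is lost when flushed
```

Handling that extra range at full speed is one of the costs the post is referring to. Skipping it frees up hardware, which is one reason peak DP numbers are not directly comparable.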

Jawed · Sep 21, 2009

Hack and slash job guesstimate:

Jawed

3dilettante · Sep 21, 2009

Larrabee, if it were released in a form similar to the die shots a while back, would interestingly be in the same general vicinity as RV870 for DP (assuming 2 GHz, it's 2 TF SP and 1 TF DP), since it was estimated as being over 600 mm².

The per watt numbers would be interesting to see.
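The Larrabee estimate above can be turned into a per-area figure with simple division. This is a rough sketch that uses only the numbers in the post (2 TF SP and 1 TF DP at an assumed 2 GHz, die estimated at over 600 mm²); the function name is made up for illustration. Because the die is "over" 600 mm², the results are upper bounds, not measurements.

```python
def gflops_per_mm2(tflops, die_area_mm2):
    """Peak throughput density in GFLOPS per square millimetre."""
    return tflops * 1000.0 / die_area_mm2


# Figures from the post; 600 mm^2 is a lower bound on die area,
# so these densities are upper bounds.
dp_density = gflops_per_mm2(1.0, 600.0)
sp_density = gflops_per_mm2(2.0, 600.0)

print(round(dp_density, 2))  # 1.67 GFLOPS/mm^2 at most, double precision
print(round(sp_density, 2))  # 3.33 GFLOPS/mm^2 at most, single precision
```

A per-watt version would be the same division with board power in place of area, but the thread gives no power figures to plug in.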
